Search Results

Search found 9935 results on 398 pages for 'pages'.


  • Screen scraping software that will traverse pages

    - by nilbus
    We're creating a mashup site that pulls information from many sources all over the web. Many of these sites don't provide RSS feeds or APIs to access the information they provide, which leaves us with screen scraping as our method for collecting the data. There are many scripting tools for screen scraping, written in different scripting languages, that require you to write your scraping scripts in the language the scraper was written in; Scrapy, scrAPI, and scrubyt are a few written in Ruby and Python. There are also web-based tools I've seen, like Dapper, that create XML or RSS feeds based on a webpage. Dapper has a beautiful web-based interface that requires no scripting skills to use, and it would be a great tool if it were able to traverse multiple pages to gather data from hundreds of pages of results.

    We need something that will scrape information from paginated web sites, much like scrubyt, but with a user interface that a non-programmer could use. We'll script up our own solution if we need to, probably using scrubyt, but if there's a better solution out there, we want to use it. Does anything like this exist?
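    For illustration only, here is a minimal sketch of the pagination-traversal part in Python using requests and BeautifulSoup; the URL pattern, the result selector, and the stop condition are hypothetical stand-ins for whatever the target site actually uses, not a recommendation of a particular tool.

        # Hedged sketch: walk a paginated listing until it runs out of results.
        import requests
        from bs4 import BeautifulSoup

        BASE_URL = "http://example.com/results?page={}"   # hypothetical paginated listing

        def scrape_all_pages(max_pages=100):
            rows = []
            for page in range(1, max_pages + 1):
                response = requests.get(BASE_URL.format(page), timeout=10)
                if response.status_code != 200:
                    break                                  # no more result pages
                soup = BeautifulSoup(response.text, "html.parser")
                items = soup.select("div.result")          # hypothetical selector for one result
                if not items:
                    break
                rows.extend(item.get_text(strip=True) for item in items)
            return rows

        if __name__ == "__main__":
            for row in scrape_all_pages():
                print(row)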

    Read the article

  • Looking for software to read PDFs/web pages aloud on OS X

    - by Clinton Blackmore
    I am looking for software that will read PDFs and web pages aloud for me under OS X 10.5, preferably something that is free. I am aware that you can make your Mac read to you by pressing a key combination. It is pretty slick, but I really want something that:
      - will allow me to say, "Read this document" and let me skip paragraphs and pause (instead of simply stopping and then restarting from the beginning)
      - will allow me to skip things that aren't relevant, like page headers, footers, and side bars
      - will allow me to rewind and listen to something again (either to think on it more deeply, or to understand what the text-to-speech engine was trying to say)
      - for a PDF with text in two columns, will let me read just one column at a time. (Right now if I make a selection, it gets both columns and reads from one and then from another. If I could just select one column and read it, I'd be happier. [IIRC, Apple improved things in Snow Leopard so you can select one column in a PDF.])
    I don't really expect one program to do both PDFs and web pages, but it would be nice.
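    This does not cover the column-selection or rewind wishes, but as a rough command-line fallback (assuming the poppler pdftotext utility is installed, e.g. from MacPorts), the built-in say command can speak extracted PDF text:

        pdftotext document.pdf - | say              # extract the text and read it aloud
        pdftotext -f 3 -l 5 document.pdf - | say    # speak only pages 3-5 to skip front matter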

    Read the article

  • Pages load fine in the browser, but a 404 Not Found is reported during the GET on all pages except the index

    - by user885983
    I believe this question is more suited to Server Fault (please correct me if not). This issue appears very similar to this question (except there are no 301 Moved Permanently responses for any pages). The domain is yorkshirebadges.co.uk. For example, loading yorkshirebadges.co.uk or yorkshirebadges.co.uk/index.php reports no 404s during network inspection, but every other page (/contact.php, /products.php) reports a 404 Not Found. mod_rewrite is being used on the site; I checked it out but didn't see any obvious errors. It's included below for reference:
        RewriteEngine on
        RewriteRule ^store/material/([^/\.]+)/price/?([^/\.]+)?$ products.php?prodType=$1&price=$2
        RewriteRule ^store/price/?([^/\.]+)?$ products.php?price=$1;
        RewriteRule ^store/material/?([^/\.]+)?$ products.php?prodType=$1
        RewriteRule ^store/([^/\.]+)/?$ products.php?prodCat=$1
        RewriteRule ^store/([^/\.]+)/price/([^/\.]+)$ products.php?prodCat=$1&price=$2
        RewriteRule ^store/Type/?([^/\.]+) products.php?prodType=$1
        RewriteRule ^store/([^/\.]+)/?([^/\.]+)?$ view-product-details.php?cat=$1&prodName=$2
        RewriteRule ^store/([^/\.]+)/material/?([^/\.]+)?$ products.php?prodCat=$1&prodType=$2
        RewriteRule analytics http://www.google.com/analytics
        <IfModule mod_suphp.c>
            suPHP_ConfigPath /home/yorkshir
            <Files php.ini>
                order allow,deny
                deny from all
            </Files>
        </IfModule>
    Chrome's network inspection (and Firebug on Firefox) reports 404s on all pages except the index; the server is Apache 2. Really scratching my head on this one!
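    As a side note, an easy way to see exactly which status code and headers the server returns for one of the failing pages, independent of any browser, is a header-only request with curl (the path below is just the /contact.php example mentioned above):

        curl -I http://yorkshirebadges.co.uk/contact.php    # -I requests headers only; the first line shows the HTTP status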

    Read the article

  • MS Word TOC that references # pages rather than page number

    - by buttonsrtoys
    We frequently need to write specifications in Word which require a TOC that refers to the total number of pages in a section, rather than the page number. E.g.:
        Section                            No. Pages
        01010 Summary of Work..............5
        01025 Prices.......................2
        01400 Quality Control..............1
        01700 Contract Close Out...........2
    A wrinkle is that each section is a separate file. To date, we've been writing our TOC by hand, which has introduced every error imaginable. Is there an MS Word feature that populates a TOC with page totals? If not, I've done a little VB in Office, so I wouldn't be opposed to that route as need be, as long as it was usable by our low-tech users.

    Related question: all the section files are in the same folder. It would be nice if the TOC loaded every file in the folder, rather than having to specify each one. Is this a feature of Word, or would this require VB? We tried a master document with links to subdocuments, but since the number of section files ebbs and flows with each project, the approach required too much maintenance for our Wordophobes.
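    There is no built-in Word field that sums page counts across separate files, so some VBA is probably unavoidable. Purely as a hedged sketch of the folder-driven approach described above (the folder path, file pattern, and tab-separated output format are assumptions, not part of the original question):

        ' Sketch: type "filename <tab> page count" at the cursor for every Word file in a folder.
        Sub BuildSectionPageCountList()
            Dim folder As String, file As String
            Dim doc As Document, pageCount As Long

            folder = "C:\Specs\Sections\"            ' assumed location of the section files
            file = Dir(folder & "*.doc*")            ' first matching file in the folder
            Do While file <> ""
                Set doc = Documents.Open(FileName:=folder & file, Visible:=False)
                pageCount = doc.ComputeStatistics(wdStatisticPages)
                doc.Close SaveChanges:=wdDoNotSaveChanges
                Selection.TypeText Text:=file & vbTab & pageCount & vbCrLf
                file = Dir                           ' next matching file
            Loop
        End Sub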

    Read the article

  • Some pages begin to load and stop on Chrome

    - by corsiKa
    I'm using Chrome (Version 22.0.1229.94). About 20% of pages simply stop loading after a second or two. Sometimes, I'm able to click an early link after a second or third reload. If I attempt to close the page, it continues to hang for a few (10-30) additional seconds, then closes. However, if I switch to other tabs, it works just fine. If I don't change tabs and wait long enough, it says that there's something on the page that's taking too long to run and offers to let me kill it.

    Only a select number of sites fail to load, and they do so consistently. None of the Stack Exchange sites or Google fail, but others like realclearpolitics and wowwiki do. I visit those sites every day, and this is the first time it has failed like this. If it were just one site, I would say someone messed something up in their deployment. But it seems incredibly peculiar that suddenly, half a dozen popular sites all mysteriously have the same symptoms.

    If I attempt to load the pages in Firefox or IE9, they load just fine. Nothing new has been installed, regarding Chrome or otherwise. Antivirus reports no abnormalities. The system is regularly patched. Restarting both Chrome and the computer has had no effect.

    Read the article

  • Why do Google search results include pages disallowed in robots.txt?

    - by Ilmari Karonen
    I have some pages on my site that I want to keep search engines away from, so I disallowed them in my robots.txt file like this:
        User-Agent: *
        Disallow: /email
    Yet I recently noticed that Google still sometimes returns links to those pages in their search results. Why does this happen, and how can I stop it?

    Background: Several years ago, I made a simple web site for a club a relative of mine was involved in. They wanted to have e-mail links on their pages, so, to try and keep those e-mail addresses from ending up on too many spam lists, instead of using direct mailto: links I made those links point to a simple redirector / address-harvester trap script running on my own site. This script would return either a 301 redirect to the actual mailto: URL, or, if it detected a suspicious access pattern, a page containing lots of random fake e-mail addresses and links to more such pages. To keep legitimate search bots away from the trap, I set up the robots.txt rule shown above, disallowing the entire space of both legit redirector links and trap pages.

    Just recently, however, one of the people in the club searched Google for their own name and was quite surprised when one of the results on the first page was a link to the redirector script, with a title consisting of their e-mail address followed by my name. Of course, they immediately e-mailed me and wanted to know how to get their address out of Google's index. I was quite surprised too, since I had no idea that Google would index such URLs at all, seemingly in violation of my robots.txt rule. I did manage to submit a removal request to Google, and it seems to have worked, but I'd like to know why and how Google is circumventing my robots.txt like that and how to make sure that none of the disallowed pages will show up in their search results.

    P.S. I actually found out a possible explanation and solution, which I'll post below, while preparing this question, but I thought I'd ask it anyway in case someone else might have the same problem. Please do feel free to post your own answers. I'd also be interested in knowing if other search engines do this too, and whether the same solutions work for them also.
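    For context, robots.txt only controls crawling, not indexing: Google can still list a disallowed URL it has never fetched, using nothing but the anchor text of links pointing at it. The commonly recommended fix is the reverse of the rule above: allow crawling and serve the pages with a noindex directive instead. A hedged sketch, assuming Apache with mod_headers and assuming the /email URLs map to a script whose filename starts with "email" (both assumptions about this particular site):

        # .htaccess sketch: let crawlers fetch the redirector, but tell them not to index it
        <IfModule mod_headers.c>
            <FilesMatch "^email">
                Header set X-Robots-Tag "noindex, nofollow"
            </FilesMatch>
        </IfModule>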

    Read the article

  • How do I print unprintable web pages?

    - by user1413
    I want to print out a web page that seems to be unprintable in both Firefox and Chrome. The document spans multiple pages, but when I print it from Firefox or Chrome, only the first page comes out. The only way I have found to print the page is from IE, using the XPS Document Writer. Is there a tool (e.g., a web browser or browser plugin) that will help? Or is there a setting I can use in Firefox or Chrome?
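    One option worth trying, as a workaround rather than a browser setting, is to render the page to a PDF outside the browser and print that instead. The wkhtmltopdf command-line tool does this in one step (the URL below is a placeholder, and whether it handles this particular page depends on the page itself):

        wkhtmltopdf "http://example.com/long-article" article.pdf   # render the page to a multi-page PDF
        lpr article.pdf                                             # then print the PDF as usual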

    Read the article

  • WSS and CAG: _layouts pages break

    - by Mike
    Alright, I've searched everywhere and I cannot find the answer, due to the rarity of our setup: WSS 3.0 / IIS 6.0 / Windows Server 2003. We have a SharePoint site that is in good shape, almost. Its TCP and SSL ports are uncommon and requests need to be rerouted to work properly. This is where the Citrix Access Gateway (CAG) comes into play: it redirects any request for the URL (something.something.com) to the correct SSL port on the correct server. My AAM is configured to default to something.something.com and nothing else, since the CAG provides the port. We use FBA and require SSL.

    This works perfectly for everything that is safe, or anything that an end user can see, but if I try to add a web part, it errors out, whereas if I add it internally or bypass the CAG, the web part adds fine. The same goes for most of the _layouts pages, like _layouts/new.aspx. If I add a Link List / document library on something.something.com, it errors out ("Page cannot be displayed") and the page won't display, but if I try it with an internal address it works fine. I found that when I am adding something or doing anything administrative, the site navigates to the pages I need just fine, but when I actually ADD something, the URL changes from something.something.com to something.something.com:SSLport, thus erroring out the site. The URL with the SSL port shows as the Site URL when navigating to Site Settings. However, if I bypass the CAG using the internal address, the _layouts page works like a charm and I can add anything.

    All the CAG does is reroute a DNS request to the provided server and port. I've tried re-extending the application; no luck, same thing. I've tried changing the AAM to hide the port, and the CAG rejects it. I've tried recreating a new web app / site collection with the same rules on the CAG; the same thing occurs. Correct me if I'm wrong, and please provide me with some feedback and answers. Any suggestions would be very appreciated. Is it the CAG or the Alternate Access Mappings (AAM)?

    Read the article

  • How to record a "macro" that saves web pages as PDF in OSX

    - by dwatson
    I frequently save pages as PDF from Chrome on OS X. The process is apple-P, then click the PDF button and choose the "Save as PDF..." menu item. I always use the pre-filled filename and save in the default directory. Is it possible to save this as an Automator script? If that is possible, I would sure like to add it as a button somewhere in Chrome so I can just "save this for reading later". Thanks for any help.

    Read the article

  • Imagick Convert Append 2 pdf pages

    - by Hammad Khalid
    I am using the following command to combine a multi-page PDF into one JPG file (I am using the Imagick library and PHP TCPDF):
        convert -append path1.pdf path2.jpg
    Now what I need to do is add white space between each page to differentiate them from one another, or add text in between, like "Page 1", "Page 2". Currently the pages come out correctly, but there is no space in between. Can anyone help me out?
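    Not tested against these exact files, but one common ImageMagick idiom for this is to splice a strip of background onto each page image before appending, which leaves a white gap between pages in the stacked output (the 0x40 strip height and 150 dpi density are arbitrary assumptions):

        convert -density 150 path1.pdf -background white -splice 0x40 -append path2.jpg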

    Read the article

  • Redirect HTTP to HTTPS on certain pages?

    - by Elliott
    Hi, below is some code I added to my .htaccess file. How can I add certain pages to be redirected to HTTPS, such as login.php and login.html? Also, if the user types in www. they get an "untrusted connection" warning, as the SSL certificate is only valid without the www. How could I fix this? Thanks.
        RewriteEngine On
        RewriteCond %{HTTPS} off
        RewriteCond %{REQUEST_URI} /login.html
        RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI}
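    Not a tested configuration for this site, but a minimal .htaccess sketch of the two pieces being asked about, assuming the certificate only covers the bare domain (example.com stands in for the real one):

        RewriteEngine On

        # send www traffic to the bare domain the certificate actually covers
        RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
        RewriteRule (.*) http://%1/$1 [R=301,L]

        # force HTTPS on the login pages only
        RewriteCond %{HTTPS} off
        RewriteCond %{REQUEST_URI} ^/login\.(php|html)$
        RewriteRule (.*) https://%{HTTP_HOST}/$1 [R=301,L]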

    Read the article

  • How to select all text inside text area of code field on web pages

    - by Moorage
    On many forum web pages, the download links are given inside [code] tags that look like a text area with a white background. When there are a large number of download links, the box gets scroll bars, and I find it difficult to scroll up, start the selection, and then drag down to select all the links. Is there any way to select everything inside the box? When I right-click and choose Select All, it usually selects the text of the whole page, not just that code segment.

    Read the article

  • How do you switch between Linux manual pages?

    - by Sheldon
    I'm new to Linux and have noticed that there are numbers beside certain commands I look up. For example, I want to look up accept() in the context of network programming, but man accept shows this instead:
        accept(8)            Easy Software Products            accept(8)

        NAME
               accept/reject - accept/reject jobs sent to a destination
    So how do you switch between manual pages to reach the other numbers, like accept(1) ~ accept(7)?
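    For reference, the number is the manual section, and it goes between man and the page name; man -k (or whatis) shows which sections a given name appears in:

        man 2 accept     # section 2: the accept() system call used in network programming
        man -k accept    # list every manual page whose name or description matches "accept"
        whatis accept    # one-line summary for each section that has an accept page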

    Read the article

  • HP OfficeJet 5510 only prints blank pages under Ubuntu 9.10

    - by mutzel
    I'm trying to set up an HP OfficeJet 5510 using HPLIP 3.10.2 under Ubuntu 9.10. Installing the driver according to this guide was no problem, but after installing and selecting the printer I was only able to print blank pages. The printer works well under Windows, and scanning (it's a multi-function printer) also works under Ubuntu. Does anyone know this problem and a possible solution?
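    Not a known fix for this particular model, but HPLIP ships a couple of diagnostic tools that are a reasonable first step when output is blank (a broken driver install or an empty/clogged cartridge are common causes):

        hp-check     # verify the HPLIP installation and its dependencies
        hp-levels    # query the printer's reported ink levels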

    Read the article

  • Multiple pages per sheet (PDF)

    - by smihael
    I use the following commands to merge multiple pages (in this case 4) from the input file onto one sheet in the output file:
        pdf2ps input.pdf - | psnup -pA4 -4 >> output.ps
        ps2pdf output.ps output.pdf
        rm output.ps
    How can I modify the pipeline so that I won't have to use two commands, but just a single one-liner? Is there any other command-line tool that would do the same and can work directly on PDF files?
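    For illustration, ps2pdf can read PostScript from standard input, which removes the temporary file; and pdfnup (from the pdfjam package) is one tool that does the n-up step directly on PDFs. Both are sketches rather than tested recipes for this exact file:

        pdf2ps input.pdf - | psnup -pA4 -4 | ps2pdf - output.pdf    # single pipeline, no intermediate .ps file
        pdfnup --nup 2x2 input.pdf --outfile output.pdf             # pdfjam alternative that works on PDFs directly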

    Read the article

  • Some web pages won't download fully

    - by Sumac
    Some web pages won't download fully under any browser on any computer connected to the network. I have Internet access through a wireless modem/router (2 Mbps DSL connection; wireless reception is excellent). I use Opera, and when I turn on Opera Turbo the same sites download fully. I tried changing to other DNS servers (OpenDNS, Google DNS), but it made no difference. What would you suggest I try? OS: Windows 7 64-bit.

    Read the article

  • Child web.config can't clear <pages><controls> from parent web.config

    - by Lance Rushing
    How can I "clear" the vendor defined <controls> in my child app's web.config? Parent Web Config. <system.web> <pages> <controls> <!-- START: Vendor Custom Control --> <add tagPrefix="asp" namespace="VENDOR.Web.UI.Base" assembly="System.Web.Extensions, Version=1.0.61025.0, Culture=neutral /> ... <!-- END: Vendor Custom Control --> ... </controls> <tagMapping> <add tagType="System.Web.UI.WebControls.WebParts.WebPartManager" mappedTagType="Microsoft.Web.Preview.UI.Controls.WebParts.WebPartManager" /> <add tagType="System.Web.UI.WebControls.WebParts.WebPartZone" mappedTagType="Microsoft.Web.Preview.UI.Controls.WebParts.WebPartZone" /> </tagMapping> </pages> </system.web> Child: <system.web> <pages> <controls> <add tagPrefix="asp" namespace="System.Web.UI" assembly="System.Web.Extensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35"/> <add tagPrefix="asp" namespace="System.Web.UI.WebControls" assembly="System.Web.Extensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35"/> </controls> <tagMapping> <clear/> </tagMapping> </pages> </system.web> I have it working for the <tagMapping> section, but <controls> does not support <clear/> (or ).

    Read the article

  • session variables lost between pages or use same variables

    - by user222333
    Hi, yesterday I learned many things from you, especially from Marc, and my problem was solved (session variables lost between pages or use same variables). But now I continue asking: I don't want to use the session ID in URLs (session.use_trans_sid = 1) between pages. But I also don't want different users of the same application to share the same session variables, and I don't want to lose session variables between pages for the same user. Is it possible? If yes, how? Thanks to everybody for any help. Best regards.

    I have WampServer (2.2.11) with PHP (5.2.9-2). My php.ini's session settings are below:
        [Session]
        session.save_handler = files
        session.save_path = "c:/wamp/tmp"
        session.use_cookies = 0
        ;session.cookie_secure =
        ;session.name = PHPSESSID
        session.auto_start = 0
        session.cookie_lifetime = 0
        session.cookie_path = /
        session.cookie_domain =
        session.cookie_httponly =
        session.serialize_handler = php
        session.gc_probability = 1
        session.gc_divisor = 1000
        session.gc_maxlifetime = 1440
        session.bug_compat_42 = 0
        session.bug_compat_warn = 1
        session.referer_check =
        session.entropy_length = 0
        session.entropy_file =
        ;session.entropy_length = 16
        ;session.entropy_file = /dev/urandom
        session.cache_limiter = nocache
        session.cache_expire = 180
        session.use_trans_sid = 1
        session.hash_function = 0
        session.hash_bits_per_character = 5
        url_rewriter.tags = "a=href,area=href,frame=src,input=src,form=fakeentry"
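    The usual way to keep per-user sessions without a session ID in the URL is to carry the ID in a cookie instead. A minimal sketch using runtime overrides (equivalently, the same three settings can be changed in the php.ini above: session.use_cookies = 1 and session.use_trans_sid = 0):

        <?php
        // Hedged sketch: cookie-based sessions, no PHPSESSID appended to links.
        ini_set('session.use_cookies', '1');       // carry the session ID in a cookie...
        ini_set('session.use_only_cookies', '1');  // ...and never accept one from the URL
        ini_set('session.use_trans_sid', '0');     // stop PHP from rewriting links with the session ID

        session_start();                           // each visitor gets their own $_SESSION

        $_SESSION['visits'] = isset($_SESSION['visits']) ? $_SESSION['visits'] + 1 : 1;
        echo "You have loaded this page " . $_SESSION['visits'] . " times.";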

    Read the article
