Search Results

Search found 4479 results on 180 pages for 'pdf scraping'.

Page 58/180

How do I screen scrape a website and get data within a div?

- by user272899

How can I screen scrape a website using cURL and show the data within a specific div?
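Here is a minimal sketch of the extraction step. It is written in Python with the standard-library html.parser rather than PHP and cURL, so the toolset is swapped in and is not the asker's. It assumes the target div has an id. The id "price" and the sample markup are hypothetical. Fetching the page would be a separate step, for example urllib.request.urlopen(url).read().decode("utf-8"), assuming a UTF-8 page.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect the text inside the first <div> whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while we are inside the target div
        self.done = False   # set once the target div has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == "div":
                self.depth += 1  # a nested div inside the target
        elif tag == "div" and dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def div_text(html, target_id):
    """Return the whitespace-collapsed text of the div with the given id ("" if absent)."""
    parser = DivTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

The parser tracks nesting depth because the target div may contain other divs. Stopping at the first </div> would cut the text short.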

Read the article
How do I print the images?

- by user1477539

I want to print the images of the 30 NBA teams drafting in the first round. However, when I tell it to print, it prints out the link instead of the image. How do I get it to print out the image instead of the image link? Here's my code:

    import urllib2
    from BeautifulSoup import BeautifulSoup
    # or if you're using BeautifulSoup4:
    # from bs4 import BeautifulSoup

    soup = BeautifulSoup(urllib2.urlopen('http://www.cbssports.com/nba/draft/mock-draft').read())

    rows = soup.findAll("table", attrs = {'class': 'data borderTop'})[0].tbody.findAll("tr")[2:]

    for row in rows:
        fields = row.findAll("td")
        if len(fields) >= 3:
            anchor = row.findAll("td")[1].find("a")
            if anchor:
                print anchor
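A terminal can only print text, so printing the anchor shows its HTML. One option is to pull out the src attributes and write an HTML page that a browser will render as images. The sketch below assumes each anchor wraps an <img> tag. It uses the standard-library html.parser instead of BeautifulSoup, and the sample markup is hypothetical. In BeautifulSoup, the rough equivalent of the extraction step is anchor.find("img")["src"].

```python
from html import escape
from html.parser import HTMLParser


class AnchorImageCollector(HTMLParser):
    """Collect the src of every <img> that sits inside an <a> element."""

    def __init__(self):
        super().__init__()
        self.in_anchor = 0
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_anchor += 1
        elif tag == "img" and self.in_anchor:
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

    def handle_endtag(self, tag):
        if tag == "a" and self.in_anchor:
            self.in_anchor -= 1


def image_sources(html):
    """Return the image URLs found inside anchors, in document order."""
    collector = AnchorImageCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


def gallery_html(sources):
    """Build a tiny HTML page that actually renders the images."""
    tags = "\n".join(f'<img src="{escape(s)}">' for s in sources)
    return f"<!DOCTYPE html>\n<html><body>\n{tags}\n</body></html>\n"
```

Writing gallery_html(...) to a file (for example a hypothetical draft.html) and opening it in a browser would show the logos instead of their links.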

Read the article
Selenium: get text from an element by id

- by user3766148

On the following URL: http://www.filestube.to/26frq-Buffalo-Clover-Test-Your-Love-2014-9Jai9TJFukAS9fq9sWngAD.html I am trying to copy the "Direct links" value (turbobit.net/9mrb0eu9eksx/26frq.Buffalo.Clover..Test.Your.Love.2014.rar.html) via a CSS path or XPath, but I am unable to retrieve the information and store it in a variable. Firebug gives me:

    html body div.cnt div.rH.no-js.fd div.rl div.fgBx pre span#copy_paste_links

but when I apply

    css=html.body.div.cnt.div.rH.no-js.fd.div.rl.div.fgBx.pre.span#copy_paste_links/text()

to the target, I get the error "not found" (screenshot: http://i.imgur.com/KdBmDHE.png).
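Two things in the failing locator stand out. First, in CSS, ancestors are separated by spaces; the dots in the second version turn every name into a class. Second, /text() is XPath syntax that CSS selectors do not support. Selecting the span by its id is usually enough: for example, css=span#copy_paste_links in Selenium IDE, or driver.find_element(By.CSS_SELECTOR, "span#copy_paste_links").text in Selenium's Python bindings. The sketch below shows the same id-only lookup using the standard library's ElementTree as a stand-in for Selenium. It runs on a hypothetical, simplified snippet.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup mirroring the Firebug path in the question.
SNIPPET = """
<html><body><div class="cnt"><div class="rH no-js fd"><div class="rl">
<div class="fgBx"><pre><span id="copy_paste_links">turbobit.net/example.rar.html</span></pre></div>
</div></div></div></body></html>
"""


def links_text(markup):
    """Return the text of the span with id copy_paste_links, or None if absent."""
    root = ET.fromstring(markup.strip())
    # Target the element by its id alone; the full ancestor chain is unnecessary.
    span = root.find(".//span[@id='copy_paste_links']")
    return span.text if span is not None else None
```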

Read the article
Web scraping from a Google Chrome extension

- by limoragni

I've started to develop a Chrome extension to navigate and perform actions on a website. So far, the extension can receive a couple of parameters, check a set of radio buttons, fill in a few inputs of a form, and then submit it. What I want to do now is repeat the process, but I'm stuck when the page reloads: I don't know how to make the script react when the request finishes. The workflow I want to achieve (for automatically copying a certain object) is the following:

Popup side:
- Enter the number of the master object to copy
- Enter the base name of the copies (for example Mod, so that I can iterate and add mod1, mod2, ..., modn)
- Enter the number of copies

Background side:
- Select master
- Select standard options
- Fill in inputs
- Submit form
- Wait for the page to complete the request and continue to the next copy (here I need help)

The problem is the repetition; the rest is taken care of. I assume there must be a way of dealing with requests. Any ideas? By the way, I'm doing it all with the extension and tabs methods of Google Chrome plus JavaScript and jQuery.

Read the article
ASP.NET: Scraping Grid Pages

- by SH

I need code in C#. I am trying to post to the search.aspx page, which contains an ASP.NET grid. When the grid is rendered, it loads the first page on the screen, and there are a number of page links in the grid header. I scrape the first page, and now I want to move on to the next page. All this is being done using the following code:

    HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create("http://pubrec3.hillsclerk.com/oncore/search.aspx?" + param);
    myRequest.Method = "GET";
    myRequest.KeepAlive = false;
    HttpWebResponse webresponse;
    try
    {
        webresponse = (HttpWebResponse)myRequest.GetResponse();
        Encoding enc = System.Text.Encoding.GetEncoding(1252);
        StreamReader loResponseStream = new StreamReader(webresponse.GetResponseStream(), enc);
        string r = loResponseStream.ReadToEnd();
        loResponseStream.Close();
        webresponse.Close();
        //if (GetRecordCount(r))
        ExtractResultTable(bd, ed, r);
    }
    catch (Exception ex)
    {
    }

The above code grabs the first page, and now I have to move on to the next page. This is the link that produces a grid with 3 pages:

    http://pubrec3.hillsclerk.com/oncore/search.aspx?bd=01/01/2008&ed=12/31/2008&bt=O&lb=1000000&ub=1000000000&d=5/6/2010&pt=-1&dt=D,%20MTG&st=consideration

Using the above code, I need to load the 2nd page with the same search criteria but with the records found on the 2nd page, and so on. I know there is a trick for navigating through grid pages: you can pass __EVENTTARGET and __EVENTARGUMENT in the query string. I used it, but it does not work on this website. I am desperately searching for a way to cope with this website, and I really want this done. I do not want code, just a way to navigate through the grid using the query string, if that works; otherwise, please be specific to the problem.
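ASP.NET WebForms grids usually page through postbacks. A pager link calls __doPostBack, which POSTs the form back with __EVENTTARGET, __EVENTARGUMENT, and the page's hidden state fields such as __VIEWSTATE and __EVENTVALIDATION. If this site validates that state, one plausible but unconfirmed reason the query-string trick fails is that those hidden fields are missing. The sketch below is in Python rather than C#. It collects the hidden fields from the first page and builds the POST body for another page. It assumes a GridView-style pager whose argument looks like Page$2. The control name ctl00$Grid1 is hypothetical; the real one would appear in the pager link's __doPostBack call.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.fields[a["name"]] = a.get("value") or ""


def next_page_body(page_html, grid_unique_id, page_number):
    """Build the urlencoded form body a GridView pager link would POST back."""
    collector = HiddenFieldCollector()
    collector.feed(page_html)
    collector.close()
    fields = dict(collector.fields)  # __VIEWSTATE, __EVENTVALIDATION, ...
    fields["__EVENTTARGET"] = grid_unique_id            # hypothetical control name
    fields["__EVENTARGUMENT"] = f"Page${page_number}"   # GridView pager convention
    return urlencode(fields)
```

The body would then be sent as a POST to the same search.aspx URL, for example with urllib.request.Request(url, data=body.encode("ascii"), method="POST"). The site may also require the session cookie from the first response.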

Read the article
Change label text in a VB6 binary (not source code)

- by Jun

Hi, we have a VB6 binary executable that comes with no source code, and we need to change the text of a label in that application from "AAA" to "BBB". Are there any ways or tools that can do that? The closest tool I can find right now is Microsoft UISpy; it can read all the other elements but not the label. I hope there is a tool that can change the resource in the .exe so that the label reads "BBB" instead of "AAA". Or is it possible to write a wrapper application that launches the .exe, examines the application screen for "AAA", and changes it to "BBB"? Thank you for your help!

Read the article
How to script a URL screenshot without X?

- by cwopen

I'd like to automate taking a 'screenshot' of arbitrary URLs on a Linux build that doesn't have X installed. There appear to be some (costly) web services that do this, but I specifically want something I can run locally. I tried ImageMagick without much success. Didn't Mozilla use to have a command-line option to do this?
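One local option, used here in place of ImageMagick or Mozilla, is headless Chrome or Chromium. It renders pages without an X server and can write a screenshot directly. The minimal Python wrapper below assumes a Chromium binary named chromium is on the PATH. On some distributions it is named chromium-browser or google-chrome instead.

```python
import shutil
import subprocess


def screenshot_command(url, out_path, width=1280, height=1024, browser="chromium"):
    """Build the argv for a headless Chromium screenshot of url."""
    return [
        browser,
        "--headless",
        "--disable-gpu",
        f"--screenshot={out_path}",
        f"--window-size={width},{height}",
        url,
    ]


def take_screenshot(url, out_path, browser="chromium"):
    """Run the headless browser; raises if it is missing or exits non-zero."""
    if shutil.which(browser) is None:
        raise FileNotFoundError(f"{browser} not found on PATH")
    subprocess.run(screenshot_command(url, out_path, browser=browser), check=True, timeout=60)
```

Usage would look like take_screenshot("https://example.com", "example.png").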

Read the article
Extract Address Information from a Web Page

- by Brian Boatright

I need to take a web page and extract the address information from it. Some pages are easier than others. I'm looking for a Firefox plugin, Windows app, or VB.NET code that will help me get this done. Ideally, I would like a page in our admin site (ASP.NET/VB.NET) where you enter a URL, and it scrapes the page and returns a DataSet that I can put in a grid.
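Here is a rough heuristic, sketched in Python rather than VB.NET. Once the page text has been extracted, a regular expression can pick out US-style addresses of the form "number street, city, ST 12345". The pattern is an assumption and will miss many real-world formats. If a page marks up its addresses with schema.org PostalAddress, reading that markup is more reliable.

```python
import re

# Heuristic for US-style addresses such as "123 Main St, Springfield, IL 62704".
ADDRESS_RE = re.compile(
    r"\b\d{1,6}\s+[A-Za-z0-9.\s]+?,\s*"  # street number and street name
    r"[A-Za-z.\s]+?,\s*"                 # city
    r"[A-Z]{2}\s+\d{5}(?:-\d{4})?\b"     # two-letter state code and ZIP (+4)
)


def find_addresses(text):
    """Return every substring of text that looks like a US street address."""
    return [m.group(0) for m in ADDRESS_RE.finditer(text)]
```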

Read the article
How to know if the website being scraped has changed?

- by Lost_in_code

I'm using PHP to scrape a website and collect some data. It's all done without regex: I'm using PHP's explode() function to find particular HTML tags instead. If the structure of the website changes (CSS, HTML), the scraper may collect wrong data. So the question is: how do I know if the HTML structure has changed? How can I detect this before storing any data in my database, so that wrong data is not stored?
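One possible approach, sketched in Python rather than PHP, is to fingerprint the page's tag structure and compare it with the fingerprint recorded when the scraper was written. The fingerprint covers tag names plus id and class attributes and ignores text. Text such as prices can change freely, but a changed layout produces a different hash, and that difference can block the insert until someone reviews it.

```python
import hashlib
from html.parser import HTMLParser


class TagPathCollector(HTMLParser):
    """Record each start tag with its id and class, ignoring all text content."""

    def __init__(self):
        super().__init__()
        self.signature = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        self.signature.append(f"{tag}#{a.get('id') or ''}.{a.get('class') or ''}")


def structure_fingerprint(html):
    """SHA-256 of the page's tag/id/class sequence."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return hashlib.sha256("\n".join(collector.signature).encode("utf-8")).hexdigest()
```

In practice, it helps to fingerprint only the part of the page the scraper reads. Otherwise lists whose length varies would change the hash on every visit. Validating the extracted values themselves (for example, checking that a price parses as a number) is a useful second check.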

Read the article
Parse live scores from a website

- by Venno

Hi all, I was thinking of parsing live scores from a website via PHP and then using them in an application I am planning to implement. So my question is: is it legal to parse info from a website and use it? What if I cite the source of the info?

Read the article
Python lxml XPath returns an empty list

- by Chris Finlayson

I cannot figure out what is wrong with my XPath when trying to extract a value from a table on a webpage. The method seems correct, as I can extract the page title and other attributes, but I cannot extract the third value; it always returns an empty list:

    from lxml import html
    import requests

    test_url = 'SC312226'
    page = ('https://www.opencompany.co.uk/company/'+test_url)
    print 'Now searching URL: '+page
    data = requests.get(page)
    tree = html.fromstring(data.text)
    print tree.xpath('//title/text()')  # Get page title
    print tree.xpath('//a/@href')  # Get href attribute of all links
    print tree.xpath('//*[@id="financial"]/table/tbody/tr/td[1]/table/tbody/tr[2]/td[1]/div[2]/text()')

Unless I'm missing something, the XPath appears to be correct (Chrome screenshot). I checked it in the Chrome console and it looks fine, so I'm at a loss:

    $x('//*[@id="financial"]/table/tbody/tr/td[1]/table/tbody/tr[2]/td[1]/div[2]/text()')
    [ "£432,272" ]
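There is a common cause of this symptom. When a browser builds its DOM, it inserts <tbody> into tables that lack one. A path copied from Chrome can therefore contain tbody steps that do not exist in the HTML the server sends, and lxml parses that raw HTML without adding them. Whether this particular page omits tbody is an assumption. The sketch uses the standard library's ElementTree and its XPath subset as a stand-in for lxml, on hypothetical markup without tbody:

```python
import xml.etree.ElementTree as ET

# Hypothetical server HTML: note there is no <tbody> in either table.
SERVER_HTML = """
<html><body><div id="financial"><table><tr><td>
<table><tr><td>header</td></tr><tr><td><div>label</div><div>432,272</div></td></tr></table>
</td></tr></table></div></body></html>
"""

root = ET.fromstring(SERVER_HTML.strip())

# Path as copied from the browser's DOM, which contains browser-inserted <tbody> steps.
with_tbody = ".//*[@id='financial']/table/tbody/tr/td[1]/table/tbody/tr[2]/td[1]/div[2]"
# The same path with those steps removed, matching the raw HTML.
without_tbody = with_tbody.replace("/tbody", "")

print(root.findall(with_tbody))                       # []
print([e.text for e in root.findall(without_tbody)])  # ['432,272']
```

With lxml, the same change (dropping the /tbody steps) would apply to the original expression.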

Read the article
Possible to use JavaScript to access the client side's network (knowingly)?

- by Earlz

I recently found an exploit in my router that basically gives me root access. The catch? There is a randomly generated nonce in a hidden form field that must be sent for it to work, which makes it difficult to do "easily". So basically I want to do something like this in JavaScript:

- GET http://192.168.1.254/blah
- use a regex or similar to extract the nonce value
- put the nonce value into a hidden field in the current page
- submit the form by POST to http://192.168.1.254/blah, complete with the nonce value and the other form values I want to send

Is this at all possible using only HTML and JavaScript? I'm open to things like "you must save the HTML file locally and then open it", which I'm thinking is one way around the cross-domain policy. But anyway, is this at all possible? I'm hoping for this to run in at least Firefox and Chrome. The audience for this is people with some technical know-how.

Read the article
How to extract images from Flash viewers?

- by RC

This deals with the (diverse) Flash viewers that let you zoom in on images on websites. I'm trying to extract the large, zoomed-in image rendered by the viewer. In many cases the images seem to be loaded dynamically by the viewer, or are generated only for the part of the image you are zooming in on at that moment. Ideally, the approach would be a programmatic one that could be called on an identified Flash element. I expect there is nothing universal, but I'm interested in the top few approaches that will handle most cases.

Read the article
How to "scan" a website (or page) for info, and bring it into my program?

- by James

Well, I'm trying to figure out how to pull information from a webpage and bring it into my program (in Java). For example, if I know the exact page I want info from (for the sake of simplicity, a Best Buy item page), how would I get the info I need off that page, like the title, price, and description? What is this process even called? I have no idea where to even begin researching this.

Read the article
Annotate PDFs in Firefox on Mac OS X

- by Space_C0wb0y

I have several PDFs stored locally, and I have file:/// links to them in my local TiddlyWiki. When I open one of these, Firefox opens it inline, as expected. Now I want to add annotations to these PDFs as I read them. Since I have not found a way to do this when viewing them inline, I used the "Open in Preview" feature in the context menu. This works fine, but when I want to save, Preview complains that the document is locked. It appears that Firefox gives Preview a temporary copy to open instead of the real file. Is there any way to work around this? I want to either be able to save the annotated files from Preview or annotate them directly in Firefox. I am using Snow Leopard with Firefox 3.6. Edit: I can annotate the PDFs just fine when I open them in Preview directly.

Read the article
Adobe Reader XI and Acrobat X will not open

- by irule311

Just a few days ago, Adobe Reader XI and Acrobat X stopped opening. When I try to open a PDF or either of these programs, the cursor shows a busy icon for a few seconds and then stops. Each time, a process shows up in Task Manager, but no window opens. I have tried the following, but none of it works:

- Restarting the computer
- Running as Administrator
- Repairing the installation
- Reinstalling
- Following this guide: http://helpx.adobe.com/creative-suite/kb/acrobat-failed-launch-30-days.html

When I ran acrofix, it returned exit code 3. I have also seen "Adobe Reader XI will not launch", but I am not running TuneUp. I am running Windows 7 Ultimate x64. Can someone help? Thanks!

Read the article
How can I print a batch of files with custom printer settings?

- by Li-aung Yip

I have a collection of A3-size (tabloid size) PDF drawings which I would like to print as a batch. The particular printer I am using doesn't automatically check the paper size of the document - it always defaults to A4 (letter) size paper unless told otherwise. I need to go into the printer settings and tell the printer to use A3 size paper. A batch printing utility like DarkStorm's Batch Print Handler would be good for this. However this particular utility doesn't support changing the paper size settings or other printer settings. How can I go about batch printing these PDFs with a particular paper size?
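If a macOS or Linux machine is available, printing goes through CUPS, and the lp command accepts the paper size as a job option (-o media=A3). A small loop can then print a whole folder. This is a swapped-in approach, not something the batch utility above offers. The printer name is hypothetical, and whether the driver honors the option depends on the printer.

```python
import subprocess
from pathlib import Path


def lp_command(pdf, printer=None, media="A3"):
    """Build a CUPS lp command that requests the given paper size."""
    cmd = ["lp", "-o", f"media={media}"]
    if printer:
        cmd += ["-d", printer]  # hypothetical destination name, e.g. "office"
    cmd.append(str(pdf))
    return cmd


def print_folder(folder, printer=None, media="A3"):
    """Submit every PDF in folder as a separate print job, in name order."""
    for pdf in sorted(Path(folder).glob("*.pdf")):
        subprocess.run(lp_command(pdf, printer, media), check=True)
```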

Read the article
Dialog box in Adobe Reader is scrambled

- by Rutred

I use Windows 7 and Adobe Reader 11. After Adobe Reader updated automatically from 9 to 10, I can't see a proper dialog box when I print a PDF: either it has very big letters, or the text is missing in the printing options box. I have tried to uninstall and reinstall several times, but it makes no difference. I have also tried downloading the "sherife sans" font. When I had Adobe Reader 8, there was no problem. On another computer I had the same problem, but I could resolve it by uninstalling and reinstalling Adobe Reader. Rutred

Read the article
How can I deskew and crop PDFs made from scanned pages *automatically*? [closed]

- by Pietro M.

I have several PDFs made up of book pages' scans. The scans are made from two pages at a time and some of these scans are skewed, making text appear slightly tilted. I'm looking for a tool that could allow me to do an automatic optimization by deskewing the scans without losing readability. I've found the GPL software briss to crop the scans in order to have a 1:1 page ratio instead of 2:1, but I don't have any tool to deskew the pages. I stumbled upon unpaper, another open source tool that seems perfect for what I want to do, but that tool is Linux only and it doesn't work on PDF files directly. Any hint is appreciated.

Read the article
Looking for a text editor with navigation/categorization

- by RadGH

I've been looking for a text editor that automatically creates some sort of navigation (or at least makes it easy to do so). Adobe Reader has this functionality with its bookmark system. Right now, though, I'm using Word 2007. For each section, I go to Insert > Bookmark, highlight the text, copy/paste the text as the link information, and the link appears at the top of the document. I've made a macro to make adding bookmarks easier, but it's still pretty awful, and the bookmarks are still at the top of the page rather than in a sidebar, where they would always be accessible. Honestly, I would prefer to write it in a PDF like in that screenshot, but any text editor with this type of functionality would work. It just needs basic formatting options: bold, font size, underline, images, maybe tables.

Read the article
OWA (Outlook Web Access) won't give user a link to attached document. Gives "Open as Web Page"

- by Jerry Mayers

I have been testing OWA and have found that when a message has an attached file, it shows "Attachments:", then the file name (but no link), and then an [Open as Web Page] link. The user has to click that link, which opens a new window with a preview of the document; then the user sees "You are currently viewing:" followed by the file name. This time the file name is a link that they can open or download. This happens at least for Office documents and PDF files. How do I make it so users get a direct link to the file in the original message window?

Read the article
Cleaning up pdftotext font issues

- by mankoff

I'm using pdftotext to make an ASCII version of a PDF document (made with LaTeX), because my collaborators prefer a simple document in MS Word. The plain-text version looks good, but on closer inspection the f character seems to be frequently mis-converted depending on which characters follow it. For example, fi and fl often seem to become a single special character, which I will try to paste here: ﬁ and ﬂ. What is the best way to clean up the output of pdftotext? I am thinking sed might be the right tool, but I am not sure how to detect these special characters.
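The characters are most likely the Unicode ligatures ﬁ (U+FB01) and ﬂ (U+FB02), which some fonts emit for fi and fl. The small Python sketch below, used instead of sed, maps the Latin ligature block U+FB00 to U+FB06 back to plain letters. It assumes the pdftotext output is UTF-8; pdftotext's -enc UTF-8 option requests that. unicodedata.normalize("NFKC", text) would also expand these ligatures, but it rewrites other characters too, such as superscript digits.

```python
# Latin ligatures (U+FB00..U+FB06) mapped back to their plain-letter spellings.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb05": "st",  # long s + t ligature
    "\ufb06": "st",
}
TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text):
    """Replace ligature characters with the letters they stand for."""
    return text.translate(TABLE)
```

Applied to a whole file, this might look like Path("doc.txt").write_text(expand_ligatures(Path("doc.txt").read_text(encoding="utf-8")), encoding="utf-8"), where the file name is hypothetical.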

Read the article
Extract first page from multiple PDFs

- by Tim Alexander

I have about 500 PDFs to go through and extract the first page of. These pages then need to go through a time-consuming conversion process, so I was hoping to save some time with a batch process that extracts just the first page from each of the 500 PDFs and places it in a new PDF. I've had a poke around Acrobat but can find no real method of doing this for multiple files. Does anyone know of other programs or methods that could achieve this? Free and open source are obviously preferable :) EDIT: I've actually had some success using Ghostscript to extract just one page. I'm now looking at how to batch that: take the list of files and run it on each.
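Since Ghostscript already works for one file, a loop can build the same command for every PDF in a folder. The Python sketch below assumes the gs binary is on the PATH; on Windows it is typically gswin64c. It writes one single-page PDF per input. The folder names and the _p1 suffix are hypothetical.

```python
import subprocess
from pathlib import Path


def first_page_command(src, dst, gs="gs"):
    """Ghostscript argv that copies only page 1 of src into dst (-o implies -dBATCH -dNOPAUSE)."""
    return [gs, "-q", "-dSAFER", "-sDEVICE=pdfwrite",
            "-dFirstPage=1", "-dLastPage=1", "-o", str(dst), str(src)]


def extract_first_pages(in_dir, out_dir, gs="gs"):
    """Write <name>_p1.pdf into out_dir for every PDF in in_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).glob("*.pdf")):
        dst = out / f"{src.stem}_p1.pdf"
        subprocess.run(first_page_command(src, dst, gs), check=True)
```

Usage would look like extract_first_pages("pdfs", "first_pages").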

Read the article
HP DV5T laptop problems using Adobe in multiple monitors

- by Rolnik

I have a DualView arrangement of two monitors that works just fine for 90% of my applications. However, any time I open Adobe Acrobat Reader on the larger monitor, it won't let me enlarge the window to fill the entire screen. In other words, Adobe seems to crop the PDF view at about the size of the smaller of the two monitors, even though Acrobat is on the larger monitor. I'm using NVIDIA graphics on the laptop and connecting to the larger monitor through the docking port of a docking station. Any ideas how to get Acrobat to recognize that it has more screen real estate and use it fully?

Read the article
How can I remove unwanted cropped pages from Acrobat

- by Servant

Executing the Crop command in Acrobat to crop a 3000pt × 2000pt document to 1500pt × 1800pt only hides the content outside the new boundaries; the original content remains unchanged. If anyone uses the TouchUp tool and moves the content, all the "hidden" information outside the cropped page can reappear by dragging it into view. The page acts as a window (or mask) onto the 3000pt × 2000pt content. Is there a way to crop the document permanently without printing it to a PDF file again? Please find pictures attached: http://i.stack.imgur.com/5JTPg.png http://i.stack.imgur.com/HPokv.png

Read the article
