Search Results

Search found 4479 results on 180 pages for 'pdf scraping'.


How do I send an arrow key in Perl using the Net::Telnet module?

- by pokstad

Using the Perl module Net::Telnet, how do you send an arrow key to a telnet session so that it is the same as a user pressing the down key on the keyboard?

    use Net::Telnet;
    my $t = new Net::Telnet();
    my $down_key = ?;  # How do you send a down key in a telnet session?
    $t->print($down_key);
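On VT100/ANSI-style terminals the arrow keys are normally transmitted as short escape sequences, with the down key being ESC followed by `[B` (or `OB` when the terminal is in application cursor-key mode). The sketch below only builds those byte strings; the helper name `arrow_key` is made up for illustration, and whether the remote host expects normal or application mode is an assumption you would need to check. In Perl the same bytes can be written as "\e[B", and Net::Telnet's `put` sends data without appending a line terminator, which is probably what a bare key press needs.

```python
# ANSI/VT100 cursor-key escape sequences (normal cursor-key mode).
ESC = b"\x1b"
ARROW_KEYS = {
    "up": ESC + b"[A",
    "down": ESC + b"[B",
    "right": ESC + b"[C",
    "left": ESC + b"[D",
}


def arrow_key(name, application_mode=False):
    """Return the bytes a terminal sends for the named arrow key."""
    seq = ARROW_KEYS[name]
    if application_mode:
        # In application cursor-key mode the '[' is replaced by 'O'.
        seq = seq.replace(b"[", b"O")
    return seq
```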

How to handle redirects while parsing HTML? - Python

- by RadiantHex

Hi folks, I'm trying to submit a few forms through a Python script using the mechanize library, so that I can implement a temporary API. The problem is that after submission a blank page is returned, informing me that the request is being processed; after a few seconds the page is redirected to the final page. I understand this might sound a bit generic, but I'm not sure what is going on. :) Any ideas?
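One common cause of a "please wait" page that later jumps elsewhere is a `<meta http-equiv="refresh">` tag; the question does not say which mechanism the site uses, so treat this as an assumption (a JavaScript redirect would need a different approach). A minimal sketch that finds such a tag with the standard library and resolves its target, using the hypothetical helper name `find_meta_refresh`:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshFinder(HTMLParser):
    """Record the delay and target of a <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.delay = None
        self.target = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            # The content attribute looks like "5; url=/done.html".
            content = attrs.get("content") or ""
            delay, _, rest = content.partition(";")
            self.delay = float(delay.strip() or 0)
            rest = rest.strip()
            if rest.lower().startswith("url="):
                self.target = rest[4:].strip("'\" ")


def find_meta_refresh(html, base_url):
    """Return (delay_seconds, absolute_url), or None if no refresh target exists."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    if finder.target is None:
        return None
    return finder.delay, urljoin(base_url, finder.target)
```

A script could wait for the returned delay and then request the returned URL with the same session it used to submit the form.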

How can I use Perl to grab text from a web page that is dynamically generated with JavaScript?

- by bstullkid

There is a website I am trying to pull information from in Perl; however, the section of the page I need is generated using JavaScript, so all you see in the source is: <div id="results"></div> I need to somehow pull out the contents of that div and save it to a file using Perl/proxies/whatever, e.g. the information I want to save would be document.getElementById('results').innerHTML; I am not sure if this is possible, or if anyone has any ideas or a way to do this. I was using a lynx source dump for other pages, but since I can't straightforwardly screen scrape this page I came here to ask about it! If anyone is interested, the page is http://downloadcenter.trendmicro.com/index.php?clk=left_nav&clkval=pattern_file&regs=NABU and the info I am trying to get is the row about the ConsumerOPR

Nokogiri find only inbound links

- by astropanic

I have an HTML document located at http://somedomain.com/somedir/example.html

The document contains four links:

http://otherdomain.com/other.html
http://somedomain.com/other.html
/only.html
test.html

How can I get the full URLs for the links in the current domain? I mean I should get:

http://somedomain.com/other.html
http://somedomain.com/only.html
http://somedomain.com/somedir/test.html

The first link should be ignored because it doesn't match my domain.
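The two steps described in this question, resolving each href against the page URL and then keeping only those on the page's own host, can be sketched with the standard library. This is an illustration in Python rather than Nokogiri; the function name `inbound_links` is hypothetical, and Ruby's `URI.join` plays the same role as `urljoin` here.

```python
from urllib.parse import urljoin, urlparse


def inbound_links(page_url, hrefs):
    """Resolve hrefs against page_url and keep those on the same host."""
    host = urlparse(page_url).netloc
    resolved = (urljoin(page_url, href) for href in hrefs)
    return [url for url in resolved if urlparse(url).netloc == host]


page = "http://somedomain.com/somedir/example.html"
hrefs = [
    "http://otherdomain.com/other.html",
    "http://somedomain.com/other.html",
    "/only.html",
    "test.html",
]
links = inbound_links(page, hrefs)
# links == ["http://somedomain.com/other.html",
#           "http://somedomain.com/only.html",
#           "http://somedomain.com/somedir/test.html"]
```

Comparing the exact host treats subdomains such as www.somedomain.com as external; whether that is desirable depends on the site.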

How does SelectorGadget work?

- by andrisetiawan

How does selectorgadget.com work? Is there any link or page that explains the algorithm behind SelectorGadget? Thanks

Where can I get a large number of proxy IPs?

- by wefwgeweg

I need a long list of working proxy IPs to get around IP banning. Where can I find one?

Programmatically login to a website and redirect the user to the logged in page?

- by Santhosh

Hi, right now all the employees of my company log in to an external website using the company ID, a username and a password. We are trying to integrate it into an intranet portal that should provide seamless access to this website without requiring the user to enter these credentials. Is there any way of doing this programmatically (.NET C#)? Much like screen scraping, can I simulate the appropriate POST action and then redirect the user to the logged-in page? Any help is appreciated. Thanks.

How can I screen scrape with Perl?

- by Sakthivel

I need to display some values that are stored on a website; for that I need to scrape the website and fetch the content from the table. Any ideas?
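Pulling the cells out of an HTML table usually comes down to walking the `tr`, `td` and `th` tags. The following is a rough sketch in Python (not Perl), using only the standard library; the class and function names are invented for this example, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_table(html):
    """Return the table contents as a list of rows, each a list of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    return parser.rows
```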

Download image file from the HTML page source using python?

- by Mohit Ranka

I am writing a scraper that downloads all the image files from an HTML page and saves them to a specific folder. All the images are part of the HTML page.
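A possible shape for such a scraper, assuming the images are referenced by plain `<img src="...">` tags (CSS backgrounds or lazily loaded images would be missed): collect the `src` values, resolve them against the page URL, then save each one. All names below are illustrative, and `download_images` needs network access.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class ImageCollector(HTMLParser):
    """Gather absolute URLs of every <img src=...> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.urls.append(urljoin(self.base_url, src))


def image_urls(html, base_url):
    collector = ImageCollector(base_url)
    collector.feed(html)
    return collector.urls


def download_images(html, base_url, folder):
    """Save every image referenced by the page into folder (needs network access)."""
    os.makedirs(folder, exist_ok=True)
    for url in image_urls(html, base_url):
        # Fall back to a generic name when the URL path has no file name.
        name = os.path.basename(urlparse(url).path) or "image"
        with urlopen(url) as response, open(os.path.join(folder, name), "wb") as out:
            out.write(response.read())
```

Two images with the same file name would overwrite each other in this sketch; a real scraper would want to make the names unique.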

Setting up a Python screen scraper that could work on Google App Engine

- by cozza

I am looking to set up an automated screen scraper that will run on Google App Engine using Python. I want it to scrape the site and put the specified results into an entity in App Engine. I am looking for some direction on what to use. I have seen BeautifulSoup, but wonder if people could recommend anything else that could run on Google App Engine.

Extract anything that looks like links from large amount of data in python

- by Riz

Hi, I have around 5 GB of HTML data which I want to process to find links to a set of websites and perform some additional filtering. Right now I use a simple regexp for each site and iterate over them, searching for matches. In my case links can be outside of "a" tags and be malformed in many ways (like "\n" in the middle of a link), so I try to grab as many "links" as I can and check them later in other scripts (so no BeautifulSoup\lxml\etc). The problem is that my script is pretty slow, so I am thinking about ways to speed it up. I am writing a set of tests to check different approaches, but hope to get some advice :) Right now I am thinking about getting all links without filtering first (maybe using a C module or standalone app, which doesn't use regexps but a simple search to find the start and end of every link) and then using regexps to match the ones I need.
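One option worth measuring against the per-site loop is a single compiled pattern that alternates over all the domains, so each chunk of text is scanned once rather than once per site. The sketch below is only a starting point under that assumption: the function names are invented, the pattern is deliberately loose, and it will not join links that are broken by a newline.

```python
import re


def build_link_pattern(domains):
    """Compile one regex matching URLs on any of the given domains."""
    alternation = "|".join(re.escape(d) for d in domains)
    return re.compile(
        # scheme, optional subdomains, one of the domains, optional path/query
        r"https?://(?:[\w-]+\.)*(?:" + alternation + r")(?:/[^\s\"'<>]*)?",
        re.IGNORECASE,
    )


def find_links(text, pattern):
    """Return every match of pattern in text, in order of appearance."""
    return [m.group(0) for m in pattern.finditer(text)]
```

The compiled pattern can be applied to a file line by line or in fixed-size chunks so that the 5 GB never has to sit in memory at once.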

UTF-8 conversion doesn't always work

- by Marco Piccinni

I searched other Stack sites before typing here and I didn't find anything similar. I have to scrape different UTF-8 web pages which contain text like "Oggi è una bellissima giornata"; the problem is with the character "è". I extract this text with JTidy and an XPath query expression, and I convert it with byte[] content = filteredEncodedString.getBytes("utf-8"); String result = new String(content,"utf-8"); where filteredEncodedString contains the text "Oggi è una bellissima giornata". This procedure works on most of the web pages analyzed so far, but in some cases it doesn't extract a UTF-8 string. The page encoding is always the same, and the text is similar. Any ideas about the problem? Thanks, Marco
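Encoding a string to UTF-8 bytes and decoding those bytes as UTF-8 again gives back the same string, so the conversion quoted above cannot repair text that was already decoded with the wrong charset earlier in the pipeline. That the failing pages were mis-decoded when they were fetched is an assumption, not something the question confirms. The effect is easy to reproduce in Python:

```python
text = "Oggi è una bellissima giornata"

# Encoding and decoding with the same charset is a no-op round trip.
round_trip = text.encode("utf-8").decode("utf-8")

# Mojibake appears when UTF-8 bytes are decoded with the wrong charset.
garbled = text.encode("utf-8").decode("iso-8859-1")

# If the damage was exactly this mis-decoding, it can be reversed.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
```

If the extracted text looks like `garbled`, the likely fix is to decode the raw page bytes with the charset the server or the page actually declares.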

What must I learn to parse dynamic HTML sites with PHP?

- by butteff

What must I learn to write a PHP website grabber (parser)? It must collect information from other websites, such as weather forecasts, wiki "on this day" entries, some news and other useful and interesting everyday information. What must I read to write an M3U player in PHP? Sorry for my bad English.

Web Scraper via Web Service API?

- by 001

How would I go about doing the following? I want to build a web service for my application to grab a piece of data from an external website that requires the user to log in. The website has no public API, hence the need for the scraper. Is there a library to perform the following functions, or what do I do?

automate filling in a form, auto click
automate the submit button
check which URL the user has landed on, and redirect the user to that URL
grab data from a label

EDIT: What I'm asking is, is there a web service, library, etc. to make it easier to perform screen scraping/automation functions?

How to grab data on website?

- by Doug

So, I often check my accounts for different numbers. For example, I check my affiliate accounts for cash increases. I want to write a script that can log in to all these websites, grab the money value for me and display it all on one page. How can I program this?
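For sites that use an ordinary HTML login form, a script typically needs two things: a session that keeps cookies between requests, and a POST carrying the form fields. The sketch below shows both with Python's standard library. The field names `username` and `password` and the example.com URLs are placeholders; the real names come from each site's login form, and sites that rely on JavaScript or CSRF tokens need more work.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


def make_session():
    """Opener that remembers cookies, so the login persists across requests."""
    return build_opener(HTTPCookieProcessor(CookieJar()))


def login_request(login_url, username, password):
    """Build the POST a login form would send (field names are hypothetical)."""
    form = urlencode({"username": username, "password": password}).encode("ascii")
    return Request(login_url, data=form, method="POST")


# Usage sketch (requires network access and real URLs):
# session = make_session()
# session.open(login_request("https://example.com/login", "me", "secret"))
# html = session.open("https://example.com/account").read().decode("utf-8")
```

The page fetched after logging in can then be parsed for the balance, and the same steps repeated per site.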

Is it possible to use a Python scraper in a website?

- by Tom

I want to scrape a website and use that content in a website of my own. I am just wondering if that can be done with Python 2.7, and if so, how? If not, do I have to use JavaScript to scrape it? And do you know a good place to learn how to do that, or good libraries for it? For those of you wondering, scraping the website is legal, and they allow this to be done. I have searched all over, but apparently nobody tries to implement the scrapers that they write. I can write a web scraper in Python just fine. Say my scraper scrapes a name from a Wikipedia page (John Doe, for example); how can I use that name in my website? Another update: I have found pjscrape and PhantomJS. I have only found one Stack Overflow post and the GitHub examples, which aren't very intuitive. If anybody has any experience or better ways to do it, I would very much appreciate it.

How can I take a screenshot of a website w/ .NET?

- by James Alexander

I'm looking for ideas on how to take screenshots of websites within a .NET application. This application will be a windows service. Thanks!

Getting text that will be displayed to user from HTML

- by gordatron

Bit of a random one: I want to have a play with some NLP stuff, and I would like to get all the text that will be displayed to the user in a browser from HTML. My ideal output would not have any tags in it and would only have full stops (and any other punctuation used) and newline characters, though I can tolerate a fairly reasonable amount of failure in this (random other stuff ending up in the output). If there were a way of inserting a newline or full stop in situations where the content was likely not to continue, that would be an added bonus, e.g. items in a ul or option tag could be separated by full stops (or, to be honest, just ignored). I am working in Java, but would be interested in seeing any code that does this. I can (and will if required) come up with something to do this; I just wondered if there was anything like this out there already, as it would probably be better than what I come up with in an afternoon ;-). An example of the code I might write if I do end up doing this would be to use a SAX parser to find content in p tags, strip it of any span or strong etc. tags, and add a full stop if I hit a div or another p without having had a full stop. Any pointers or suggestions very welcome.
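The approach sketched in this question (keep text, drop tags, break lines at block-level elements) can be tried quickly with an event-based parser. Below is a rough Python illustration rather than Java; the tag sets and names are assumptions chosen for the example, and it makes no attempt to honour CSS such as `display: none`. In Java, an HTML library such as jsoup offers comparable text extraction.

```python
from html.parser import HTMLParser

# Tags after which the visible text is unlikely to continue on the same line.
BLOCK_TAGS = {"p", "div", "br", "li", "ul", "ol", "option", "tr",
              "h1", "h2", "h3", "h4", "h5", "h6"}
# Tags whose contents a browser does not render as page text.
HIDDEN_TAGS = {"script", "style", "head"}


class VisibleText(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []
        self._hidden = 0  # nesting depth inside hidden tags

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self._hidden += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS and self._hidden:
            self._hidden -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._hidden:
            self.parts.append(data)


def visible_text(html):
    """Return the page text with one line per block-level chunk."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace and drop the empty lines left by tags.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

Appending a full stop to lines that lack closing punctuation would be a small extra step on the returned lines.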

How to extract the data from a website using java?

- by giri

Hi, I am familiar with the Java programming language. I would like to extract data from a website and store it in a database running on my machine. Is that possible in Java? If so, which API should I use? For example, there are a number of schools listed on a website. How can I extract that data and store it in my database using Java?
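The storage half of this task is independent of how the page is parsed. As a sketch in Python rather than Java, the snippet below writes already-extracted rows into SQLite; the `schools` table, its columns and the sample rows are all made up for illustration. The same pattern (a parameterised insert inside a transaction) carries over to JDBC.

```python
import sqlite3


def store_schools(conn, schools):
    """Insert (name, city) pairs; the table and its columns are hypothetical."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schools (name TEXT PRIMARY KEY, city TEXT)"
    )
    # INSERT OR REPLACE keeps re-runs of the scraper from duplicating rows.
    conn.executemany(
        "INSERT OR REPLACE INTO schools (name, city) VALUES (?, ?)", schools
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
store_schools(conn, [("Lincoln High", "Springfield"), ("Oak Primary", "Shelbyville")])
store_schools(conn, [("Lincoln High", "Springfield")])  # re-run: no duplicate
count = conn.execute("SELECT COUNT(*) FROM schools").fetchone()[0]
```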

Programmatically grabbing text from a web page that is dynamically generated.

- by bstullkid

There is a website I am trying to pull information from in Perl; however, the section of the page I need is generated using JavaScript, so all you see in the source is <div id="results"></div> I need to somehow pull out the contents of that div and save it to a file using Perl/proxies/whatever, e.g. the information I want to save would be document.getElementById('results').innerHTML; I am not sure if this is possible, or if anyone has any ideas or a way to do this. I was using a lynx source dump for other pages, but since I can't straightforwardly screen scrape this page I came here to ask about it!

Web scraping from a Google Chrome extension

- by limoragni

I've started to develop a Chrome extension to navigate and perform actions on a website. So far the extension is able to receive a couple of parameters, check a set of radio buttons, fill in a few inputs of a form and then submit it. What I want to do now is repeat the process, but I'm stuck when the page is reloaded, and I don't know how to make the script react when the request finishes. The workflow I want to achieve is the following (it is for automatically copying a certain object):

Popup side
Enter the number of the Master object to copy
Enter the base name of the copies (for example Mod, so that I can iterate and add mod1, mod2, modn)
Enter the number of copies

Background side
Select master
Select standard options
Fill in inputs
Submit form
Wait for the page to complete the request and continue to the next copy. (here I need help)

The problem is the repetition; the rest is taken care of. I assume there must be a way of dealing with requests. Any ideas? By the way, I'm doing it all with the extension and tabs methods of Google Chrome plus JavaScript and jQuery.

Extract Data from a Website using PHP

- by 01jayss

I am trying to get PHP to extract the TOKEN (the uppercase one), USERID (uppercase), and the USER NAME (uppercase) from a web page with the following text. {"rsp":{"stat":"ok","auth":{"token":"**TOKEN**","perms":"read","user":{"id":"**USERID**","username":"**USER NAME**","fullname":"**NAME OF USER**"}}}} (This is from the RTM api, getting the authentication token of the user). How would I go about doing this? Thanks!
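The text shown above is JSON, so decoding it and indexing into the nested objects is usually simpler and safer than string matching. A short sketch in Python using the exact sample from the question; in PHP the built-in `json_decode` does the equivalent decoding.

```python
import json

# The response body quoted in the question, placeholders included.
response_text = (
    '{"rsp":{"stat":"ok","auth":{"token":"**TOKEN**","perms":"read",'
    '"user":{"id":"**USERID**","username":"**USER NAME**",'
    '"fullname":"**NAME OF USER**"}}}}'
)

auth = json.loads(response_text)["rsp"]["auth"]
token = auth["token"]
user_id = auth["user"]["id"]
username = auth["user"]["username"]
```

Checking that `rsp.stat` equals "ok" before reading `auth` would guard against error responses.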

How do I print the Images?

- by user1477539

I want to print the images of the 30 NBA teams drafting in the first round. However, when I tell it to print, it prints out the link instead of the image. How do I get it to print out the image instead of giving me the image link? Here's my code:

    import urllib2
    from BeautifulSoup import BeautifulSoup
    # or if you're using BeautifulSoup4:
    # from bs4 import BeautifulSoup

    soup = BeautifulSoup(urllib2.urlopen('http://www.cbssports.com/nba/draft/mock-draft').read())
    rows = soup.findAll("table", attrs = {'class': 'data borderTop'})[0].tbody.findAll("tr")[2:]
    for row in rows:
        fields = row.findAll("td")
        if len(fields) >= 3:
            anchor = row.findAll("td")[1].find("a")
            if anchor:
                print anchor

Selenium: get text from id

- by user3766148

On the following URL - http://www.filestube.to/26frq-Buffalo-Clover-Test-Your-Love-2014-9Jai9TJFukAS9fq9sWngAD.html I am trying to copy the direct link (Direct links: turbobit.net/9mrb0eu9eksx/26frq.Buffalo.Clover..Test.Your.Love.2014.rar.html) via a CSS path or XPath, but I am unable to retrieve the information and store it in a variable. Firebug gives me html body div.cnt div.rH.no-js.fd div.rl div.fgBx pre span#copy_paste_links but when I apply css=html.body.div.cnt.div.rH.no-js.fd.div.rl.div.fgBx.pre.span#copy_paste_links/text() to the target, I get an error: not found http://i.imgur.com/KdBmDHE.png
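Two things in the attempted locator stand out. In CSS a dot introduces a class name, so joining the element names with dots turns `body`, `div` and `pre` into classes, and `/text()` is XPath syntax rather than CSS. Because the target element has an id, it can usually be addressed directly, for example `css=#copy_paste_links` or the XPath `//span[@id='copy_paste_links']`. The sketch below shows the id-based XPath idea with Python's standard library on a simplified, made-up stand-in for the page; it is not Selenium code and the link text is a placeholder.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the page structure described above.
page = """
<html><body><div class="cnt"><div class="rl"><div class="fgBx">
<pre><span id="copy_paste_links">turbobit.net/example.rar.html</span></pre>
</div></div></div></body></html>
"""

root = ET.fromstring(page.strip())
# An id is unique in a valid page, so it can be targeted directly
# instead of spelling out every ancestor.
span = root.find(".//span[@id='copy_paste_links']")
link_text = span.text
```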

What is the best method or tool to scrape websites?

- by user63898

Hello all, I need to scrape websites (with approval). Before I start to write my own, what is the best tool or way to scrape websites that is both fast (multithreaded) and easy to learn?
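As one illustration of the "fast (multithreaded)" part, downloads are I/O-bound and parallelise well with a thread pool. The sketch below uses Python's standard library; the function names and the default of 8 workers are arbitrary choices for the example, and the `fetcher` parameter exists so the downloader can be swapped out or tested without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one page and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetcher=fetch, workers=8):
    """Fetch many URLs concurrently; returns {url: body or exception}."""
    def safe(url):
        try:
            return fetcher(url)
        except Exception as exc:  # keep one bad URL from aborting the batch
            return exc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as urls.
        return dict(zip(urls, pool.map(safe, urls)))
```

Even with approval, keeping the worker count modest and adding delays per host is a sensible courtesy to the site being scraped.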

