Search Results

Search found 346 results on 14 pages for 'scraping'.

Page 8 of 14

  • Extract Data from a Website using PHP

    - by 01jayss
    I am trying to get PHP to extract the TOKEN (the uppercase one), USERID (uppercase), and USER NAME (uppercase) from a web page with the following text:

        {"rsp":{"stat":"ok","auth":{"token":"**TOKEN**","perms":"read","user":{"id":"**USERID**","username":"**USER NAME**","fullname":"**NAME OF USER**"}}}}

    (This is from the RTM API, while getting the user's authentication token.) How would I go about doing this? Thanks!
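
    That response blob is JSON, so a JSON parser beats string chopping; in PHP, json_decode() handles it directly. The same idea in a minimal Python sketch, using the placeholder values from the question:

        import json

        # The response from the question, with its placeholder values.
        raw = '{"rsp":{"stat":"ok","auth":{"token":"TOKEN","perms":"read","user":{"id":"USERID","username":"USER NAME","fullname":"NAME OF USER"}}}}'
        auth = json.loads(raw)['rsp']['auth']
        print(auth['token'], auth['user']['id'], auth['user']['username'])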


  • Programmatic Reaction to Receiving New Email

    - by vicatcu
    I'm interested in automating some reactive work I do when receiving certain emails in one of my email accounts. What I would like to have happen is:

    1. On receipt of new email in the account,
    2. if the new email passes the "Need to React" criteria (based on body content and subject line), then:
       3a. scrape some content out of the email body and subject lines,
       3b. populate a template form (e.g. an Excel spreadsheet) with the scraped data,
       3c. print the populated form and save it in some folder (e.g. as a PDF).

    What's the best (defined as easiest for me to implement myself) approach / combination of technologies for achieving this automation?
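
    One low-dependency route is a scheduled Python script: poll the mailbox over IMAP, test each unseen message against the criteria, and hand matches to a spreadsheet library such as openpyxl. A rough sketch - the host, credentials, and criterion below are placeholders:

        import imaplib, email

        M = imaplib.IMAP4_SSL('imap.example.com')      # placeholder host
        M.login('user@example.com', 'password')        # placeholder credentials
        M.select('INBOX')

        typ, data = M.search(None, 'UNSEEN')           # step 1: new mail only
        for num in data[0].split():
            typ, msg_data = M.fetch(num, '(RFC822)')
            msg = email.message_from_bytes(msg_data[0][1])
            if 'Need to React' in msg.get('Subject', ''):    # step 2: placeholder criteria
                body = msg.get_payload(decode=True)          # step 3a (single-part messages)
                # steps 3b/3c: fill an openpyxl template with the scraped
                # fields, then save it to a folder / export it as a PDF
        M.logout()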


  • selenium, get text from id

    - by user3766148
    On the following URL - http://www.filestube.to/26frq-Buffalo-Clover-Test-Your-Love-2014-9Jai9TJFukAS9fq9sWngAD.html - I am trying to copy the "Direct links" text:

        turbobit.net/9mrb0eu9eksx/26frq.Buffalo.Clover..Test.Your.Love.2014.rar.html

    via a CSS path or XPath, but I am unable to retrieve the information and store it in a variable. Firebug gives me

        html body div.cnt div.rH.no-js.fd div.rl div.fgBx pre span#copy_paste_links

    but when I apply css=html.body.div.cnt.div.rH.no-js.fd.div.rl.div.fgBx.pre.span#copy_paste_links/text() as the target, I get an "element not found" error: http://i.imgur.com/KdBmDHE.png
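
    Two problems with that locator: the dotted string treats tag names as class names, and /text() is XPath syntax that cannot be appended to a CSS locator. Since the span has an id, the locator can skip the whole ancestor chain - css=#copy_paste_links or xpath=//span[@id='copy_paste_links']. A minimal WebDriver sketch in Python (needs geckodriver; assumes the page still serves that element):

        from selenium import webdriver
        from selenium.webdriver.common.by import By

        driver = webdriver.Firefox()
        driver.get('http://www.filestube.to/26frq-Buffalo-Clover-Test-Your-Love-2014-9Jai9TJFukAS9fq9sWngAD.html')
        links = driver.find_element(By.ID, 'copy_paste_links').text   # the span's text
        print(links)
        driver.quit()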


  • Asp.Net Scraping Grid Pages

    - by SH
    I need code in C#. I am trying to post to a search.aspx page which contains an ASP.NET grid. When the grid is rendered it shows the first page of results, with links to the other pages in the grid header. I scrape the first page, and now I want to move on to the next one. All this is being done with the following code:

        HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create("http://pubrec3.hillsclerk.com/oncore/search.aspx?" + param);
        myRequest.Method = "GET";
        myRequest.KeepAlive = false;
        HttpWebResponse webresponse;
        try
        {
            webresponse = (HttpWebResponse)myRequest.GetResponse();
            Encoding enc = System.Text.Encoding.GetEncoding(1252);
            StreamReader loResponseStream = new StreamReader(webresponse.GetResponseStream(), enc);
            string r = loResponseStream.ReadToEnd();
            loResponseStream.Close();
            webresponse.Close();
            //if (GetRecordCount(r))
            ExtractResultTable(bd, ed, r);
        }
        catch (Exception ex) { }

    The above code grabs the first page; now I have to move on to the next page. This link produces a grid with 3 pages:

        http://pubrec3.hillsclerk.com/oncore/search.aspx?bd=01/01/2008&ed=12/31/2008&bt=O&lb=1000000&ub=1000000000&d=5/6/2010&pt=-1&dt=D,%20MTG&st=consideration

    Using the above code I need to load the 2nd page with the same search criteria, but with the records found on the 2nd page, and so on. I know there is a trick to navigate through grid pages: you can pass __EVENTTARGET and __EVENTARGUMENT in the query string. I tried it, but it does not work on this website. I am desperately searching for a way to cope with this site; I really want this done. I'm not asking for code so much as for a way to navigate through the grid using the query string, if that can work at all; otherwise, please be specific about the problem.
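
    A likely reason the query-string trick fails: GridView paging is a form postback, so __EVENTTARGET and __EVENTARGUMENT are generally only honored in a POST body, accompanied by the __VIEWSTATE (and usually __EVENTVALIDATION) hidden fields from the page you just received. A sketch of that round trip in Python - the same sequence applies with HttpWebRequest in C# - where the control name and page argument are assumptions to verify against the real form:

        import re, urllib.request, urllib.parse

        url = ('http://pubrec3.hillsclerk.com/oncore/search.aspx?bd=01/01/2008'
               '&ed=12/31/2008&bt=O&lb=1000000&ub=1000000000&d=5/6/2010'
               '&pt=-1&dt=D,%20MTG&st=consideration')
        page1 = urllib.request.urlopen(url).read().decode('cp1252', 'replace')

        def hidden(name, html):
            # pull a hidden input's value out of the previous response
            m = re.search(r'id="%s" value="([^"]*)"' % name, html)
            return m.group(1) if m else ''

        form = urllib.parse.urlencode({
            '__EVENTTARGET': 'SearchResults',  # assumption: the grid's ID, from view-source
            '__EVENTARGUMENT': 'Page$2',       # assumption: standard GridView paging argument
            '__VIEWSTATE': hidden('__VIEWSTATE', page1),
            '__EVENTVALIDATION': hidden('__EVENTVALIDATION', page1),
        }).encode()
        page2 = urllib.request.urlopen(url, data=form).read()   # POST back for page 2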


  • How do I print the Images?

    - by user1477539
    I want to print the images of the 30 NBA teams drafting in the first round. However, when I tell it to print, it prints out the link instead of the image. How do I get it to give me the image itself instead of the image link? Here's my code:

        import urllib2
        from BeautifulSoup import BeautifulSoup
        # or, if you're using BeautifulSoup4:
        # from bs4 import BeautifulSoup

        soup = BeautifulSoup(urllib2.urlopen('http://www.cbssports.com/nba/draft/mock-draft').read())
        rows = soup.findAll("table", attrs={'class': 'data borderTop'})[0].tbody.findAll("tr")[2:]
        for row in rows:
            fields = row.findAll("td")
            if len(fields) >= 3:
                anchor = row.findAll("td")[1].find("a")
                if anchor:
                    print anchor
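
    A console print can only ever show markup; to get the pictures themselves you have to find the img tags, read their src attributes, and download each file. A sketch in Python 3 / bs4 - the selector, and whether the team logos actually sit inside those cells, are assumptions about the page:

        import urllib.request
        from bs4 import BeautifulSoup

        html = urllib.request.urlopen('http://www.cbssports.com/nba/draft/mock-draft').read()
        soup = BeautifulSoup(html, 'html.parser')
        for img in soup.select('table.data.borderTop img'):    # assumed location of the logos
            src = img.get('src')
            if src and src.startswith('http'):
                filename = src.rsplit('/', 1)[-1]
                urllib.request.urlretrieve(src, filename)      # save the logo locally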


  • change label text from a VB6 binary (not source code)

    - by Jun
    Hi, we have a VB6 binary executable that comes with no source code, and we need to change the label text in that VB6 application from "AAA" to "BBB". Is there any way or tool that can do that? The closest tool I have found so far is Microsoft UI Spy: it can read all the other elements, but not the label. I am hoping there is a tool that can change the resource in the .exe so that the label "AAA" will read "BBB". Or is it possible to write a wrapper application that launches the .exe, examines the application screen for "AAA", and changes it to "BBB"? Thank you for your help!
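
    One caveat before reaching for a wrapper: VB6 Label controls are typically windowless (drawn directly on the form), which would explain why UI Spy cannot see the text - and it also means the Win32 approach below only works if the string actually lives in a real child window (a caption, text box, etc.). A hedged sketch of the wrapper idea in Python with ctypes; the window title is a placeholder:

        import ctypes
        from ctypes import wintypes

        user32 = ctypes.windll.user32

        def replace_text(hwnd_top, old, new):
            # visit every child window; rewrite any whose text matches exactly
            @ctypes.WINFUNCTYPE(wintypes.BOOL, wintypes.HWND, wintypes.LPARAM)
            def enum_proc(hwnd, lparam):
                buf = ctypes.create_unicode_buffer(256)
                user32.GetWindowTextW(hwnd, buf, 256)
                if buf.value == old:
                    user32.SetWindowTextW(hwnd, new)
                return True
            user32.EnumChildWindows(hwnd_top, enum_proc, 0)

        hwnd = user32.FindWindowW(None, 'Target App Title')   # placeholder window title
        if hwnd:
            replace_text(hwnd, 'AAA', 'BBB')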


  • Web scraping from a Google Chrome extension

    - by limoragni
    I've started to develop a Chrome extension to navigate and perform actions on a website. So far the extension is able to receive a couple of parameters, check a set of radio buttons, fill in a few inputs of a form, and then submit it. What I want to do now is repeat the process, but I'm stuck when the page is reloaded: I don't know how to make the script react when the request finishes. The workflow I want to achieve is the following (it is for automatically copying a certain object):

    Popup side:
    - Enter the number of the Master object to copy
    - Enter the base name of the copies (for example Mod, so I can iterate and add mod1, mod2, ... modn)
    - Enter the number of copies

    Background side:
    - Select master
    - Select standard options
    - Fill in inputs
    - Submit form
    - Wait for the page to complete the request, then continue to the next copy (here I need help)

    The problem is the repetition; the rest is taken care of. I assume there must be a way of dealing with requests. Any ideas? By the way, I'm doing it all with the extension and tabs methods of Google Chrome, plus JavaScript and jQuery.


  • How to script a URL screenshot without X?

    - by cwopen
    I'd like to automate a 'screenshot' of arbitrary URLs using a Linux build that doesn't have X installed. There appear to be some (costly) web services that do this, but I specifically want something I can run locally. I tried ImageMagick without much success, though Mozilla used to have a command-line option to do it?
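
    ImageMagick's import captures from a running X display, which is why it goes nowhere without X. Two X-free routes: run a browser under a virtual framebuffer (xvfb-run), or use a browser that renders headlessly on its own. A sketch of the latter with Python and headless Firefox (requires geckodriver; any headless-capable browser works similarly):

        from selenium import webdriver
        from selenium.webdriver.firefox.options import Options

        opts = Options()
        opts.add_argument('-headless')            # render without any X server
        driver = webdriver.Firefox(options=opts)
        driver.get('http://example.com')          # placeholder URL
        driver.save_screenshot('shot.png')
        driver.quit()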


  • Parse live scores from a web site

    - by Venno
    Hi all, I was thinking of parsing live scores from a web site via PHP and then using them in an application I am planning to implement. So my question is: is it legal to do that - to parse info from a web site and use it? What if I quote the source of the info?


  • How to know if the website being scraped has changed?

    - by Lost_in_code
    I'm using PHP to scrape a website and collect some data. It's all done without using regex; I'm using PHP's explode() function to find particular HTML tags instead. It is possible that if the structure of the website changes (CSS, HTML), the scraper will collect wrong data. So the question is: how do I know if the HTML structure has changed? How can I identify this before storing any data in my database, so that wrong data never gets stored?
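
    One common heuristic is to fingerprint the markup you depend on - for example, hash the sequence of tag names - and refuse to store anything when the fingerprint drifts from the last known-good value. A sketch in Python (the URL is a placeholder; the same idea ports directly to PHP with preg_match_all() and md5()):

        import hashlib, re, urllib.request

        def structure_fingerprint(html):
            # hash only the tag names, so text updates don't raise false alarms
            tags = re.findall(r'<\s*([a-zA-Z][a-zA-Z0-9]*)', html)
            return hashlib.sha256(' '.join(tags).encode()).hexdigest()

        KNOWN_GOOD = 'paste-the-fingerprint-from-the-last-good-run-here'

        html = urllib.request.urlopen('http://example.com/page').read().decode('utf-8', 'replace')
        if structure_fingerprint(html) != KNOWN_GOOD:
            raise RuntimeError('page structure changed; skipping DB insert')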


  • Extract Address Information from a Web Page

    - by Brian Boatright
    I need to take a web page and extract the address information from the page; some pages are easier than others. I'm looking for a Firefox plugin, Windows app, or VB.NET code that will help me get this done. Ideally I would like to have a page in our admin (ASP.NET/VB.NET) where you enter a URL, it scrapes the page, and it returns a DataSet that I can put in a grid.
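
    Free-form addresses are hard to parse in general, but a workable first pass is to strip the markup and scan the text for a city/state/ZIP-shaped pattern. A sketch of that heuristic in Python (US-format assumptions throughout; the URL is a placeholder), which maps straightforwardly onto Regex.Matches in VB.NET:

        import re, urllib.request

        # "City, ST 12345" or bare "ST 12345-6789" -- a deliberately loose US pattern
        ADDR = re.compile(r'([A-Z][A-Za-z .]+,\s*)?[A-Z]{2}\s+\d{5}(?:-\d{4})?\b')

        html = urllib.request.urlopen('http://example.com/contact').read().decode('utf-8', 'replace')
        text = re.sub(r'<[^>]+>', ' ', html)       # crude tag strip
        for m in ADDR.finditer(text):
            print(m.group(0))                      # candidate address fragment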


  • Python lxml - returns null list

    - by Chris Finlayson
    I cannot figure out what is wrong with the XPath when trying to extract a value from a webpage table. The method seems correct, as I can extract the page title and other attributes, but I cannot extract the third value - it always returns an empty list?

        from lxml import html
        import requests

        test_url = 'SC312226'
        page = ('https://www.opencompany.co.uk/company/' + test_url)
        print 'Now searching URL: ' + page
        data = requests.get(page)
        tree = html.fromstring(data.text)
        print tree.xpath('//title/text()')  # Get page title
        print tree.xpath('//a/@href')       # Get href attribute of all links
        print tree.xpath('//*[@id="financial"]/table/tbody/tr/td[1]/table/tbody/tr[2]/td[1]/div[2]/text()')

    Unless I'm missing something, it would appear the XPath is correct (Chrome screenshot). I checked the Chrome console and it appears OK, so I'm at a loss:

        $x('//*[@id="financial"]/table/tbody/tr/td[1]/table/tbody/tr[2]/td[1]/div[2]/text()')
        [ "£432,272" ]
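
    The usual culprit with DevTools-copied paths is tbody: browsers insert tbody elements into the DOM even when the served HTML has none, so an XPath that matches in the Chrome console can match nothing in lxml's tree, which parses the raw HTML. A first fix to try in the script above is a tbody-free query:

        # same query, tolerant of the browser-inserted tbody elements
        print tree.xpath('//*[@id="financial"]//table//tr[2]/td[1]/div[2]/text()')

    If that still comes back empty, the figures are probably filled in by JavaScript after the page loads, in which case no static XPath will see them and a rendering client (e.g. Selenium) is needed instead.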


  • How to extract images from flash viewers?

    - by RC
    This deals with the (diverse) Flash viewers that let you zoom in on images on websites. I'm trying to extract the large, zoomed-in image rendered by the viewer. In many cases the images seem to be fetched dynamically by the viewer, or are created only for the part of the image you are zooming in on at that point. Ideally the approach here would be a programmatic one that could be called on an identified Flash element. I expect there is nothing universal, but I am interested in the top few approaches that will cover most cases.
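
    Since the viewer itself is a black box, the practical route is to watch its network traffic (browser dev tools or a proxy) to learn the tile URL pattern, then fetch the tiles at the deepest zoom level and stitch them back together. A sketch with Python and Pillow; the tile size, grid dimensions, and URL template are all hypothetical and must be read off the captured requests:

        import io, urllib.request
        from PIL import Image

        TILE, COLS, ROWS = 256, 8, 6                            # from the captured traffic
        URL = 'http://example.com/tiles/z5-{col}-{row}.jpg'     # hypothetical pattern

        canvas = Image.new('RGB', (COLS * TILE, ROWS * TILE))
        for row in range(ROWS):
            for col in range(COLS):
                data = urllib.request.urlopen(URL.format(col=col, row=row)).read()
                canvas.paste(Image.open(io.BytesIO(data)), (col * TILE, row * TILE))
        canvas.save('stitched.jpg')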


  • Possible to use Javascript to access the client side's network (knowingly)

    - by Earlz
    I recently found an exploit in my router that basically gives me root access. The catch? There is a randomly generated nonce hidden form value that must be sent in for it to work, which makes it difficult to do "easily". So basically I want to do something like this in JavaScript:

    - GET http://192.168.1.254/blah
    - use a regex or similar to extract the nonce value
    - put the nonce value into a hidden field in the current page
    - submit the form by POST to http://192.168.1.254/blah, complete with the nonce value and the other form values I want to send in.

    Is this at all possible using only HTML and JavaScript? I'm open to things like "must save the HTML file locally and then open it", which I'm thinking is one way around the cross-domain policy. But anyway, is this at all possible? I'm hoping for this to run in at least Firefox and Chrome. The audience for this is those with some technical know-how.
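
    The in-browser stumbling block is the same-origin policy: a page that didn't come from 192.168.1.254 can't read the GET response to harvest the nonce (a locally saved file: page included). The fetch-extract-repost round trip itself is easy outside the browser; a sketch in Python, where the field names are guesses standing in for the router's real ones:

        import re, urllib.request, urllib.parse

        base = 'http://192.168.1.254/blah'
        page = urllib.request.urlopen(base).read().decode('utf-8', 'replace')

        m = re.search(r'name="nonce"\s+value="([^"]+)"', page)   # field name is a guess
        assert m, 'nonce field not found -- check the form markup'
        form = urllib.parse.urlencode({
            'nonce': m.group(1),          # echo the harvested nonce back
            'cmd': 'enable_root',         # hypothetical form value
        }).encode()
        print(urllib.request.urlopen(base, data=form).read())    # POST the form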


  • How to "scan" a website (or page) for info, and bring it into my program?

    - by James
    Well, I'm pretty much trying to figure out how to pull information from a webpage and bring it into my program (in Java). For example, if I know the exact page I want info from - for the sake of simplicity, a Best Buy item page - how would I get the appropriate info I need off of that page, like the title, price, and description? What would this process even be called? I have no idea where to even begin researching this.
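
    The process is called web scraping (also HTML parsing or screen scraping - those terms should unlock the search results). In Java the usual tool is the jsoup library, which fetches a page and lets you pick elements with CSS selectors. The shape of the job, sketched in Python with BeautifulSoup; the URL and selectors are hypothetical, since the right ones come from inspecting the actual page:

        import urllib.request
        from bs4 import BeautifulSoup

        html = urllib.request.urlopen('http://www.bestbuy.com/some-item').read()  # hypothetical page
        soup = BeautifulSoup(html, 'html.parser')
        title = soup.select_one('h1')              # product title
        price = soup.select_one('.item-price')     # hypothetical selector
        print(title.get_text(strip=True))
        if price:
            print(price.get_text(strip=True))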


  • Is this Anti-Scraping technique viable with Crawl-Delay?

    - by skibulk
    I want to prevent web scrapers from abusing the 1,000,000 pages on my website. I'd like to do this by returning a "503 Service Unavailable" error code to users that access an abnormal number of pages per minute. I don't want search engine spiders to ever receive the error. My inclination is to set a robots.txt crawl-delay which will ensure spiders access a number of pages per minute under my 503 threshold. Is this an appropriate solution? Do all major search engines support the directive? Could it negatively affect SEO? Are there any other solutions or recommendations?
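
    For scale: a crawl-delay of 10 seconds caps a compliant spider at 6 pages per minute, comfortably below most sensible 503 thresholds. A minimal robots.txt:

        User-agent: *
        Crawl-delay: 10

    Support is not universal, though: Bing and Yahoo honor Crawl-delay, but Googlebot ignores it (Google's crawl rate is set in Webmaster Tools instead), so the 503 logic still needs to exempt, or generously rate-limit, verified search-engine user agents.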


  • HELP!!! Upgrading to Windows 2008 R2 server has caused a major issue with screen scraping a remote server

    - by bobsov534
    I have three servers: one is Windows 2003 (A), another is Windows 2008 (B), and the third is also Windows 2008 (C). All of them are web servers. A and B contain classic ASP pages and are 32-bit servers; C contains ASP.NET pages and is a 64-bit server. The ASP pages on A and B use screen scraping to render the ASP.NET pages from C. When A runs, the ASP.NET page renders fine: there are no broken images or file-not-found errors. When B runs, the images appear broken, because they are looked for on server B instead of server C. I believe this issue is caused by IIS 7 or 7.5, since IIS 6 has no problem scraping the remote server's pages. Can you please help me with a solution to this problem? This is urgent, since the upgrade to Windows Server 2008 R2 is now a major show-stopper for us. Thanks in advance.
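
    Whatever the IIS difference turns out to be, the symptom is classic for proxied HTML: relative image paths in the scraped markup resolve against the server that delivered the page (B) rather than the one that produced it (C). One defensive fix is to absolutize those URLs before returning the scraped page. A sketch in Python - the host name is a placeholder, and the same regex approach works in classic ASP/VBScript or .NET:

        import re
        from urllib.parse import urljoin

        def absolutize(html, base):
            # rewrite src/href values that aren't already absolute URLs
            return re.sub(
                r'(src|href)="(?!https?://)([^"]*)"',
                lambda m: '%s="%s"' % (m.group(1), urljoin(base, m.group(2))),
                html)

        scraped = '<img src="img/logo.png"> <a href="page.aspx">x</a>'   # sample input
        print(absolutize(scraped, 'http://server-c.example.com/app/'))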


  • Web scraping advice/help with Java for an Android app

    - by Capsud
    Hey there, I've heard about web scraping software that can take data from a webpage. I'm building an Android app and I want to take information from the site www.menupages.ie. All I need is the names of the restaurants, and typing them in myself would be very tedious. Can someone tell me how I'd go about doing this in Eclipse - what methods I need, etc.? I don't know anything about it. Thanks a lot.
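
    On Android, the jsoup library is the usual way to do this from Java: fetch the listing pages and select the elements that hold the restaurant names. The shape of the job, sketched in Python; the listing URL, pagination scheme, and selector are guesses - inspect menupages.ie to find the real ones:

        import urllib.request
        from bs4 import BeautifulSoup

        names = []
        for page in range(1, 4):                                   # guessed pagination scheme
            url = 'http://www.menupages.ie/restaurants?page=%d' % page
            soup = BeautifulSoup(urllib.request.urlopen(url).read(), 'html.parser')
            names += [a.get_text(strip=True)
                      for a in soup.select('a.restaurant-name')]   # guessed selector
        print(names)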


  • Anyone have a good solution for scraping the HTML source of a page with content (in this case, HTML tables) generated with Javascript?

    - by phpwns
    Anyone have a good solution for scraping the HTML source of a page with content (in this case, HTML tables) generated with Javascript? An embarrassingly simple, though workable, solution using Crowbar:

        <?php
        function get_html($url) // $url must be urlencode(d)
        {
            $context = stream_context_create(array(
                'http' => array('timeout' => 120) // HTTP timeout in seconds
            ));
            // substr removes the Crowbar web service's own HTML, returning only the $url HTML
            $html = substr(file_get_contents('http://127.0.0.1:10000/?url=' . $url
                . '&delay=3000&view=browser', 0, $context), 730, -32);
            return $html;
        }
        ?>

    The advantage of using Crowbar is that the tables are actually rendered (and therefore accessible), thanks to its headless Mozilla-based browser. The problem, of course, is depending on an external web service, especially given that SIMILE seems to undergo regular server maintenance. :( A pure PHP solution would be nice, but any functional (and reliable) alternatives would be great.
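
    A pure PHP solution is unlikely, because PHP has no JavaScript engine: something has to execute the scripts before the tables exist. The self-hosted equivalent of Crowbar is to drive a headless browser locally and read the DOM after the scripts run; a sketch in Python with Selenium (any headless-capable browser plus its driver works, and the URL is a placeholder):

        from selenium import webdriver
        from selenium.webdriver.firefox.options import Options

        opts = Options()
        opts.add_argument('-headless')
        driver = webdriver.Firefox(options=opts)
        driver.get('http://example.com/js-tables')   # placeholder URL
        html = driver.page_source                    # the DOM after JavaScript has run
        driver.quit()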


  • Use the Django ORM in a standalone script (again)

    - by Rishabh Manocha
    I'm trying to use the Django ORM in some standalone screen scraping scripts. I know this question has been asked before, but I'm unable to figure out a good solution for my particular problem. I have a Django project with defined models. What I would like to do is use these models and the ORM in my scraping script. My directory structure is something like this:

        project/
            scrape/              # scraping scripts
                ...
                test.py
            web/
                django_project/
                    settings.py
                    ...          # Django files

    I tried doing the following in project/scrape/test.py:

        print os.path.join(os.path.abspath('..'), 'web', 'django_project')
        sys.path.append(os.path.join(os.path.abspath('..'), 'web', 'django_project'))
        print sys.path
        print "-------"
        os.environ['DJANGO_SETTINGS_MODULE'] = 'django_project.settings'
        #print os.environ
        from django_project.myapp.models import MyModel
        print MyModel.objects.count()

    However, I get an ImportError when I try to run test.py:

        Traceback (most recent call last):
          File "test.py", line 12, in <module>
            from django_project.myapp.models import MyModel
        ImportError: No module named django_project.myapp.models

    One solution I found around this problem is to create a symbolic link to ../web/govcheck in the scrape folder:

        :scrape rmanocha$ ln -s ../web/govcheck ./govcheck

    With this, I can then run test.py just fine. However, this seems like a hack and, more importantly, is not very portable (I will have to create this symbolic link everywhere I run this code). So I was wondering if anyone has any better solutions for my problem?
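
    The ImportError follows from what gets appended: with .../web/django_project on sys.path, Python can import myapp directly, but the name django_project itself is only importable from the directory above it. A hedged fix (assuming django_project is a package, i.e. has an __init__.py), keeping the script's Python 2 style:

        import os, sys

        # append the PARENT of the django_project package, not the package itself
        sys.path.append(os.path.join(os.path.abspath('..'), 'web'))
        os.environ['DJANGO_SETTINGS_MODULE'] = 'django_project.settings'

        from django_project.myapp.models import MyModel
        print MyModel.objects.count()

    This also removes the need for the symlink, since nothing has to sit next to the script.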

