Scraping HTML WITHOUT unique identifiers using Python

Posted by Nicholas Law on Stack Overflow
Published on 2013-10-22T21:43:36Z Indexed on 2013/10/22 21:53 UTC

I would like to write a Python script that scrapes thousands of pages like this one and this one, gathers all the data, and inserts it into a MySQL database. The script will run on a weekly or bi-weekly basis to update the database with any new information added to each individual page.
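To make the weekly update step concrete, here is a minimal sketch of the behaviour I have in mind: re-scraping a page should overwrite its existing row rather than add a duplicate. It uses the standard-library sqlite3 module purely as a stand-in for MySQL so that it runs without a server. The table name `pages`, its columns, and the example URLs are all hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the (hypothetical) pages table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " url TEXT PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " scraped_at TEXT NOT NULL)"
    )
    return conn


def save_page(conn, url, title, scraped_at):
    """Insert a scraped page, replacing any earlier row with the same URL."""
    # INSERT OR REPLACE overwrites the row whose primary key (url) matches,
    # so running the scraper again updates pages instead of duplicating them.
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, title, scraped_at) VALUES (?, ?, ?)",
        (url, title, scraped_at),
    )
    conn.commit()


conn = open_store()
save_page(conn, "http://example.com/a", "Old title", "2013-10-15")  # first run
save_page(conn, "http://example.com/a", "New title", "2013-10-22")  # later run
save_page(conn, "http://example.com/b", "Other page", "2013-10-22")

rows = conn.execute(
    "SELECT url, title, scraped_at FROM pages ORDER BY url"
).fetchall()
print(rows)
```

In MySQL the comparable statements would be `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE`, issued through whichever MySQL driver is in use.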

Ideally, I would like a scraper that is easy to work with for table-structured data, but that also handles data without unique identifiers (i.e. id and class attributes).
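As an illustration of what I mean by table data without identifiers, the sketch below picks out cells purely by their position in the document (first table, header row, then data rows), never by id or class. It uses only the standard-library html.parser module rather than any of the libraries I am asking about. The sample markup, the `TableExtractor` class, and the helper names are made up for the example, and it assumes non-nested tables with explicit closing tags.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every table cell, grouped by row and by table."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>; each entry is a list of rows
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored on purpose: structure alone drives the parse.
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html):
    """Return every table in the HTML as a list of rows of cell text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


def rows_as_dicts(table):
    """Treat the first row as the header and map each later row onto it."""
    header, *body = table
    return [dict(zip(header, row)) for row in body]


# Made-up sample page: no id or class attributes anywhere.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget</td><td>4.50</td></tr>
  <tr><td>Gadget</td><td>12.00</td></tr>
</table>
"""

tables = extract_tables(SAMPLE_HTML)
records = rows_as_dicts(tables[0])
print(records)
```

Selecting by position like this breaks if the page layout changes, which is part of why I am asking which library copes best with such pages.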

Which scraping library should I use: BeautifulSoup, Scrapy, or Mechanize?

Are there any particular tutorials or books I should look at to achieve this?

In the long run, I will be implementing a mobile app that works with all this data by querying the database.

© Stack Overflow or respective owner
