How to know if the website being scraped has changed?
Posted by Lost_in_code on Stack Overflow
Published on 2010-03-27T17:52:13Z
I'm using PHP to scrape a website and collect some data. I'm not using regular expressions; instead, I use PHP's explode() function to locate particular HTML tags.
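To illustrate the explode()-based approach, here is a minimal sketch of a helper that returns the text between two literal markers. The helper name extractBetween() and the sample markup are hypothetical, since the question does not show the real markers. The sketch also shows why a layout change is risky: the markers are exact strings, so a renamed class makes the lookup fail, and the helper reports that as null instead of silently returning the wrong text.

```php
<?php
// Hypothetical helper: return the trimmed text between $startMarker and
// $endMarker, or null if either marker is missing from $html.
function extractBetween(string $html, string $startMarker, string $endMarker): ?string
{
    // With a limit of 2, explode() returns one element if the marker is absent,
    // or two elements (before / after the first match) if it is present.
    $afterStart = explode($startMarker, $html, 2);
    if (count($afterStart) < 2) {
        return null; // start marker not found: the markup may have changed
    }
    // Split the remainder on the end marker; element 0 is the wanted content.
    $parts = explode($endMarker, $afterStart[1], 2);
    if (count($parts) < 2) {
        return null; // end marker not found
    }
    return trim($parts[0]);
}

$html = '<div><span class="price">19.99</span></div>';
var_dump(extractBetween($html, '<span class="price">', '</span>'));
// string(5) "19.99"

// If the site renames the class, the start marker no longer matches:
$changed = '<div><span class="cost">19.99</span></div>';
var_dump(extractBetween($changed, '<span class="price">', '</span>'));
// NULL
```

Returning null (rather than the unsplit string that explode() hands back when the separator is absent) gives the caller a cheap, explicit signal that a marker was not found.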
If the website's structure (its HTML or CSS) changes, the scraper may collect the wrong data. So the question is: how can I tell that the HTML structure has changed, and how can I detect this before storing anything in my database, so that wrong data never gets stored?
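One possible approach, shown here only as a sketch, is to validate every scraped value against what it is supposed to look like before anything is written to the database, and to reject the whole batch if any check fails. The field names `title` and `price`, the 200-character limit, and the helper names validateRecord() and batchLooksValid() are all hypothetical; the real checks depend on what the scraper actually collects.

```php
<?php
// Hypothetical record layout: 'title' (plain text) and 'price' (a decimal
// number as a string). Returns a list of problems; empty means it looks fine.
function validateRecord(array $record): array
{
    $errors = [];
    $title = $record['title'] ?? null;
    if (!is_string($title) || trim($title) === '' || strlen($title) > 200) {
        $errors[] = 'title missing or implausible';
    }
    // Leftover tags in a field usually mean a marker matched the wrong place.
    if (is_string($title) && $title !== strip_tags($title)) {
        $errors[] = 'title contains HTML';
    }
    $price = $record['price'] ?? null;
    if (!is_string($price) || filter_var($price, FILTER_VALIDATE_FLOAT) === false) {
        $errors[] = 'price is not a number';
    }
    return $errors;
}

// Check the whole batch first; the caller stores nothing unless this is true.
function batchLooksValid(array $records): bool
{
    if ($records === []) {
        return false; // zero matches is itself a sign the layout changed
    }
    foreach ($records as $i => $record) {
        $errors = validateRecord($record);
        if ($errors !== []) {
            error_log("record $i rejected: " . implode(', ', $errors));
            return false;
        }
    }
    return true;
}

$good = [['title' => 'Widget', 'price' => '19.99']];
$bad  = [['title' => '<td>Widget</td>', 'price' => 'N/A']];
var_dump(batchLooksValid($good)); // bool(true)
var_dump(batchLooksValid($bad));  // bool(false)
```

Rejecting the entire batch, rather than skipping individual bad rows, is a deliberate choice in this sketch: a structural change usually affects every page the same way, so a single failure is a good reason to stop and inspect the site before trusting any of the data.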