Getting data from a webpage in a stable and efficient way
- by Mike Heremans
Recently I've learned that using a regex to parse the HTML of a website to get the data you need isn't the best course of action.
So my question is simple: what, then, is the most efficient and generally stable way to get this data?
I should note that:
There are no APIs.
There is no other source I can get the data from (no databases, feeds, and such).
There is no access to the source files; the data comes from public websites.
Let's say the data is plain text, displayed in a table on an HTML page.
I'm currently using Python for my project, but language-independent solutions or tips would be welcome.
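To make the scenario concrete, here is a rough sketch of the kind of extraction I mean, done with a real HTML parser instead of a regex. It uses only `html.parser` from Python's standard library. The class name `TableTextParser`, the helper `extract_rows`, and the sample markup are all made up for illustration, and it assumes simple, non-nested tables. I'm not claiming this is the best approach, only showing what I'm trying to achieve.

```python
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row.

    Hypothetical helper for illustration; assumes tables are not nested.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Made-up sample page content, standing in for the downloaded HTML.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Apple</td><td>1.20</td></tr>
  <tr><td>Pear</td><td>0.95</td></tr>
</table>
"""


def extract_rows(html):
    """Return the table contents as a list of rows of cell text."""
    parser = TableTextParser()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

In a real run the HTML string would come from fetching the page rather than from a constant, and the first row could be treated as the header.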
As a side question: how would you go about it when the page content is loaded through Ajax calls?