Scraping blog contents
Posted
by goh
on Stack Overflow
Published on 2010-06-17T02:41:02Z
Indexed on
2010/06/17
2:42 UTC
python
Hi lads,
After obtaining the URLs for various Blogspot, Tumblr and WordPress pages, I ran into problems processing the HTML. I want to extract the content, title and date of each blog post separately. I might be able to get the date with a regex, but people use so many custom scripts now that the HTML classes and structure differ widely from one blog to the next.
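To make the regex idea concrete, here is a minimal sketch using only the Python standard library. It takes the title from the page's `<title>` element and looks for dates in two common formats. The names `TitleParser`, `DATE_RE` and `extract_title_and_dates` are hypothetical, the two date patterns are assumptions that cover only a small share of real blogs, and the sketch does not attempt to isolate the post content at all.

```python
import re
from html.parser import HTMLParser

# Assumed date formats, for illustration only: ISO dates such as 2010-06-17
# and long English dates such as "June 17, 2010". Real blogs use many more.
DATE_RE = re.compile(
    r"\b(\d{4}-\d{2}-\d{2}"
    r"|(?:January|February|March|April|May|June|July|August"
    r"|September|October|November|December) \d{1,2}, \d{4})\b"
)


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element in a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is recorded; later ones are ignored.
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.title += data


def extract_title_and_dates(html):
    """Return (title or None, list of date strings found in the raw HTML)."""
    parser = TitleParser()
    parser.feed(html)
    title = parser.title.strip() if parser.title else None
    dates = DATE_RE.findall(html)
    return title, dates
```

Note that the page title is often the blog name plus the post title rather than the post title alone, and the date search runs over the raw markup, so it will also pick up dates in sidebars and comments. That is exactly the per-site variation that makes a purely regex-based approach fragile.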
Does anyone have a solution that may help?
© Stack Overflow or respective owner