Scraping blog contents

Posted by goh on Stack Overflow
Published on 2010-06-17T02:41:02Z Indexed on 2010/06/17 2:42 UTC

Hi lads,

After obtaining the URLs for various Blogspot, Tumblr and WordPress pages, I ran into problems processing the HTML pages. I want to distinguish between the content, title and date of each blog post. I might be able to get the date with a regex, but people use so many custom scripts now that the HTML classes and structure differ widely from one blog to the next.
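To make the problem concrete, here is a minimal sketch of the regex idea, using only the Python standard library. The names `DATE_RE`, `TitleParser` and `extract_title_and_date` are made up for illustration, and the date pattern is an assumption that covers only two formats ("2010-06-17" and "June 17, 2010"). It takes the title from the page's `<title>` tag, which is often the blog name rather than the post title, and it does not attempt the post content at all, which is the part that varies most between blogs.

```python
import re
from html.parser import HTMLParser

# Assumed date formats only: ISO dates such as 2010-06-17 and
# long-form dates such as "June 17, 2010". Real blogs use many more.
DATE_RE = re.compile(
    r"\b(\d{4}-\d{2}-\d{2}"
    r"|(?:January|February|March|April|May|June|July|August|September"
    r"|October|November|December)\s+\d{1,2},\s+\d{4})\b"
)


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_title:
            self.title += data


def extract_title_and_date(html):
    """Return (title, date_string); date_string is None if no date matched."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    match = DATE_RE.search(html)
    date = match.group(1) if match else None
    return parser.title.strip(), date
```

This works on simple pages, but it picks up the first date-like string anywhere in the HTML, so a date in a sidebar or comment would be returned just as readily as the post date.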

Does anyone have a solution that may help?

© Stack Overflow or respective owner
