Getting a "summary" of a webpage
Posted
by MattiasK
on Stack Overflow
Published on 2010-05-31T05:10:59Z
I have something of a hairy problem: I'd like to generate a couple of paragraphs of "description" for a given URL, normally the start of an article. The meta description field is one way to go, but it isn't always good or set properly.
It's fair to say this is hard to do reliably from the screen-scraped HTML. My general idea is to scan the HTML for the first "appropriate" segment, but it's hard to say what that is; perhaps something like the first paragraph containing a certain amount of text...
Does anyone have any good ideas? :) It doesn't have to be foolproof.
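
To make the idea concrete, here is a rough sketch of the "first paragraph containing a certain amount of text" heuristic, using only Python's standard-library html.parser. The names SummaryParser and summarize, the 80-character threshold, the two-paragraph limit, and the choice to prefer body text over the meta description are all assumptions made for illustration, not a tested recipe.

```python
from html.parser import HTMLParser


class SummaryParser(HTMLParser):
    """Collects the first meta description and the text of each <p> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.meta_description = None
        self.paragraphs = []
        self._buffer = None  # list of text chunks while inside a <p>, else None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            if self.meta_description is None:
                self.meta_description = (attrs.get("content") or "").strip() or None
        elif tag == "p":
            # A new <p> implicitly closes any paragraph that is still open.
            self.flush()
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p":
            self.flush()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def flush(self):
        """Store the paragraph collected so far, with whitespace collapsed."""
        if self._buffer is not None:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None


def summarize(html, min_chars=80, max_paragraphs=2):
    """Return up to max_paragraphs paragraphs, starting at the first one with
    at least min_chars characters; fall back to the meta description
    (hypothetical thresholds, tune them per site)."""
    parser = SummaryParser()
    parser.feed(html)
    parser.close()
    parser.flush()  # pick up a trailing <p> that was never closed
    start = next(
        (i for i, p in enumerate(parser.paragraphs) if len(p) >= min_chars),
        None,
    )
    if start is not None:
        return "\n\n".join(parser.paragraphs[start:start + max_paragraphs])
    return parser.meta_description or ""
```

Calling summarize(page_html) on the fetched markup skips short fragments such as navigation links or bylines and returns the first substantial paragraph plus the one after it. It is not foolproof: pages that put their article text in div elements rather than p elements will fall through to the meta description.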
© Stack Overflow or respective owner