Need help working with the Jericho HTML Parser

Posted by rookie on Stack Overflow
Published on 2010-04-17T17:09:20Z

Hi all, I've been running the sample program at the URL below:

http://jericho.htmlparser.net/samples/console/src/ExtractText.java

My goal is to extract the main body text of a web page so that I can summarize it and present the summary to the user.

My problem is that I'm not sure how to modify the above program so that it returns only the required text from the web page, without the links or any other information.
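
To make the question concrete, here is a minimal sketch of one possible direction, not a confirmed solution. It assumes Jericho 3.x is on the classpath and relies on the `excludeElement` hook of `TextExtractor`, which the linked sample also overrides. The class name `BodyTextSketch`, the helper `extractWithoutLinks`, and the inline HTML string are hypothetical, introduced only for illustration.

```java
import net.htmlparser.jericho.HTMLElementName;
import net.htmlparser.jericho.Source;
import net.htmlparser.jericho.StartTag;
import net.htmlparser.jericho.TextExtractor;

public class BodyTextSketch {

    // Hypothetical helper: returns the text of the given HTML,
    // leaving out everything inside <a> elements.
    static String extractWithoutLinks(String html) {
        Source source = new Source(html);
        TextExtractor extractor = new TextExtractor(source) {
            @Override
            public boolean excludeElement(StartTag startTag) {
                // Skip the content of every link element.
                return HTMLElementName.A.equals(startTag.getName());
            }
        };
        // Do not emit attribute values such as title or alt text.
        extractor.setIncludeAttributes(false);
        return extractor.toString();
    }

    public static void main(String[] args) {
        // Inline sample input; a real page would be loaded from a URL instead.
        String html = "<html><body><p>Main text "
                + "<a href=\"http://example.com/\">a link</a> continues here.</p>"
                + "</body></html>";
        System.out.println(extractWithoutLinks(html));
    }
}
```

This only drops link text. Telling the "main" body apart from menus, sidebars, or footers would presumably need further page-specific rules inside `excludeElement`, for example checks on `class` or `id` attribute values.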

Again, I'd really appreciate any help I could get.

Thanks in advance
