Text extraction with Java HTML parsers
Posted by zenmonkey on Stack Overflow
Published on 2010-04-09T18:37:38Z
I want to use an HTML parser that does the following in a clean, elegant way:
- Extract text (this is the most important)
- Extract links and meta keywords
- Reconstruct the original document (optional, but nice to have)
From my investigation so far, Jericho seems to fit. Are there any other open source libraries you would recommend?
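To make the first two requirements concrete, here is a minimal sketch of text, link, and meta keyword extraction. It does not use Jericho: it relies on the HTML parser bundled with the JDK (`javax.swing.text.html`), chosen only so the example runs without extra dependencies. The names `HtmlExtractSketch`, `Extractor`, and `extract` are made up for illustration, and a dedicated library would probably cope better with malformed markup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlExtractSketch {

    /** Hypothetical collector: gathers text, link targets, and meta keywords. */
    public static class Extractor extends HTMLEditorKit.ParserCallback {
        public final StringBuilder text = new StringBuilder();
        public final List<String> links = new ArrayList<>();
        public String keywords; // stays null if no keywords meta tag is found

        @Override
        public void handleText(char[] data, int pos) {
            // Separate consecutive text chunks with a single space.
            if (text.length() > 0) {
                text.append(' ');
            }
            text.append(data);
        }

        @Override
        public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            // Record the href of every anchor that has one.
            if (tag == HTML.Tag.A) {
                Object href = attrs.getAttribute(HTML.Attribute.HREF);
                if (href != null) {
                    links.add(href.toString());
                }
            }
        }

        @Override
        public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            // <meta> is an empty element, so it is reported as a simple tag.
            if (tag == HTML.Tag.META) {
                Object name = attrs.getAttribute(HTML.Attribute.NAME);
                Object content = attrs.getAttribute(HTML.Attribute.CONTENT);
                if (name != null && content != null
                        && "keywords".equalsIgnoreCase(name.toString())) {
                    keywords = content.toString();
                }
            }
        }
    }

    /** Parses the given HTML string and returns the filled-in collector. */
    public static Extractor extract(String html) {
        Extractor extractor = new Extractor();
        try {
            // true = ignore any charset declared in the document itself
            new ParserDelegator().parse(new StringReader(html), extractor, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return extractor;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<meta name=\"keywords\" content=\"java, html, parser\">"
                + "</head><body>"
                + "<p>Hello <a href=\"http://example.com/a\">world</a></p>"
                + "</body></html>";
        Extractor result = extract(html);
        System.out.println("Text:     " + result.text);
        System.out.println("Links:    " + result.links);
        System.out.println("Keywords: " + result.keywords);
    }
}
```

This sketch only reads the document. Reconstructing the original markup, the optional third requirement, is not covered here and is one reason to look at a library that keeps track of source positions.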
© Stack Overflow or respective owner