Text extraction with Java HTML parsers
- by zenmonkey
I want to use an HTML parser that does the following in a nice, elegant way:
Extract text (this is most important)
Extract links, meta keywords
Reconstruct original doc (optional but nice feature to have)
From my investigation so far, Jericho seems to fit. Are there any other open source libraries you guys would recommend?
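To make the requirements concrete, here is a rough sketch of the first two (text, links, meta keywords). It does not use Jericho: it assumes the Swing HTML parser bundled with the JDK (javax.swing.text.html) purely so that it needs no extra dependency. The class and method names (HtmlExtractSketch, Collector, extract) are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of any library.
final class HtmlExtractSketch {

    /** Collects text, link targets, and meta keywords as the parser fires callbacks. */
    static final class Collector extends HTMLEditorKit.ParserCallback {
        final StringBuilder text = new StringBuilder();
        final List<String> links = new ArrayList<>();
        String keywords = "";

        @Override
        public void handleText(char[] data, int pos) {
            // Separate consecutive text chunks with a space.
            if (text.length() > 0) {
                text.append(' ');
            }
            text.append(data);
        }

        @Override
        public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            // <a href="..."> is a non-empty element, so it arrives as a start tag.
            if (tag == HTML.Tag.A) {
                Object href = attrs.getAttribute(HTML.Attribute.HREF);
                if (href != null) {
                    links.add(href.toString());
                }
            }
        }

        @Override
        public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            // <meta> is an empty element, so it arrives as a simple tag.
            if (tag == HTML.Tag.META) {
                Object name = attrs.getAttribute(HTML.Attribute.NAME);
                Object content = attrs.getAttribute(HTML.Attribute.CONTENT);
                if (name != null && content != null
                        && "keywords".equalsIgnoreCase(name.toString())) {
                    keywords = content.toString();
                }
            }
        }
    }

    /** Parses an HTML string and returns the populated collector. */
    static Collector extract(String html) {
        Collector collector = new Collector();
        try {
            // 'true' tells the parser to ignore charset declarations in the markup.
            new ParserDelegator().parse(new StringReader(html), collector, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return collector;
    }
}
```

Calling HtmlExtractSketch.extract(html) gives back a collector whose text, links, and keywords fields hold the extracted pieces. This sketch does not cover the third requirement, reconstructing the original document.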