Java library for HTML analysis
- by Raj
Hi,
(I've seen similar questions, but I don't think any of them addresses my specific needs, hence this one.)
I would like to know if there is a Java library for analysis of real-world (read: incomplete, ill-formed) HTML. By analysis, I mean things like:
figuring out the most prominent color in an HTML chunk
changing that color to some other color (so the library has to support modifying the HTML as well)
pruning unwanted tags
fixing up the HTML so that the result is a well-formed HTML snippet
Libraries such as Jericho and JTidy already handle parts of the last two. 'Plugins' on top of these would be great.
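To make the first two items concrete, here is a rough sketch of the kind of behaviour I mean, using only the JDK. It rests on several assumptions: "most prominent" is taken to mean "most frequently occurring", only 6-digit hex colors such as #FF0000 are recognised (no named colors, rgb(), or 3-digit forms), and the HTML is scanned as raw text rather than parsed. The class and method names (ColorSketch, mostProminentColor, replaceColor) are made up for illustration and do not come from any existing library.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Jericho, JTidy, or any other library.
final class ColorSketch {
    // Assumption: only 6-digit hex colors count. The lookbehind skips
    // numeric character references such as &#123456;.
    private static final Pattern HEX_COLOR =
            Pattern.compile("(?<!&)#[0-9a-fA-F]{6}\\b");

    /** Returns the most frequent hex color, lowercased, or null if none is found. */
    static String mostProminentColor(String html) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = HEX_COLOR.matcher(html);
        while (m.find()) {
            counts.merge(m.group().toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        // On a tie, the color that appeared first in the text wins.
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** Replaces every occurrence of one hex color (case-insensitively) with another. */
    static String replaceColor(String html, String from, String to) {
        Pattern p = Pattern.compile(Pattern.quote(from), Pattern.CASE_INSENSITIVE);
        return p.matcher(html).replaceAll(Matcher.quoteReplacement(to));
    }

    public static void main(String[] args) {
        // Deliberately ill-formed: the <span> is never closed.
        String html = "<p style=\"color:#FF0000\">a<font color=\"#ff0000\">b</font>"
                + "<span style=\"color:#00ff00\">c</p>";
        String top = mostProminentColor(html);
        System.out.println(top);
        System.out.println(replaceColor(html, top, "#0000ff"));
    }
}
```

Because this works on raw text, malformed markup does not get in its way, but it also cannot tell a color in a style attribute from one in a comment or script, and it ignores how much of the page each color actually covers. That is the part I would hope a real library handles.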
Thanks in advance!