Java library for HTML analysis
Posted by Raj on Stack Overflow
Published on 2010-01-27T06:24:38Z
Indexed on 2010/03/13 11:25 UTC
Hi, (I've seen similar questions, but I don't think any of them address my specific needs, hence this one.)
I would like to know if there is a Java library for analysis of real-world (read: incomplete, ill-formed) HTML. By analysis, I mean things like:
- figuring out the most prominent color in an HTML chunk
- changing that color to some other color (hence, has to support modification of the HTML as well)
- pruning out unwanted tags
- fixing up the HTML to result in a well formed HTML snippet
Parts of the last two are already handled by libraries such as Jericho and JTidy. 'Plugins' on top of these would be great.
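
To make the first two items concrete, here is a rough sketch of what I mean, using only the JDK. It assumes that "most prominent" simply means "most frequently occurring", and that colors are written as 6-digit hex values (named colors, 3-digit hex and rgb() are ignored). The class and method names (ColorSketch, mostProminentColor, replaceColor) are made up for illustration. It works on raw text rather than a parsed document, so it will also count things like a matching fragment link; a real solution would sit on top of a tolerant HTML parser.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ColorSketch {
    // Assumption: only 6-digit hex colors such as #FF0000 are considered.
    // The lookbehind skips numeric character references like &#123456;.
    private static final Pattern HEX_COLOR =
            Pattern.compile("(?<!&)#[0-9a-fA-F]{6}\\b");

    /** Returns the most frequent hex color (lowercased), or null if none is found. */
    public static String mostProminentColor(String html) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = HEX_COLOR.matcher(html);
        while (m.find()) {
            counts.merge(m.group().toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) { // on a tie, the color seen first wins
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** Replaces every case-insensitive occurrence of the hex color 'from' with 'to'. */
    public static String replaceColor(String html, String from, String to) {
        Pattern p = Pattern.compile(Pattern.quote(from) + "\\b", Pattern.CASE_INSENSITIVE);
        return p.matcher(html).replaceAll(Matcher.quoteReplacement(to));
    }

    public static void main(String[] args) {
        // Deliberately sloppy input: unquoted attribute, unclosed tags.
        String html = "<p style=\"color:#FF0000\">a<font color=#ff0000>b"
                + "<span style=\"color:#00ff00\">c</span>";
        String top = mostProminentColor(html);
        System.out.println("Most frequent color: " + top);
        System.out.println(replaceColor(html, top, "#0000ff"));
    }
}
```

Because this never builds a tree, it does not care whether the markup is well formed, but it also cannot tell a color in a style attribute from the same characters anywhere else in the text.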
Thanks in advance!
© Stack Overflow or respective owner