Java library for HTML analysis

Posted by Raj on Stack Overflow
Published on 2010-01-27T06:24:38Z Indexed on 2010/03/13 11:25 UTC

Hi, (I've seen similar questions, but I don't think any of them cater to my specific needs, hence this one.)

I would like to know if there is a Java library for analysis of real-world (read: incomplete, ill-formed) HTML. By analysis, I mean things like:

  • figuring out the most prominent color in an HTML chunk
  • changing that color to some other color (so the library has to support modifying the HTML as well)
  • pruning out unwanted tags
  • fixing up the HTML to produce a well-formed HTML snippet
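For illustration only, the first two items can be roughed out with nothing but the JDK. This is a minimal sketch that assumes "most prominent" means "the hex color mentioned most often". Measuring the area a color actually covers would require rendering the page. The class and method names (`HtmlColorSketch`, `mostProminentColor`, `replaceColor`) are hypothetical and are not part of any library's API. Because the sketch scans raw text with a regex, it tolerates ill-formed HTML, but it ignores named colors, `rgb()` values and the CSS cascade, and it would also match a `#abc`-style link fragment.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: "most prominent" is assumed to mean "most frequently used hex color". */
public class HtmlColorSketch {

    // #rgb or #rrggbb, not preceded by '&' so numeric entities such as &#123; are skipped.
    private static final Pattern HEX_COLOR =
            Pattern.compile("(?<!&)#([0-9a-fA-F]{6}|[0-9a-fA-F]{3})\\b");

    /** Normalizes hex digits ("F00" or "ff0000") to lowercase six-digit form ("#ff0000"). */
    static String normalize(String hexDigits) {
        String h = hexDigits.toLowerCase(Locale.ROOT);
        if (h.length() == 3) {
            StringBuilder sb = new StringBuilder(6);
            for (char c : h.toCharArray()) {
                sb.append(c).append(c); // expand shorthand: f -> ff
            }
            h = sb.toString();
        }
        return "#" + h;
    }

    /** Returns the most frequent hex color, or null if none is found; ties go to the color seen first. */
    public static String mostProminentColor(String html) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps first-seen order for tie-breaking
        Matcher m = HEX_COLOR.matcher(html);
        while (m.find()) {
            counts.merge(normalize(m.group(1)), 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** Replaces every spelling of {@code from} (e.g. "#F00" and "#ff0000") with {@code to}. */
    public static String replaceColor(String html, String from, String to) {
        String target = normalize(from.replaceFirst("^#", ""));
        Matcher m = HEX_COLOR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = normalize(m.group(1)).equals(target) ? to : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Deliberately unclosed tags: the regex scan does not care about well-formedness.
        String html = "<div style=\"color:#F00\"><p><font color=\"#ff0000\">hi<b style='background:#00ff00'>x";
        String top = mostProminentColor(html);
        System.out.println(top); // #ff0000
        System.out.println(replaceColor(html, top, "#0000ff"));
    }
}
```

Tracking color positions through a real parser (for example Jericho, which keeps the original source text) would avoid the false positives a raw regex scan can produce.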

Parts of the last two are handled by libraries such as Jericho and jTidy. 'Plugins' built on top of these would be great.
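To show roughly what such a pruning "plugin" would do, here is a JDK-only tag-whitelist sketch. It is not Jericho's or jTidy's API, and `TagPruneSketch` and `pruneTags` are hypothetical names. Because it is regex-based, it keeps the text inside removed elements (so a `<script>` body would survive) and does nothing to make the result well-formed. That remaining work is exactly where a real parser such as Jericho or jTidy would come in.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: removes tags not on a whitelist, keeping the text between tags. */
public class TagPruneSketch {

    // An opening, closing or self-closing tag; group 1 is the tag name.
    private static final Pattern TAG = Pattern.compile("</?([a-zA-Z][a-zA-Z0-9]*)\\b[^>]*>");

    /** Drops every tag whose lowercase name is not in {@code allowed}. */
    public static String pruneTags(String html, Set<String> allowed) {
        Matcher m = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1).toLowerCase(Locale.ROOT);
            String keep = allowed.contains(name) ? m.group() : ""; // allowed tags are kept verbatim
            m.appendReplacement(out, Matcher.quoteReplacement(keep));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<div><P>Hello <blink>world</blink><br/>bye";
        System.out.println(pruneTags(html, Set.of("p", "br"))); // <P>Hello world<br/>bye
    }
}
```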

Thanks in advance!

© Stack Overflow or respective owner