Java HTML parser/validator
- by at
We allow people to enter HTML code on our wiki-like site. But only a limited subset of HTML to not affect our styling and not allow malicious javascript code. Is there a good Java library on the server side to ensure that the code entered is valid?
We tried creating an XML Schema document to validate against. The only issue there is the libraries we used to validate gave back cryptic error messages. What I want is for the validation library to actually fix the issue (if there was a style="" attribute added to an element, remove it). If fixing it is not easy, at least allow me to report a message to the user with the location of the error (an error code that I can present a nice message from is fine, probably even preferable).