getTextContent from Node with whitespace character normalization

Posted by Nayn on Stack Overflow
Published on 2010-05-21T12:20:23Z

Hi, I am working with XPath in Java and want to extract some text from an HTML page. The text sits under a div and contains whitespace markup such as &nbsp; and <br>. While extracting, I want these converted to a space and a newline respectively. I am currently using Element.getTextContent(), which does not do this conversion.

Could somebody tell me whether there is a way to extract the text with this whitespace normalization, or to extract the whole HTML markup under the Node so that I can do the replacement myself? Thanks, Nayn
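
For illustration, both options could be sketched roughly as below. This is a minimal sketch, not a definitive answer. It assumes the page has already been parsed into an org.w3c.dom tree, that line breaks appear as elements named br, and that &nbsp; arrives in text nodes as the character U+00A0. The class name NodeText and the methods normalizedText, outerMarkup and parseFragment are hypothetical names chosen for this example. The demo input is well-formed XHTML, since the JDK's XML parser does not accept arbitrary HTML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeText {

    /** Option 1: text under a node, with br elements as '\n' and U+00A0 as ' '. */
    public static String normalizedText(Node node) {
        StringBuilder out = new StringBuilder();
        collect(node, out);
        return out.toString();
    }

    private static void collect(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            // Assumption: &nbsp; was resolved by the parser to U+00A0.
            out.append(node.getNodeValue().replace('\u00A0', ' '));
            return;
        }
        if (type == Node.ELEMENT_NODE) {
            // getLocalName() is null when the tree was built without namespace support.
            String name = node.getLocalName() != null ? node.getLocalName() : node.getNodeName();
            if ("br".equalsIgnoreCase(name)) {
                out.append('\n');
                return;
            }
        }
        // Comments and processing instructions have no children, so they add nothing.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, out);
        }
    }

    /** Option 2: serialize the node and its subtree back to markup. */
    public static String outerMarkup(Node node) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize node", e);
        }
    }

    /** Hypothetical demo helper: parses a well-formed fragment and returns its root element. */
    public static Node parseFragment(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // &#160; is the numeric form of &nbsp;, which plain XML does not define.
        Node div = parseFragment("<div>Line&#160;one<br/>Line two</div>");
        System.out.println(normalizedText(div));
        System.out.println(outerMarkup(div));
    }
}
```

With the sample input, normalizedText returns "Line one" and "Line two" separated by a newline, with an ordinary space in place of the non-breaking one. The exact string produced by outerMarkup depends on the serializer in use, so treat it as input for further processing rather than a byte-for-byte copy of the original page.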

© Stack Overflow or respective owner
