getTextContent from Node with whitespace character normalization
- by Nayn
Hi,
I am working with XPath in Java and want to extract some text from an HTML page.
The text is located under a div and is broken up by whitespace characters and by line-break elements such as <br>.
I want these to be converted into a space and a newline, respectively, during extraction.
The method I am using to extract the text is Element.getTextContent(), which does not respect these whitespace characters.
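
To make the problem concrete, here is a minimal sketch of the behaviour, assuming the page is well-formed XHTML that has been parsed into a W3C DOM. The class name, the sample markup, and the //div expression are made up for illustration and are not taken from my real page.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class TextContentDemo {

    // Parses the given XHTML, selects the first div via XPath,
    // and returns its getTextContent() value.
    static String divText(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            Element div = (Element) XPathFactory.newInstance().newXPath()
                    .evaluate("//div", doc, XPathConstants.NODE);
            return div.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample input with a <br/> between two lines of text.
        String xhtml = "<html><body><div>first line<br/>second line</div></body></html>";
        // The <br/> contributes no text, so the two lines run together.
        System.out.println(divText(xhtml)); // prints "first linesecond line"
    }
}
```

getTextContent() only concatenates the text nodes under the element, so an empty element such as <br/> leaves no trace in the result.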
Could somebody tell me if there is a way to extract the text with whitespace normalization,
OR
to extract the whole HTML markup under the Node so that I can do the replacement myself?
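
For reference, the two options could look roughly like the sketch below. It assumes a W3C DOM built from well-formed XHTML without namespace prefixes; NodeTextSketch, textWithBreaks, markupOf, and parse are hypothetical names chosen for illustration, not existing API. The first option walks the subtree by hand and emits a newline for each <br>; the second serializes the node with the standard identity Transformer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeTextSketch {

    // Option 1: copy text nodes (collapsing whitespace runs to one space)
    // and emit '\n' for every <br> element found in the subtree.
    static void appendText(Node node, StringBuilder out) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                out.append(child.getNodeValue().replaceAll("\\s+", " "));
            } else if (type == Node.ELEMENT_NODE) {
                if ("br".equalsIgnoreCase(child.getNodeName())) {
                    out.append('\n');
                } else {
                    appendText(child, out); // recurse into nested elements
                }
            }
        }
    }

    static String textWithBreaks(Node node) {
        StringBuilder out = new StringBuilder();
        appendText(node, out);
        return out.toString().trim();
    }

    // Option 2: serialize the node and its children back to markup,
    // so that replacements can be done on the string afterwards.
    static String markupOf(Node node) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Helper for the demo only: parse a well-formed XHTML string.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node div = parse("<div>first   line<br/>second line</div>").getDocumentElement();
        System.out.println(textWithBreaks(div));
        System.out.println(markupOf(div));
    }
}
```

The hand-written walk only treats <br> as a line break; other elements that should start a new line (for example <p>) would need their own branch in the same place.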
Thanks
Nayn