getTextContent from Node with whitespace character normalization
Posted by Nayn on Stack Overflow
Published on 2010-05-21T12:20:23Z
Hi,
I am working with XPath in Java and want to extract some text from an HTML page.
The text is located under a div, with some whitespace markup in between, like <br> etc.
I want these to be converted into 'space' and 'newline' respectively while extracting.
The method I am using to extract the text is Element.getTextContent(), which does not respect these whitespace characters.
Could somebody tell me whether there is a way to extract the text with whitespace normalization, or to extract the whole HTML markup under the Node so that I can do the replacement myself?
Thanks,
Nayn
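Both options asked about above can be sketched with the standard org.w3c.dom and javax.xml.transform APIs. This is only a rough sketch under some assumptions: the page has already been parsed into a W3C DOM Node (for example well-formed XHTML, or HTML run through a tidying parser), and the class and method names (WhitespaceAwareText, textOf, markupOf, parse) are made up for illustration. Turning non-breaking spaces (U+00A0) into plain spaces is also an assumption about what "space" means here. The post itself only shows <br>.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; none of these names come from the original post.
class WhitespaceAwareText {

    /** Option 1: collect the text under a node, writing '\n' for each br element. */
    static String textOf(Node node) {
        StringBuilder out = new StringBuilder();
        append(node, out);
        return out.toString();
    }

    private static void append(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            // Assumption: non-breaking spaces should become ordinary spaces.
            out.append(node.getNodeValue().replace('\u00A0', ' '));
            return;
        }
        if (type == Node.ELEMENT_NODE && "br".equalsIgnoreCase(node.getNodeName())) {
            out.append('\n');
            return;
        }
        // Recurse into children; comments and similar nodes have none and add nothing.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            append(child, out);
        }
    }

    /** Option 2: serialize the node and its subtree back to markup. */
    static String markupOf(Node node) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize node", e);
        }
    }

    /** Demo-only parser; it accepts well-formed XML/XHTML, not arbitrary HTML. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }
}
```

With a Node obtained from an XPath query, WhitespaceAwareText.textOf(node) would give the text with line breaks kept, while WhitespaceAwareText.markupOf(node) would give the markup as a String for manual replacement. Block-level elements such as p or div are not treated as line breaks in this sketch.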
© Stack Overflow or respective owner