UTF-8 conversion doesn't always work

Posted by Marco Piccinni on Stack Overflow
Published on 2012-09-11T15:36:21Z

I searched other Stack Overflow questions before posting here and didn't find anything similar. I have to scrape several UTF-8 web pages that contain text like

"Oggi è una bellissima giornata"

The problem is with the character "è".

I extract this text with JTidy and an XPath query expression, and I convert it with

byte[] content = filteredEncodedString.getBytes("utf-8");
String result = new String(content,"utf-8");

where filteredEncodedString contains the text "Oggi è una bellissima giornata". This procedure works on most of the web pages analyzed so far, but in some cases it doesn't produce a correct UTF-8 string. The page encoding is always the same, and the text is similar.
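
For reference, here is a minimal sketch of what the conversion above does. It rests on an assumption that is not confirmed for these pages: that on the failing pages the raw UTF-8 bytes were decoded as ISO-8859-1 somewhere before the string reaches this code. The class name and the helper methods decodeWithWrongCharset and repair are hypothetical and exist only for illustration. The sketch suggests that encoding a String to UTF-8 and decoding it straight back returns the same String, so it cannot fix text that was already decoded with the wrong charset.

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTripDemo {

    // The conversion from the question: encode to UTF-8 bytes, then decode them again.
    static String roundTrip(String s) {
        byte[] content = s.getBytes(StandardCharsets.UTF_8);
        return new String(content, StandardCharsets.UTF_8);
    }

    // Hypothetical: simulates an upstream step that reads UTF-8 bytes as ISO-8859-1.
    static String decodeWithWrongCharset(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical repair: recover the raw bytes via ISO-8859-1, then decode as UTF-8.
    // Only valid if the wrong charset really was ISO-8859-1.
    static String repair(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00e8 is the character "è"; the escape avoids source-file encoding issues.
        String text = "Oggi \u00e8 una bellissima giornata";
        String garbled = decodeWithWrongCharset(text);

        // The round trip leaves a correct string unchanged...
        System.out.println(roundTrip(text).equals(text));
        // ...and leaves an already garbled string just as garbled.
        System.out.println(roundTrip(garbled).equals(garbled));
        // Re-interpreting the bytes restores the text under the stated assumption.
        System.out.println(repair(garbled).equals(text));
    }
}
```

If that assumption holds, the more robust fix would be to make sure the page bytes are decoded as UTF-8 when they are first read, rather than repairing the string afterwards.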

Any ideas about what the problem might be? Thanks

Marco

© Stack Overflow or respective owner
