UTF-8 conversion doesn't always work
Posted by Marco Piccinni on Stack Overflow
Published on 2012-09-11T15:36:21Z
I searched other Stack Overflow questions before posting here and didn't find anything similar. I have to scrape various UTF-8 web pages that contain text like
"Oggi è una bellissima giornata"
The problem is with the character "è".
I extract this text with JTidy and an XPath query expression, and I convert it with
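For context, "è" (U+00E8) is the only non-ASCII character in that sentence, and UTF-8 encodes it as two bytes, 0xC3 0xA8. The minimal Java sketch below (the class and method names are hypothetical) prints those bytes and shows what happens when they are decoded with the wrong charset, ISO-8859-1 here, which is one common way "è" turns into "Ã¨". It illustrates a likely failure mode; it is not a diagnosis of these particular pages.

```java
import java.nio.charset.StandardCharsets;

class EGraveDemo {
    // Returns the UTF-8 encoding of s as space-separated hex pairs.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF)); // mask to print 0..255
        }
        return sb.toString();
    }

    // Simulates the classic bug: UTF-8 bytes decoded as ISO-8859-1,
    // where every single byte becomes its own character.
    static String misdecodeAsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String e = "\u00E8"; // "è"
        System.out.println(utf8Hex(e)); // C3 A8
        // Print code points instead of characters so the output does not
        // depend on the console's own encoding.
        misdecodeAsLatin1(e).codePoints()
                .forEach(cp -> System.out.printf("U+%04X%n", cp)); // U+00C3 then U+00A8
    }
}
```

U+00C3 is "Ã" and U+00A8 is "¨", so one two-byte character silently becomes two wrong characters.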
byte[] content = filteredEncodedString.getBytes("utf-8");
String result = new String(content,"utf-8");
where filteredEncodedString contains the text "Oggi è una bellissima giornata". This procedure works on most of the web pages analyzed so far, but in some cases it doesn't produce a correct UTF-8 string. The page encoding is always the same, and the text is similar.
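Note that the two conversion lines above are a round trip: encoding a Java String to UTF-8 and decoding the same bytes as UTF-8 returns an equal String, so that step can neither break nor repair "è". If some pages come out wrong, the damage most likely happens earlier, when the page bytes are first turned into characters (for example, if the parser falls back to a different charset for those pages; this is an assumption, not something the question confirms). The sketch below uses hypothetical helpers `roundTrip` and `readUtf8` to demonstrate the round trip and to decode raw bytes as UTF-8 once, at input time. `InputStream.readAllBytes` requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // The conversion from the question: encode to UTF-8, then decode as UTF-8.
    // For a well-formed String this returns an equal String, so it cannot
    // fix text that was already decoded with the wrong charset.
    static String roundTrip(String s) {
        byte[] content = s.getBytes(StandardCharsets.UTF_8);
        return new String(content, StandardCharsets.UTF_8);
    }

    // Hypothetical helper: decode the raw page bytes as UTF-8 exactly once,
    // before the text is handed to the HTML parser / XPath.
    static String readUtf8(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String good = "Oggi \u00E8 una bellissima giornata";
        // Text that was already mis-decoded (UTF-8 bytes read as ISO-8859-1).
        String broken = new String(good.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        System.out.println(roundTrip(good).equals(good));     // true
        System.out.println(roundTrip(broken).equals(broken)); // true: still broken
        System.out.println(roundTrip(broken).equals(good));   // false

        InputStream page = new ByteArrayInputStream(good.getBytes(StandardCharsets.UTF_8));
        System.out.println(readUtf8(page).equals(good));      // true
    }
}
```

With JTidy, the equivalent step would be to tell the parser the input encoding explicitly instead of relying on its default; recent releases of `org.w3c.tidy.Tidy` have a `setInputEncoding` method, but check the API of the version you are using.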
Any ideas about the problem? Thanks,
Marco
© Stack Overflow or respective owner