UTF-8 conversion doesn't always work
- by Marco Piccinni
I searched the other Stack sites before posting here and didn't find anything similar.
I have to scrape several UTF-8 webpages that contain text like
"Oggi è una bellissima giornata"
The problem is with the character "è".
I extract this text with JTidy and an XPath query expression, and I convert it with
byte[] content = filteredEncodedString.getBytes("utf-8");
String result = new String(content,"utf-8");
where filteredEncodedString contains the text "Oggi è una bellissima giornata".
This procedure works on most of the webpages analyzed so far, but in some cases it doesn't produce a correct UTF-8 string. The page encoding is always the same, and the text is similar.
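To show what the two conversion lines do in isolation, here is a minimal sketch. The class name Utf8RoundTrip and the helper roundTrip are hypothetical names used only for illustration, and the ISO-8859-1 mis-decoding step is an assumption about what might happen upstream, not something I have confirmed on the failing pages. As far as I can tell, encoding a String to UTF-8 and decoding it straight back returns the same String, so the round trip cannot repair text that was already decoded with the wrong charset.

```java
import java.nio.charset.StandardCharsets;

class Utf8RoundTrip {
    // Hypothetical helper mirroring the two lines from the post:
    // encode to UTF-8 bytes, then decode those bytes as UTF-8 again.
    static String roundTrip(String s) {
        byte[] content = s.getBytes(StandardCharsets.UTF_8);
        return new String(content, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00e8 is "è"; the escape avoids depending on the source file encoding.
        String good = "Oggi \u00e8 una bellissima giornata";

        // Assumption: somewhere upstream the raw UTF-8 bytes of the page
        // were decoded as ISO-8859-1, which turns "è" into two characters.
        byte[] raw = good.getBytes(StandardCharsets.UTF_8);
        String wrong = new String(raw, StandardCharsets.ISO_8859_1);

        // The round trip leaves both strings exactly as they were.
        System.out.println(roundTrip(good).equals(good));
        System.out.println(roundTrip(wrong).equals(wrong));

        // Under the ISO-8859-1 assumption, the original bytes can be
        // recovered and decoded with the right charset.
        String repaired = new String(
                wrong.getBytes(StandardCharsets.ISO_8859_1),
                StandardCharsets.UTF_8);
        System.out.println(repaired.equals(good));
    }
}
```

If that reading is right, the charset has to be applied at the point where the page bytes are first turned into characters, rather than on the extracted String afterwards.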
Any ideas about the problem?
Thanks
Marco