Remove non-UTF-8 characters from XML with declared encoding=utf-8 - Java
Posted by St Nietzke on Stack Overflow
Published on 2010-05-19T20:19:03Z
Indexed on 2010/05/19 20:20 UTC
I have to handle this scenario in Java:
I'm getting a request in XML form from a client with declared encoding=utf-8. Unfortunately, it may contain non-UTF-8 characters, and there is a (legacy) requirement to remove these characters from the XML on my side.
Let's consider an example where this invalid XML contains £ (pound).
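The pound sign itself is a legal Unicode character. What would make the XML invalid is how it was encoded. Here is a minimal sketch, assuming the client sent £ as the single ISO-8859-1 byte 0xA3 instead of the two-byte UTF-8 sequence 0xC2 0xA3. A strict UTF-8 decoder from java.nio.charset rejects the first form and accepts the second. The class name Utf8Check and the helper isValidUtf8 are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Returns true if the bytes are well-formed UTF-8, false otherwise.
    // REPORT makes the decoder throw instead of silently substituting.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8Pound = {(byte) 0xC2, (byte) 0xA3}; // pound sign encoded as UTF-8
        byte[] latin1Pound = {(byte) 0xA3};            // pound sign encoded as ISO-8859-1
        System.out.println(isValidUtf8(utf8Pound));    // true
        System.out.println(isValidUtf8(latin1Pound));  // false
    }
}
```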
1) I get the XML as a Java String with £ in it (I don't have access to the interface right now, but I will probably get the XML as a Java String). Can I use replaceAll("£", "") to get rid of this character? Are there any potential issues?
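A minimal sketch of the String case follows. StripFromString and removeChar are hypothetical names. replaceAll("£", "") does work, because "£" contains no regex metacharacters, but String.replace is the literal, non-regex equivalent.

The sketch also shows two caveats. First, both calls remove every £, including any the client sent correctly. Second, the String may have been produced by decoding the invalid bytes as UTF-8, for example with new String(bytes, StandardCharsets.UTF_8). In that case the bad byte has already become U+FFFD, the Unicode replacement character, so there is no £ left to find.

```java
import java.nio.charset.StandardCharsets;

public class StripFromString {
    // Removes every occurrence of the given character. String.replace treats
    // its argument literally, so no regex escaping is needed (replaceAll
    // would interpret its first argument as a regular expression).
    static String removeChar(String xml, char unwanted) {
        return xml.replace(String.valueOf(unwanted), "");
    }

    public static void main(String[] args) {
        // "<price>" + pound sign (U+00A3) + "10</price>"
        String xml = "<price>\u00A310</price>";
        System.out.println(xml.replaceAll("\u00A3", "")); // <price>10</price>
        System.out.println(removeChar(xml, '\u00A3'));    // <price>10</price>

        // Caveat: decoding raw bytes as UTF-8 already replaced the invalid
        // 0xA3 byte with U+FFFD, so searching for the pound sign finds nothing.
        byte[] raw = {'<', 'p', '>', (byte) 0xA3, '1', '0', '<', '/', 'p', '>'};
        String decoded = new String(raw, StandardCharsets.UTF_8);
        System.out.println(decoded.indexOf('\u00A3'));     // -1
        System.out.println(removeChar(decoded, '\uFFFD')); // <p>10</p>
    }
}
```

Which character actually ends up in the String therefore depends on how the request bytes were decoded before they reached this code.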
2) I get the XML as an array of bytes. How can I handle this operation safely in that case?
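For the byte-array case, one approach is to let a CharsetDecoder drop the bad bytes during decoding. This is a sketch, not the only option. With CodingErrorAction.IGNORE, the decoder silently drops malformed UTF-8 sequences. A stray 0xA3 byte disappears, while a correctly encoded £ (0xC2 0xA3) is kept. StripInvalidUtf8, decodeDroppingInvalid and stripInvalidUtf8 are hypothetical names. The result can be re-encoded as UTF-8 before it is passed to the XML parser.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StripInvalidUtf8 {
    // Decodes the bytes as UTF-8, silently dropping any malformed or
    // unmappable byte sequences, and returns the remaining text.
    static String decodeDroppingInvalid(byte[] xmlBytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        try {
            return decoder.decode(ByteBuffer.wrap(xmlBytes)).toString();
        } catch (CharacterCodingException e) {
            // Not expected: both error kinds are set to IGNORE.
            throw new IllegalStateException(e);
        }
    }

    // Returns a copy of the input containing only well-formed UTF-8.
    static byte[] stripInvalidUtf8(byte[] xmlBytes) {
        return decodeDroppingInvalid(xmlBytes).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "<p>" + 0xA3 (ISO-8859-1 pound sign, invalid in UTF-8) + "10</p>"
        byte[] in = {'<', 'p', '>', (byte) 0xA3, '1', '0', '<', '/', 'p', '>'};
        System.out.println(decodeDroppingInvalid(in)); // <p>10</p>
    }
}
```

Compared with searching a String for one specific character, this approach removes only byte sequences that are genuinely not UTF-8, rather than every £.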
© Stack Overflow or respective owner