Remove non-UTF-8 characters from XML with declared encoding=utf-8 - Java
- by St Nietzke
I have to handle this scenario in Java:
I receive a request in XML form from a client, with declared encoding=utf-8. Unfortunately, it may contain non-UTF-8 characters, and there is a (legacy) requirement to remove these characters from the XML on my side.
Let's consider an example where this invalid XML contains £ (pound).
1) I get the XML as a Java String with £ in it (I don't have access to the interface right now, but I will probably receive the XML as a Java String). Can I use replaceAll("£", "") to get rid of this character? Are there any potential issues?
2) I get the XML as an array of bytes. How can I handle this operation safely in that case?
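
To make case 2 concrete, here is a minimal sketch of one possible approach, not necessarily the right one for this interface: decode the bytes with a CharsetDecoder configured to ignore malformed input, so that byte sequences that are not valid UTF-8 are dropped during decoding. The class name Utf8Sanitizer and the method decodeDroppingInvalid are hypothetical names made up for illustration, and the sample assumes the stray pound arrives as the single ISO-8859-1 byte 0xA3.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Sanitizer {

    // Hypothetical helper: decodes the input as UTF-8 and drops any byte
    // sequence that is not valid UTF-8 instead of replacing it with U+FFFD.
    public static String decodeDroppingInvalid(byte[] input) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        try {
            return decoder.decode(ByteBuffer.wrap(input)).toString();
        } catch (CharacterCodingException e) {
            // Not expected with IGNORE, but decode() declares the exception.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample: "<a>", then a lone 0xA3 byte (pound in ISO-8859-1,
        // malformed on its own in UTF-8), then "</a>".
        byte[] raw = {'<', 'a', '>', (byte) 0xA3, '<', '/', 'a', '>'};
        System.out.println(decodeDroppingInvalid(raw)); // prints <a></a>
    }
}
```

This also bears on case 1: a pound sign correctly encoded as the two bytes 0xC2 0xA3 is valid UTF-8 and survives this decoding, so once the data is already a String, replaceAll("£", "") would remove legitimate pound signs as well. If the String was produced by a lenient UTF-8 decode, the invalid bytes have most likely already been turned into U+FFFD replacement characters.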