Remove non-UTF-8 characters from XML with declared encoding=utf-8 - Java

Posted by St Nietzke on Stack Overflow
Published on 2010-05-19T20:19:03Z Indexed on 2010/05/19 20:20 UTC

I have to handle this scenario in Java:

I'm getting a request in XML form from a client with declared encoding=utf-8. Unfortunately, it may contain non-UTF-8 characters, and there is a requirement to remove these characters from the XML on my side (legacy).

Let's consider an example where this invalid XML contains £ (pound).

1) I get the XML as a Java String with £ in it (I don't have access to the interface right now, but I probably get the XML as a Java String). Can I use replaceAll("£", "") to get rid of this character? Are there any potential issues?

2) I get the XML as an array of bytes. How can I handle this operation safely in that case?
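
To make the two cases concrete, here is a minimal sketch of what each could look like. It is only an illustration under assumptions: the class and method names (`XmlCharCleaner`, `stripFromString`, `decodeDroppingMalformed`, `sanitizeBytes`) are hypothetical, and it assumes Java 7 or later for `StandardCharsets`. Note that in the byte case only byte sequences that are malformed as UTF-8 are dropped (for example a lone 0xA3, which is how ISO-8859-1 encodes £); a £ that is correctly encoded as 0xC2 0xA3 is valid UTF-8 and would be kept.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the class and method names are illustrative only.
final class XmlCharCleaner {

    // Case 1: the XML is already a String. String.replace performs a literal
    // (non-regex) replacement. U+00A3 is the pound sign; U+FFFD is the
    // replacement character that a lenient decoder upstream may have
    // substituted for bytes it could not decode.
    static String stripFromString(String xml) {
        return xml.replace("\u00A3", "").replace("\uFFFD", "");
    }

    // Case 2: the XML is raw bytes. Decode as UTF-8 and silently drop any
    // byte sequence that is not well-formed UTF-8.
    static String decodeDroppingMalformed(byte[] xmlBytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.IGNORE)
                    .onUnmappableCharacter(CodingErrorAction.IGNORE)
                    .decode(ByteBuffer.wrap(xmlBytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not expected with the IGNORE actions; rethrown unchecked.
            throw new IllegalStateException(e);
        }
    }

    // Same as above, but re-encodes the cleaned text so the result is a
    // byte array containing only well-formed UTF-8.
    static byte[] sanitizeBytes(byte[] xmlBytes) {
        return decodeDroppingMalformed(xmlBytes).getBytes(StandardCharsets.UTF_8);
    }
}
```

One caveat with the String case: if the bytes were already decoded as UTF-8 before reaching this code, an invalid byte will usually have been replaced by U+FFFD, so the String may no longer contain a literal £ to search for. That is why the sketch strips both characters.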

© Stack Overflow or respective owner
