Validate Unicode String and Escape if Unicode is Invalid (C/C++)
- by vy32
I have a program that reads arbitrary data from a file system and outputs results in Unicode. The problem I am having is that sometimes filenames are valid Unicode and sometimes they aren't. So I want a function that can validate a string (in C or C++) and tell me if it is a valid UTF-8 encoding. If it is not, I want to have the invalid characters escaped so that it will be a valid UTF-8 encoding. This is different than escaping for XML --- I need to do that also. But first I need to be sure that the Unicode is right.
I've seen some code from which I could hack this, but I would rather use some working code if it exists.