'??' Not a valid unicode character, but in the unicode character set?
- by Steve Cotner
Short story: I can't get an entity like '𠂉' to store in a MySQL database, either by using a text field in a Ruby on Rails app (with default UTF-8 encoding) or by inputting it directly with a MySQL GUI app.
As far as I can tell, all Chinese characters and radicals can be entered into the database without problem, but not these rarely typed 'character components.' The character mentioned above is unicode U+20089 and html entity 𠂉
I can get it to display on the page by entering <html>𠂉</html> and removing html escaping, but I would like to store it simply as the unicode character and keep the html escaping in place. There are many other Chinese 'components' (parts of full characters, generally consisting of 2 or 3 strokes) that cause the same problem.
According to this page, the character mentioned is in the UTF-8 charset: http://www.fileformat.info/info/unicode/char/20089/charset_support.htm
But on the neighboring '...20089/index.htm' page, there's an alert saying it's not a valid unicode character.
For reference, that entity can be found in Mac OS X by searching through the character palette (international menu, "Show Character Palette"), searching by radical, and looking under the '?' radical.
Apologies if this is too open-ended... can a character like this be stored in a UTF-8-based database? How is this character both supported and unsupported, both present in the character set and not valid?