Why does printf report an error for all but three ASCII-range Unicode codepoints, yet work fine for all the others?
Posted by fred.bear on Ask Ubuntu, 2011-01-09
The 'printf' I refer to is the standard-issue "program" (not the built-in): /usr/bin/printf
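(For reference, here is a quick way to tell the two apart; these commands are just my own sanity check, assuming bash and GNU coreutils.)
type -a printf                 # lists the shell built-in first, then /usr/bin/printf
type -P printf                 # prints only the path of the external binary
$(type -P printf) --version    # the coreutils printf reports a version; the built-in does not accept this option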
I was testing printf as a viable method of converting a Unicode codepoint hex literal into its Unicode character representation.
It was looking good, and seemed flawless (by the way, the built-in printf can't do this at all, I think)...
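For example (my own quick check, assuming a UTF-8 terminal and the GNU coreutils printf), codepoints above the ASCII range convert just fine:
$ /usr/bin/printf '\u00e9 \u20ac \u263a\n'
é € ☺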
I then thought to test it at the lower extreme end of the code spectrum, and it failed with an avalanche of errors, all in the ASCII range (= 7 bits).
The strangest thing was that three values printed normally (a quick check follows the list); they are:
- $ \u0024
- @ \u0040
- ` \u0060
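A side-by-side check (the exact commands are mine, but the behaviour is exactly as listed above):
$ /usr/bin/printf '\u0024 \u0040 \u0060\n'
$ @ `
$ /usr/bin/printf '\u0041\n'
/usr/bin/printf: invalid universal character name \u0041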
I'd like to know what is going on here. The ASCII character set is most definitely part of the Unicode code-point range...
I am puzzled, and still without a good way to bash-script this particular conversion. Suggestions are welcome.
To be entertained by that same avalanche of errors, paste the following code into a terminal...
# Here is one of the error messages
# /usr/bin/printf: invalid universal character name \u0041
# ...for them all, run the following script
(
# iterate over every two-hex-digit codepoint \u0000 .. \u00FF
for nib1 in {0..9} {A..F}; do
  for nib0 in {0..9} {A..F}; do
    # newline after codepoints whose high nibble is a digit, space otherwise
    [[ $nib1 < A ]] && nl="\n" || nl=" "
    $(type -P printf) "\u00$nib1$nib0$nl"
  done
done
echo
)