How to support 3-byte UTF-8 Characters in ANT
- by efelton
I am trying to support UTF-8 characters in my ANT script.
As long as the character string are made up of 2-byte UTF-8 characters, such as:
Lògìñ
Ùsèr ÌÐ
Then things work fine.
When I use Unicode Han Character:
?
Which, according to this site:
http://www.fileformat.info/info/unicode/char/6211/index.htm
Has a UTF-8 encoding of 0xE6 0x88 0x91
I can see in UltraEdit, my input properties file has the values "E6 88 91" all in a row, so I'm fairly confident that my input is correct. And When I open the same file in Notepad++ I can see all the characters correctly.
Here is my Build Script:
<?xml version="1.0" encoding="UTF-8" ?>
<project
name="utf8test"
default="all"
basedir=".">
<target name="all">
<loadproperties encoding="UTF-8" srcfile="./apps.properties.all.txt" />
<echo>No encoding ${common.app.name}</echo>
<echo encoding="UTF-8">UTF-8 ${common.app.name}</echo>
<echo encoding="UnicodeLittle">UnicodeLittle ${common.app.name}</echo>
<echo encoding="UnicodeLittleUnmarked">UnicodeLittleUnmarked ${common.app.name}</echo>
<echo>${common.app.ServerName}</echo>
<echo>${bb.vendor}</echo>
<echo>No encoding ${common.app.UserIdText}</echo>
<echo encoding="UTF-8">UTF-8 ${common.app.UserIdText}</echo>
<echo encoding="UnicodeLittle">UnicodeLittle ${common.app.UserIdText}</echo>
<echo encoding="UnicodeLittleUnmarked">UnicodeLittleUnmarked ${common.app.UserIdText}</echo>
<echoproperties />
</target>
</project>
And here is my properties file:
common.app=VrvPsLTst
common.app.name=??
common.app.description=Pseudo Loc Test App for Build Script testing
common.app.ServerName=http://Vèrìvò.com
bb.vendor=Vèrìvò
common.app.PasswordText=Pàsswòrð
bb.override.list=MP_COPYRIGHTTEXT, "Çòpÿrìght 2012 Vèrívó Bùîlð TéàM"
common.app.LoginButtonText=Lògìñ
common.app.UserIdText=Ùsèr ÌÐ
bb.SMSSuccess=Mèssàgéß Sùççêssfúllÿ Sëñt
common.app.LoginScreenMessage=WèlçòMé Mêssàgë
common.app.LoginProgressMessage=Àùthèñtìçàtíòñ îñ prógréss...
ios.RegistrationText=Règìstràtíòñ Téxt
ios.RegistrationURL=http://www.josscrowcroft.com/2011/code/utf-8-multibyte-characters-in-url-parameters-%E2%9C%93/
Here is what the output looks like:
Buildfile: C:\Temp\utf8\build.xml
all:
[echo] No encoding ??
[echo] UTF-8 ??
[echo] ÿþU n i c o d e L i t t l e ? ?
[echo] U n i c o d e L i t t l e U n m a r k e d ? ?
[echo] http://Vèrìvò.com
[echo] Vèrìvò
[echo] No encoding Ùsèr ÌÐ
[echo] UTF-8 Ùsèr Ì�
[echo] ÿþU n i c o d e L i t t l e Ù s è r Ì Ð
[echo] U n i c o d e L i t t l e U n m a r k e d Ù s è r Ì Ð
[echoproperties] #Ant properties
[echoproperties] #Mon Jun 18 15:25:13 EDT 2012
[echoproperties] ant.core.lib=C\:\\ant\\lib\\ant.jar
[echoproperties] ant.file=C\:\\Temp\\utf8\\build.xml
[echoproperties] ant.file.type=file
[echoproperties] ant.file.type.utf8test=file
[echoproperties] ant.file.utf8test=C\:\\Temp\\utf8\\build.xml
[echoproperties] ant.home=c\:\\ant\\bin\\..
[echoproperties] ant.java.version=1.6
[echoproperties] ant.library.dir=C\:\\ant\\lib
[echoproperties] ant.project.default-target=all
[echoproperties] ant.project.invoked-targets=all
[echoproperties] ant.project.name=utf8test
[echoproperties] ant.version=Apache Ant version 1.8.1 compiled on April 30 2010
[echoproperties] awt.toolkit=sun.awt.windows.WToolkit
[echoproperties] basedir=C\:\\Temp\\utf8
[echoproperties] bb.SMSSuccess=M\u00E8ss\u00E0g\u00E9\u00DF S\u00F9\u00E7\u00E7\u00EAssf\u00FAll\u00FF S\u00EB\u00F1t
[echoproperties] bb.override.list=MP_COPYRIGHTTEXT, "\u00C7\u00F2p\u00FFr\u00ECght 2012 V\u00E8r\u00EDv\u00F3 B\u00F9\u00EEl\u00F0 T\u00E9\u00E0?"
[echoproperties] bb.vendor=V\u00E8r\u00ECv\u00F2
[echoproperties] common.app=VrvPsLTst
[echoproperties] common.app.LoginButtonText=L\u00F2g\u00EC\u00F1
[echoproperties] common.app.LoginProgressMessage=\u00C0\u00F9th\u00E8\u00F1t\u00EC\u00E7\u00E0t\u00ED\u00F2\u00F1 \u00EE\u00F1 pr\u00F3gr\u00E9ss...
[echoproperties] common.app.LoginScreenMessage=W\u00E8l\u00E7\u00F2?\u00E9 M\u00EAss\u00E0g\u00EB
[echoproperties] common.app.PasswordText=P\u00E0ssw\u00F2r\u00F0
[echoproperties] common.app.ServerName=http\://V\u00E8r\u00ECv\u00F2.com
[echoproperties] common.app.UserIdText=\u00D9s\u00E8r \u00CC\u00D0
[echoproperties] common.app.description=Pseudo Loc Test App for Build Script testing
[echoproperties] common.app.name=??
[echoproperties] file.encoding=Cp1252
[echoproperties] file.encoding.pkg=sun.io
[echoproperties] file.separator=\\
[echoproperties] ios.RegistrationText=R\u00E8g\u00ECstr\u00E0t\u00ED\u00F2\u00F1 T\u00E9xt
[echoproperties] ios.RegistrationURL=http\://www.josscrowcroft.com/2011/code/utf-8-multibyte-characters-in-url-parameters-%E2%9C%93/
[echoproperties] java.awt.graphicsenv=sun.awt.Win32GraphicsEnvironment
[echoproperties] java.awt.printerjob=sun.awt.windows.WPrinterJob
[echoproperties] java.class.path=c\:\\ant\\bin\\..\\lib\\ant-launcher.jar;C\:\\Temp\\utf8\\.\\;C\:\\Program Files (x86)\\Java\\jre7\\lib\\ext\\QTJava.zip;C\:\\ant\\lib\\ant-antlr.jar;C\:\\ant\\lib\\ant-apache-bcel.jar;C\:\\ant\\lib\\ant-apache-bsf.jar;C\:\\ant\\lib\\ant-apache-log4j.jar;C\:\\ant\\lib\\ant-apache-oro.jar;C\:\\ant\\lib\\ant-apache-regexp.jar;C\:\\ant\\lib\\ant-apache-resolver.jar;C\:\\ant\\lib\\ant-apache-xalan2.jar;C\:\\ant\\lib\\ant-commons-logging.jar;C\:\\ant\\lib\\ant-commons-net.jar;C\:\\ant\\lib\\ant-contrib-1.0b3.jar;C\:\\ant\\lib\\ant-jai.jar;C\:\\ant\\lib\\ant-javamail.jar;C\:\\ant\\lib\\ant-jdepend.jar;C\:\\ant\\lib\\ant-jmf.jar;C\:\\ant\\lib\\ant-jsch.jar;C\:\\ant\\lib\\ant-junit.jar;C\:\\ant\\lib\\ant-launcher.jar;C\:\\ant\\lib\\ant-netrexx.jar;C\:\\ant\\lib\\ant-nodeps.jar;C\:\\ant\\lib\\ant-starteam.jar;C\:\\ant\\lib\\ant-stylebook.jar;C\:\\ant\\lib\\ant-swing.jar;C\:\\ant\\lib\\ant-testutil.jar;C\:\\ant\\lib\\ant-trax.jar;C\:\\ant\\lib\\ant-weblogic.jar;C\:\\ant\\lib\\ant.jar;C\:\\ant\\lib\\bb-ant-tools.jar;C\:\\ant\\lib\\xercesImpl.jar;C\:\\ant\\lib\\xml-apis.jar;C\:\\Program Files\\Java\\jre7\\lib\\tools.jar
[echoproperties] java.class.version=51.0
[echoproperties] java.endorsed.dirs=C\:\\Program Files\\Java\\jre7\\lib\\endorsed
[echoproperties] java.ext.dirs=C\:\\Program Files\\Java\\jre7\\lib\\ext;C\:\\Windows\\Sun\\Java\\lib\\ext
[echoproperties] java.home=C\:\\Program Files\\Java\\jre7
[echoproperties] java.io.tmpdir=C\:\\Users\\efelton\\AppData\\Local\\Temp\\
[echoproperties] java.library.path=C\:\\Windows\\SYSTEM32;C\:\\Windows\\Sun\\Java\\bin;C\:\\Windows\\system32;C\:\\Windows;C\:\\Windows\\SYSTEM32;C\:\\Windows;C\:\\Windows\\SYSTEM32\\WBEM;C\:\\Windows\\SYSTEM32\\WINDOWSPOWERSHELL\\V1.0\\;C\:\\PROGRAM FILES\\INTEL\\WIFI\\BIN\\;C\:\\PROGRAM FILES\\COMMON FILES\\INTEL\\WIRELESSCOMMON\\;C\:\\PROGRAM FILES (X86)\\MICROSOFT SQL SERVER\\100\\TOOLS\\BINN\\;C\:\\PROGRAM FILES\\MICROSOFT SQL SERVER\\100\\TOOLS\\BINN\\;C\:\\PROGRAM FILES\\MICROSOFT SQL SERVER\\100\\DTS\\BINN\\;C\:\\PROGRAM FILES (X86)\\MICROSOFT SQL SERVER\\100\\TOOLS\\BINN\\VSSHELL\\COMMON7\\IDE\\;C\:\\PROGRAM FILES (X86)\\MICROSOFT SQL SERVER\\100\\DTS\\BINN\\;C\:\\Program Files\\ThinkPad\\Bluetooth Software\\;C\:\\Program Files\\ThinkPad\\Bluetooth Software\\syswow64;C\:\\Program Files (x86)\\QuickTime\\QTSystem\\;C\:\\Program Files (x86)\\AccuRev\\bin;C\:\\Program Files\\Java\\jdk1.7.0_04\\bin;C\:\\Program Files (x86)\\IDM Computer Solutions\\UltraEdit\\;.
[echoproperties] java.runtime.name=Java(TM) SE Runtime Environment
[echoproperties] java.runtime.version=1.7.0_04-b22
[echoproperties] java.specification.name=Java Platform API Specification
[echoproperties] java.specification.vendor=Oracle Corporation
[echoproperties] java.specification.version=1.7
[echoproperties] java.vendor=Oracle Corporation
[echoproperties] java.vendor.url=http\://java.oracle.com/
[echoproperties] java.vendor.url.bug=http\://bugreport.sun.com/bugreport/
[echoproperties] java.version=1.7.0_04
[echoproperties] java.vm.info=mixed mode
[echoproperties] java.vm.name=Java HotSpot(TM) 64-Bit Server VM
[echoproperties] java.vm.specification.name=Java Virtual Machine Specification
[echoproperties] java.vm.specification.vendor=Oracle Corporation
[echoproperties] java.vm.specification.version=1.7
[echoproperties] java.vm.vendor=Oracle Corporation
[echoproperties] java.vm.version=23.0-b21
[echoproperties] line.separator=\r\n
[echoproperties] os.arch=amd64
[echoproperties] os.name=Windows 7
[echoproperties] os.version=6.1
[echoproperties] path.separator=;
[echoproperties] sun.arch.data.model=64
[echoproperties] sun.boot.class.path=C\:\\Program Files\\Java\\jre7\\lib\\resources.jar;C\:\\Program Files\\Java\\jre7\\lib\\rt.jar;C\:\\Program Files\\Java\\jre7\\lib\\sunrsasign.jar;C\:\\Program Files\\Java\\jre7\\lib\\jsse.jar;C\:\\Program Files\\Java\\jre7\\lib\\jce.jar;C\:\\Program Files\\Java\\jre7\\lib\\charsets.jar;C\:\\Program Files\\Java\\jre7\\lib\\jfr.jar;C\:\\Program Files\\Java\\jre7\\classes
[echoproperties] sun.boot.library.path=C\:\\Program Files\\Java\\jre7\\bin
[echoproperties] sun.cpu.endian=little
[echoproperties] sun.cpu.isalist=amd64
[echoproperties] sun.desktop=windows
[echoproperties] sun.io.unicode.encoding=UnicodeLittle
[echoproperties] sun.java.command=org.apache.tools.ant.launch.Launcher -cp .;C\:\\Program Files (x86)\\Java\\jre7\\lib\\ext\\QTJava.zip
[echoproperties] sun.java.launcher=SUN_STANDARD
[echoproperties] sun.jnu.encoding=Cp1252
[echoproperties] sun.management.compiler=HotSpot 64-Bit Tiered Compilers
[echoproperties] sun.os.patch.level=Service Pack 1
[echoproperties] user.country=US
[echoproperties] user.dir=C\:\\Temp\\utf8
[echoproperties] user.home=C\:\\Users\\efelton
[echoproperties] user.language=en
[echoproperties] user.name=efelton
[echoproperties] user.script=
[echoproperties] user.timezone=
[echoproperties] user.variant=
BUILD SUCCESSFUL
Total time: 1 second
Thank you for your help
EDIT\UPDATE 6/19/2012
I am developing in a Windows environment.
I have installed a TTF from:
http://freedesktop.org/wiki/Software/CJKUnifonts/Download
I have updated UltraEdit to use the TTF and I can see the Chinese characters.
<?xml version="1.0" encoding="UTF-8" ?>
<project name="utf8test" default="all" basedir=".">
<target name="all">
<echo>??</echo>
<echo encoding="ISO-8859-1">ISO-8859-1 ??</echo>
<echo encoding="UTF-8">UTF-8 ??</echo>
<echo file="echo_output.txt" append="true" >?? ${line.separator}</echo>
<echo file="echo_output.txt" append="true" encoding="ISO-8859-1">ISO-8859-1 ?? ${line.separator}</echo>
<echo file="echo_output.txt" append="true" encoding="UTF-8">UTF-8 ?? ${line.separator}</echo>
<echo file="echo_output.txt" append="true" encoding="UnicodeLittle">UnicodeLittle ?? ${line.separator}</echo>
<echo file="echo_output.txt" append="true" encoding="UnicodeLittleUnmarked">UnicodeLittleUnmarked ?? ${line.separator}</echo>
</target>
</project>
The output captured by running inside UltraEdit is:
Buildfile: E:\temp\utf8\build.xml
all:
[echo] ??
[echo] ISO-8859-1 ??
[echo] UTF-8 ??
BUILD SUCCESSFUL
Total time: 1 second
And the echo_output.txt file shows up like this:
??
ISO-8859-1 ??
UTF-8 ??
ÿþU n i c o d e L i t t l e ? ?
U n i c o d e L i t t l e U n m a r k e d ? ?
So there appears to be somehting fundamentally wrong with how my ANT environment is set up since I cannot simply echo the character to the screen or to a file.