Search Results

Search found 1649 results on 66 pages for 'unicode normalization'.

Page 16/66 | < Previous Page | 12 13 14 15 16 17 18 19 20 21 22 23  | Next Page >

  • Excel transpose via paste

    - by David Oneill
    I want to transpose data in Excel. Normally, I cut the cells I need and use Paste Special - Transpose. However, sometimes when I do Paste Special, a dialog appears asking whether I want to paste Unicode text or normal text. How do I transpose this text? Is there a way to get past the Unicode dialog box and reach the normal Paste Special dialog box (the one with the 'Transpose' option)? Or is there another simple way to transpose cells? Transpose = flip rows and columns, i.e. a row of 1, 2, 3 becomes a column of 1, 2, 3.
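
    If a dialog-free route is acceptable, the transpose can also be done outside Excel entirely. A sketch using Python's pandas (file names are illustrative; pandas and openpyxl are assumed to be installed):

        import pandas as pd

        # Read the sheet without treating any row as a header, flip rows/columns, write back
        df = pd.read_excel("input.xlsx", header=None)
        df.T.to_excel("output.xlsx", header=False, index=False)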

    Read the article

  • How do I fix font corruption in Google Chrome 9.0.597.44beta in Windows XP?

    - by snicker
    I am not sure what is causing this problem, but I think it is related to Unicode handling. A month ago, seemingly out of nowhere, Google Chrome stopped rendering Unicode characters in certain fonts: the same characters look fine in some fonts but render as garbage in others, and they render fine in other browsers. Most recently, I visited the FourSquare website and got complete font corruption; a side-by-side comparison of IE and Chrome made the difference obvious. What gives? Has anyone else seen this? How can I fix it?

    Read the article

  • Converting an AnsiString to a Unicode String

    - by jrodenhi
    I'm converting a D2006 program to D2010. I have a value stored as a single-byte-per-character string in my database, and I need to load it into a control that has a LoadFromStream, so my plan was to write the string to a stream and use that with LoadFromStream. But it did not work. In studying the problem, I ran into an issue that tells me I don't really understand how conversion from AnsiString to UnicodeString works. Here is some code I am puzzling over: oStringStream := TStringStream.Create(sBuffer); sUnicodeStream := oPayGrid.sStream; //explicit conversion to unicode string iSize1 := StringElementSize(oPayGrid.sStream); iSize2 := StringElementSize(sUnicodeStream); oStringStream.WriteString(sUnicodeStream); By the last line, iSize1 equals 1 and iSize2 equals 2, so that part matches what I understood from my reading. But after I write the string to the stream and look at the Bytes property of the stream, it shows this (the string starts as '16,159'): (49 {$31}, 54 {$36}, 44 {$2C}, 49 {$31}, 53 {$35}, 57 {$39} ... I was expecting it might look something like (49 {$31}, 00 {$00}, 54 {$36}, 00 {$00}, 44 {$2C}, 00 {$00}, 49 {$31}, 00 {$00}, 53 {$35}, 00 {$00}, 57 {$39}, 00 {$00} ... I'm not getting the right results out of LoadFromStream because it is reading from the stream two bytes at a time, but the data it is receiving is not arranged that way. What should I do to give LoadFromStream a well-formed stream of data based on a Unicode string? Thank you for your help.
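
    A quick way to see the two byte layouts the question contrasts, sketched in Python rather than Delphi (the encodings are illustrative):

        # '16,159' in a one-byte-per-character encoding vs. UTF-16LE
        s = "16,159"
        print(list(s.encode("latin-1")))    # [49, 54, 44, 49, 53, 57]
        print(list(s.encode("utf-16-le")))  # [49, 0, 54, 0, 44, 0, 49, 0, 53, 0, 57, 0]

    The first layout is what the question observes in the stream; the second, with interleaved zero bytes, is what a well-formed UTF-16LE stream would contain.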

    Read the article

  • Why do Unicode characters show up properly in database, but as ? when printed in Java via Hibernate?

    - by lupefiasco
    I'm writing a webapp, interfacing with MySQL using Hibernate 3.5. Using "?????? ?????????" as my test string, I can input the string and see that it is properly persisted into the database. However, when I later pull the value out of the database and print it to the console as a String, I see "?????? ?????????". If I use new OutputStreamWriter(System.out,"UTF-8"); then I get "„Éá„Çp„ÇØ„Éà„ÉÉ„Éó ·Éò·Éú·Éí·Éö·Éò·É°·É£·É†·Éò". Why don't I see the original string? These are my hibernate.cfg.xml settings: <property name="hibernate.connection.useUnicode">true</property> <property name="hibernate.connection.characterEncoding">UTF-8</property> <property name="hibernate.connection.charSet">UTF-8</property> and this is my database connection string: hibernate.connection.url = jdbc:mysql://localhost/mydatabase?autoReconnect=true&amp;useUnicode=true&amp;characterEncoding=UTF-8
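
    For what it's worth, that "„Éá..." pattern is the classic signature of UTF-8 bytes being decoded with a single-byte charset, which points at the console's output encoding rather than at Hibernate or MySQL. A minimal Python sketch of the round trip (the test string is a hypothetical stand-in, since the original was lost to '?' substitution):

        s = u"\u30c7\u30b9\u30af"            # Japanese katakana, standing in for the lost test string
        utf8_bytes = s.encode("utf-8")
        print(utf8_bytes.decode("latin-1"))  # mojibake: each UTF-8 byte renders as one garbage character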

    Read the article

  • Should UTF-16 be considered harmful?

    - by Artyom
    I'm going to ask what is probably quite a controversial question: "Should one of the most popular encodings, UTF-16, be considered harmful?" Why do I ask this question? How many programmers are aware of the fact that UTF-16 is actually a variable-length encoding? By this I mean that there are code points that, represented as surrogate pairs, take more than one element. I know: lots of applications, frameworks and APIs use UTF-16, such as Java's String, C#'s String, the Win32 API, the Qt GUI libraries, the ICU Unicode library, etc. However, with all of that, there are lots of basic bugs in the processing of characters outside the BMP (characters that must be encoded using two UTF-16 elements). For example, try to edit one of these characters: 𝄞 𝕥 𝟶 𠂊 You may miss some, depending on what fonts you have installed. These characters are all outside of the BMP (Basic Multilingual Plane). If you cannot see these characters, you can also try looking at them in the Unicode Character reference. For example, try to create file names in Windows that include these characters; try to delete these characters with a backspace to see how they behave in different applications that use UTF-16. I did some tests and the results are quite bad: Opera has problems editing them. Notepad can't deal with them correctly (deleting them, for example). File-name editing in Windows dialogs is broken. Qt3 applications can't deal with them at all. StackOverflow seems to remove these characters if they are edited directly as Unicode characters, and only seems to allow them as HTML Unicode escapes. So... that was a very simple test. Do you think that UTF-16 should be considered harmful?
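
    To make the variable-length point concrete, here is a small Python sketch showing that a single character outside the BMP occupies two UTF-16 code units (a surrogate pair):

        import struct

        ch = "\U0001D11E"               # MUSICAL SYMBOL G CLEF, code point above U+FFFF
        data = ch.encode("utf-16-le")
        units = struct.unpack("<%dH" % (len(data) // 2), data)
        print([hex(u) for u in units])  # ['0xd834', '0xdd1e']: two code units, one character

    Any code that walks UTF-16 one element at a time, assuming one element per character, splits this character in half.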

    Read the article

  • PHP PCRE differences on testing and hosting servers

    - by Gary Pearman
    Hi all, I've got the following regular expression that works fine on my testing server, but just returns an empty string on my hosted server. $text = preg_replace('~[^\\pL\d]+~u', $use, $text); Now I'm pretty sure this comes down to the hosting server's version of PCRE not being compiled with Unicode property support. The differences in the two versions are as follows: My server: PCRE version 7.8 2008-09-05 Compiled with UTF-8 support Unicode properties support Newline sequence is LF \R matches all Unicode newlines Internal link size = 2 POSIX malloc threshold = 10 Default match limit = 10000000 Default recursion depth limit = 10000000 Match recursion uses stack Hosting server: PCRE version 4.5 01-December-2003 Compiled with UTF-8 support Newline character is LF Internal link size = 2 POSIX malloc threshold = 10 Default match limit = 10000000 Match recursion uses stack Also note that the version on the hosting server (the same version PHP is compiled against) is pretty old. What confuses me, though, is that pcretest fails on both servers from the command line with re> ~[^\\pL\d]+~u ** Unknown option 'u' although this regexp works fine when run from PHP on my server. So, I guess my questions are: does the regular expression fail on the hosting server because of the lack of Unicode property support? Or is there something else that I'm missing? Thanks all, Gaz.
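
    For comparison, the same idea sketched in Python with the third-party regex module (pip install regex), which supports Unicode property classes like \p{L}; the sample text is illustrative:

        import regex  # third-party; the stdlib re module has no \p{L} support

        text = u"naïve café #42"
        print(regex.sub(r"[^\p{L}\d]+", "_", text))  # naïve_café_42

    The stdlib-vs-third-party split mirrors the question's situation: whether \pL works depends entirely on how the underlying regex engine was built.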

    Read the article

  • SHGetFolderPath returns path with question marks in it

    - by Colen
    Hi, Our application calls SHGetFolderPath when it runs, to get the My Documents folder. This normally works great. However, for three users - ???????, Jörg and Jörgen (see if you can spot the pattern!) - the call returns some very strange results. For example, for ???????, the call returns: c:\Users\???????\Documents I assume there's some sort of character-encoding shenanigan going on here, possibly related to Unicode, but I don't have any experience with that sort of thing. How can I get a useful path to the folder (and other related folders) out of Windows, without grovelling through registry keys for the information? In an email to me, ??????? ("Dmitry") told me his "My Documents" folder was actually located here: C:\Users\43D6~1\Documents So I know there's a way to get a "normal" version of the path out of Windows, I just don't know what it is. Background: our application is not Unicode-aware and uses standard "char *" strings. How can we get the "normal" path? I'm not opposed to calling the Unicode version of the function and then converting the result to "normal" text, if that's possible. Converting the application entirely to use Unicode is not an option here (we don't have the time). Thanks.
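
    The C:\Users\43D6~1\... form is the short (8.3) name, and Windows will hand it to you: call the wide API, then GetShortPathNameW. A sketch via Python's ctypes (CSIDL_PERSONAL is My Documents; error handling omitted, and 8.3 names must be enabled on the volume):

        import ctypes

        CSIDL_PERSONAL = 5
        buf = ctypes.create_unicode_buffer(260)
        ctypes.windll.shell32.SHGetFolderPathW(None, CSIDL_PERSONAL, None, 0, buf)

        short = ctypes.create_unicode_buffer(260)
        ctypes.windll.kernel32.GetShortPathNameW(buf, short, 260)
        print(short.value)  # e.g. C:\Users\43D6~1\Documents

    The short path avoids the problem characters, so it can usually be passed straight to a non-Unicode application's "char *" code.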

    Read the article

  • What is the correct JNA mapping for UniChar on Mac OS X?

    - by Trejkaz
    I have a C struct like this: struct HFSUniStr255 { UInt16 length; UniChar unicode[255]; }; I have mapped this in the expected way: public class HFSUniStr255 extends Structure { public UInt16 length; // UInt16 is just an IntegerType with length 2 for convenience. public /*UniChar*/ char[] unicode = new char[255]; //public /*UniChar*/ byte[] unicode = new byte[255*2]; //public /*UniChar*/ UInt16[] unicode = new UInt16[255]; public HFSUniStr255() { } public HFSUniStr255(Pointer pointer) { super(pointer); } } If I use this version, I get every second character of the string into my char[] ("aits D" for "Macintosh HD"). I am assuming this has something to do with being on a 64-bit platform: JNA maps the value to a 32-bit wchar_t and then chops off the high 16 bits of each wchar_t when copying them back. If I use the byte[] version, I get data which decodes correctly using the UTF-16LE charset. If I use the UInt16[] version, I get the right code point for each character, but it is then inconvenient to convert them back into a string. Is there some way I can define my type as char[] and yet have it convert correctly?
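
    The struct's layout itself is easy to sanity-check by hand: two bytes of length (counted in UTF-16 code units), then 255 UTF-16 units. A Python sketch decoding the raw bytes (little-endian is assumed here purely for illustration):

        import struct

        # An HFSUniStr255 as raw bytes: UInt16 length, then 255 UTF-16 code units
        name = "Macintosh HD"
        raw = struct.pack("<H", len(name)) + name.encode("utf-16-le") + b"\x00" * (2 * (255 - len(name)))

        length, = struct.unpack_from("<H", raw, 0)
        print(raw[2:2 + 2 * length].decode("utf-16-le"))  # Macintosh HD

    Getting "aits D" out of "Macintosh HD" is exactly the every-other-unit pattern you see when 16-bit units are read from that buffer on a 32-bit stride.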

    Read the article

  • PHP: turn Unicode HTTP links into clickable links

    - by newinjs
    Hello, I have Unicode links that need to be turned into clickable links. Is it possible to make Unicode URLs clickable? Currently I'm using this piece of code to turn links into clickable HTML: function clickable_link($text) { $ret = ' ' . $text; $ret = preg_replace("#(^|[\n ])([\w]+?://[\w\#$%&~/.\-;:=,?@\[\]+]*)#is", "\\1<a class=\"hrefLink\" href=\"\\2\" target=\"_blank\">\\2</a>", $ret); $ret = preg_replace("#(^|[\n ])((www|ftp)\.[\w\#$%&~/.\-;:=,?@\[\]+]*)#is", "\\1<a class=\"hrefLink\" href=\"http://\\2\" target=\"_blank\">\\2</a>", $ret); $ret = preg_replace("#(^|[\n ])([a-z0-9&\-_.]+?)@([\w\-]+\.([\w\-\.]+\.)*[\w]+)#i", "\\1<a href=\"mailto:\\2@\\3\">\\2@\\3</a>", $ret); $ret = substr($ret, 1); return $ret; } Any help would be deeply appreciated.

    Read the article

  • Cisco FWSM -> ASA upgrade broke our mail server

    - by Mike Pennington
    We send mail with Unicode Asian characters to our mail server on the other side of our WAN... immediately after upgrading from a FWSM running 2.3(2) to an ASA5550 running 8.2(5), we saw failures on mail jobs that contained Unicode. The symptoms are pretty clear... using the ASA's packet capture utility, we snagged the traffic before and after it left the ASA... access-list PCAP line 1 extended permit tcp any host 192.0.2.25 eq 25 capture pcap_inside type raw-data access-list PCAP buffer 1500000 packet-length 9216 interface inside capture pcap_outside type raw-data access-list PCAP buffer 1500000 packet-length 9216 interface WAN I downloaded the pcaps from the ASA by going to https://<fw_addr>/pcap_inside/pcap and https://<fw_addr>/pcap_outside/pcap... when I looked at them with Wireshark's Follow TCP Stream, the inside traffic going into the ASA looks like this: EHLO metabike AUTH LOGIN YzFwbUlciXNlck== cZUplCVyXzRw But the same mail leaving the ASA on the outside interface looks like this... EHLO metabike AUTH LOGIN YzFwbUlciXNlck== XXXXXXXXXXXX The XXXX characters are concerning... I fixed the issue by disabling ESMTP inspection: wan-fw1(config)# policy-map global_policy wan-fw1(config-pmap)# class inspection_default wan-fw1(config-pmap-c)# no inspect esmtp wan-fw1(config-pmap-c)# end The $5 question... our old FWSM used SMTP fixup without issues... mail went down at the exact moment we brought the new ASAs online... what specifically is different about the ASA that it now breaks this mail? Note: usernames / passwords / app names were changed... don't bother trying to Base64-decode this text.

    Read the article

  • Python encoding for pipe.communicate

    - by Brian M. Hunt
    I'm calling pipe.communicate from Python's subprocess module in Python 2.6. I get the following error from this code: from subprocess import Popen, PIPE pipe = Popen(cmd, stdin=PIPE) pipe.communicate(data) For an arbitrary command cmd, and where data contains unicode (specifically 0xE9): Exec. exception: 'ascii' codec can't encode character u'\xe9' in position 507: ordinal not in range(128) Traceback (most recent call last): ... stdout, stderr = pipe.communicate( data ) File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/subprocess.py", line 671, in communicate return self._communicate(input) File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/subprocess.py", line 1177, in _communicate bytes_written = os.write(self.stdin.fileno(), chunk) This is happening, I presume, because pipe.communicate() is expecting an ASCII-encoded byte string, but data is unicode. Is this the problem I'm encountering, and is there a way to pass unicode to pipe.communicate()? Thank you for reading! Brian
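
    If that diagnosis is right, the usual fix is to pick a byte encoding explicitly before calling communicate(), since it writes raw bytes to the child's stdin. A sketch (Python 2; the command name is hypothetical):

        from subprocess import Popen, PIPE

        pipe = Popen(["some_command"], stdin=PIPE)
        stdout, stderr = pipe.communicate(data.encode("utf-8"))  # bytes, not unicode, go down the pipe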

    Read the article

  • Load JSON in Python as header character set

    - by mridang
    Hi everyone, I've always found character sets and encodings complicated to understand, and here I'm faced with another problem. My apologies for any inaccuracies; I'll do my best. I'm requesting data from a server which returns JSON. In the HTTP headers it also returns the character set, like so: Content-Type: text/html; charset=UTF-8 I'm using the json library in Python to load the JSON using the json.loads method. When I pass it the returned JSON, it gives me a dictionary of unicode strings. I've Googled around and I know that JSON should come back as Unicode, since JavaScript strings are Unicode objects. How can I load the JSON as UTF-8? I would like to use the same encoding as specified in the response header. I've read this post but it didn't help. Thank you.
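
    If the goal is UTF-8 byte strings rather than unicode objects, one approach (a Python 2 sketch; the payload and key are illustrative) is to encode values on the way out of json.loads:

        import json

        response_body = '{"title": "caf\\u00e9"}'  # example payload
        data = json.loads(response_body)           # values come back as unicode
        title = data["title"].encode("utf-8")      # now a UTF-8 byte string

    Parsed JSON has no encoding of its own; the unicode objects are the decoded form, and UTF-8 only re-enters when the values are serialized or written out.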

    Read the article

  • How would you create a string of all UTF-8 characters? [PHP]

    - by Xeoncross
    There are many ways to represent the more than 1 million Unicode code points. Take the Latin capital "A" with macron (Ā). This is Unicode code point U+0100; in UTF-8 it is the byte sequence 0xC4 0x80 (decimal 196 128, binary 11000100 10000000). I would like to create a collection of the first 65,535 UTF-8 characters for use in testing applications. These are all Unicode characters up to code point U+FFFF (at most three bytes each in UTF-8). Is it possible to do something like a for($x=0) loop and then convert the resulting decimal to another base (like hex) which would allow the creation of the matching Unicode character? I can create the value Ā using something like this: $char = "\xc4\x80"; // or $char = chr(196).chr(128); However, I am not sure how to turn this into an automated process. // fail! $char = "\x". dechex($a). "\x". dexhex($$b);
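
    A sketch of the loop idea in Python (a PHP version would be analogous); note that code points U+D800-U+DFFF are surrogates and cannot be encoded as UTF-8, so they have to be skipped:

        chars = []
        for cp in range(0x0001, 0x10000):
            if 0xD800 <= cp <= 0xDFFF:
                continue               # surrogate range: not valid in UTF-8
            chars.append(chr(cp))      # Python 3: chr() maps a code point to its character
        blob = "".join(chars).encode("utf-8")
        print(len(chars))              # 63487 usable code points below U+10000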

    Read the article

  • Is an embedded resource a good approach for a read-only XML database?

    - by Nasser Hajloo
    I have an open source application (here). This application takes a character or a sentence and gives some Unicode information about it. I use the Unicode Character Database provided by Unicode.org, which is an XML document (130 MB). At first I embedded this XML in my DLL, but I don't know whether that is a good approach, because the DLL's size grows just because of this XML document. I could use it like any other resource, but then users can see it. What should I do? What is the best pattern for this, and why? TIA

    Read the article

  • How do I read UTF-8 characters via a pointer?

    - by Jen
    Suppose I have UTF-8 content stored in memory; how do I read the characters using a pointer? I presume I need to watch for the high bit indicating a multi-byte character, but how exactly do I turn the byte sequence into a valid Unicode character? Also, is wchar_t the proper type to store a single Unicode character? This is what I have in mind: wchar_t readNextChar (char** p) { char ch = *(*p)++; if (ch & 128) { // This is a multi-byte character, what do I do now? // char chNext = *(*p)++; // ... but how do I assemble the Unicode character? ... } ... }
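
    The assembly step the comment asks about is bit masking and shifting: the lead byte's high bits say how many continuation bytes follow, and each continuation byte contributes 6 payload bits. A sketch of the logic in Python (no validation of continuation bytes or overlong forms, so not production-ready):

        def decode_next(buf, i):
            # Returns (code_point, next_index) for the UTF-8 sequence starting at buf[i]
            b = buf[i]
            if b < 0x80:                 # 0xxxxxxx: plain ASCII
                return b, i + 1
            if b >> 5 == 0b110:          # 110xxxxx 10xxxxxx
                return ((b & 0x1F) << 6) | (buf[i+1] & 0x3F), i + 2
            if b >> 4 == 0b1110:         # 1110xxxx 10xxxxxx 10xxxxxx
                return ((b & 0x0F) << 12) | ((buf[i+1] & 0x3F) << 6) | (buf[i+2] & 0x3F), i + 3
            return (((b & 0x07) << 18) | ((buf[i+1] & 0x3F) << 12) |
                    ((buf[i+2] & 0x3F) << 6) | (buf[i+3] & 0x3F)), i + 4

        print(hex(decode_next("€".encode("utf-8"), 0)[0]))  # 0x20ac

    As for the storage type: wchar_t is only 16 bits on Windows, so a single wchar_t cannot hold code points above U+FFFF; a 32-bit type (e.g. uint32_t or char32_t) is the safe choice for one arbitrary Unicode character.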

    Read the article

  • Mouse wheel scrolling in less and vim using urxvt

    - by Adam Batkin
    I have started working with rxvt-unicode (aka urxvt) but found an issue with mouse-wheel scrolling, as compared to gnome-terminal and konsole. The mouse wheel works fine for going through the scrollback buffer, but it doesn't work for automatic scrolling in less/most or vim (though in vim, setting mouse=a makes it work, but in a very different way, which I don't have to do with gnome-terminal/konsole). Is there a way to make urxvt behave like gnome-terminal and konsole when in less and vim where the mouse wheel Just Works?

    Read the article

  • Word Opening Text Files by default as RTL, should be LTR

    - by wonea
    I've noticed recently that Microsoft Word 2007 on Windows XP is, by default, opening plain text files as RTL instead of LTR. These are files written in English, containing no characters outside ASCII. I work in three or four languages on the computer, so I have the language bar open; the problem happens even when the language is set to EN (United Kingdom). Is there a setting I'm missing somewhere, perhaps in Word itself?

    Read the article

  • Mathematica regular expressions on unicode strings.

    - by dreeves
    This was a fascinating debugging experience. Can you spot the difference between the following two lines? StringReplace["–", RegularExpression@"[\\s\\S]" -> "abc"] StringReplace["-", RegularExpression@"[\\s\\S]" -> "abc"] They do very different things when you evaluate them. It turns out that the string being replaced in the first line consists of a Unicode en dash, as opposed to a plain old ASCII dash in the second line. In the case of the Unicode string, the regular expression doesn't match. I meant the regex "[\s\S]" to mean "match any character (including newline)", but Mathematica apparently treats it as "match any ASCII character". How can I fix the regular expression so the first line above evaluates the same as the second? Alternatively, is there an asciify filter I can apply to the strings first? PS: The Mathematica documentation says that its string pattern matching is built on top of the Perl-Compatible Regular Expressions library (http://pcre.org), so the problem I'm having may not be specific to Mathematica.

    Read the article

  • Opening a Unicode file with Perl

    - by Jaco Pretorius
    I'm using osql to run several SQL scripts against a database, and then I need to look at the results file to check whether any errors occurred. The problem is that Perl doesn't seem to like the fact that the results files are Unicode. I wrote a little test script, and the output comes out all garbled. $file = shift; open OUTPUT, $file or die "Can't open $file: $!\n"; while (<OUTPUT>) { print $_; if (/Invalid|invalid|Cannot|cannot/) { push(@invalids, $file); print "invalid file - $file - schedule for retry\n"; last; } } Any ideas? I've tried decoding using decode_utf8 but it makes no difference. I've also tried setting the encoding when opening the file. I think the problem might be that osql puts the result file in UTF-16 format, but I'm not sure. When I open the file in TextPad it just tells me 'Unicode'. Edit: Using perl v5.8.8
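
    TextPad's bare 'Unicode' label usually means UTF-16LE with a BOM, which matches what osql emits. The check can be sketched like this in Python (file name illustrative); in Perl the equivalent would be an encoding layer on open, e.g. '<:encoding(UTF-16LE)':

        import io
        import re

        # "utf-16" consumes the BOM and picks the right byte order automatically
        with io.open("results.txt", "r", encoding="utf-16") as fh:
            for line in fh:
                if re.search(r"invalid|cannot", line, re.IGNORECASE):
                    print("invalid file - results.txt - schedule for retry")
                    break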

    Read the article

  • Orbited exception Data must not be unicode.

    - by Sid
    I am working with Orbited, and once I switch it on in production mode it throws the following error on my screen: --- <exception caught here> --- File "/usr/lib/python2.6/dist-packages/twisted/web/server.py", line 150, in process self.render(resrc) File "/usr/lib/python2.6/dist-packages/twisted/web/server.py", line 157, in render body = resrc.render(self) File "/usr/local/lib/python2.6/dist-packages/orbited-0.7.10-py2.6.egg/orbited/transports/base.py", line 21, in render self.conn.transportOpened(self) File "/usr/local/lib/python2.6/dist-packages/orbited-0.7.10-py2.6.egg/orbited/cometsession.py", line 322, in transportOpened self.cometTransport.flush() File "/usr/local/lib/python2.6/dist-packages/orbited-0.7.10-py2.6.egg/orbited/transports/base.py", line 45, in flush self.write(self.packets) File "/usr/local/lib/python2.6/dist-packages/orbited-0.7.10-py2.6.egg/orbited/transports/htmlfile.py", line 42, in write self.request.write(payload); File "/usr/lib/python2.6/dist-packages/twisted/web/http.py", line 862, in write self.transport.write(data) File "/usr/lib/python2.6/dist-packages/twisted/internet/tcp.py", line 420, in write abstract.FileDescriptor.write(self, bytes) File "/usr/lib/python2.6/dist-packages/twisted/internet/abstract.py", line 170, in write raise TypeError("Data must not be unicode") exceptions.TypeError: Data must not be unicode I have absolutely no clue as to what could be the problem. Could anyone point me in the right direction?
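
    The last traceback frame is the hint: Twisted's transport.write() accepts only byte strings, and somewhere a unicode payload is reaching it. A hedged sketch of the usual remedy, applied wherever the payload is produced (Python 2; the names are illustrative):

        def to_bytes(payload):
            # Twisted transports require str (bytes) in Python 2, never unicode
            if isinstance(payload, unicode):
                return payload.encode("utf-8")
            return payload

        # transport.write(to_bytes(payload))  # applied at the point of writing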

    Read the article

  • How do you get Matlab to write the BOM (byte order markers) for UTF-16 text files?

    - by Richard Povinelli
    I am creating UTF-16 text files with Matlab, which I am later reading in using Java. In Matlab, I open a file called fileName and write to it as follows: fid = fopen(fileName, 'w','n','UTF16-LE'); fprintf(fid, 'Some stuff.'); In Java, I can read the text file using the following code: FileInputStream fileInputStream = new FileInputStream(fileName); Scanner scanner = new Scanner(fileInputStream, "UTF-16LE"); String s = scanner.nextLine(); Here is the hex output: Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11 12 13 00000000 73 00 6F 00 6D 00 65 00 20 00 73 00 74 00 75 00 66 00 66 00 s.o.m.e. .s.t.u.f.f. The above approach works fine. But I want to be able to write out the file as UTF-16 with a BOM, to give me more flexibility so that I don't have to worry about big- or little-endian. In Matlab, I've coded: fid = fopen(fileName, 'w','n','UTF16'); fprintf(fid, 'Some stuff.'); In Java, I change the code to: FileInputStream fileInputStream = new FileInputStream(fileName); Scanner scanner = new Scanner(fileInputStream, "UTF-16"); String s = scanner.nextLine(); In this case, the string s is garbled, because Matlab is not writing the BOM. I can get the Java code to work just fine if I add the BOM manually. With the added BOM, the following file works fine: Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11 12 13 14 15 00000000 FF FE 73 00 6F 00 6D 00 65 00 20 00 73 00 74 00 75 00 66 00 66 00 ÿþs.o.m.e. .s.t.u.f.f. How can I get Matlab to write out the BOM? I know I could write the BOM out separately, but I'd rather have Matlab do it automatically. Addendum: I selected the answer below from Amro because it exactly solves the question I posed. One key discovery for me was the difference between the Unicode Standard and a UTF (Unicode transformation format) (see http://unicode.org/faq/utf_bom.html). The Unicode Standard provides unique identifiers (code points) for characters. UTFs provide mappings of every code point "to a unique byte sequence." Since all but a handful of the characters I am using are in the first 128 code points, I'm going to switch to using UTF-8 as Romeo suggests. UTF-8 is supported by Matlab (the warning shown below won't need to be suppressed) and Java, and for my application will generate smaller text files. I suppress the Matlab warning Warning: The encoding 'UTF-16LE' is not supported. with warning off MATLAB:iofun:UnsupportedEncoding;
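
    The BOM itself is just the code point U+FEFF written in the file's byte order (FF FE at the start of a little-endian file). For comparison, here is the byte layout when generated in Python, whose "utf-16" codec prepends the BOM automatically:

        data = u"Some stuff.".encode("utf-16")  # BOM + UTF-16 code units
        print(data[:2] == b"\xff\xfe")          # True on a little-endian build
        print(data[2:6])                        # b'S\x00o\x00'

    In Matlab, the equivalent manual step would be writing the single value 0xFEFF as a uint16 before the text, letting the file's byte order produce FF FE or FE FF as appropriate.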

    Read the article
