unicode normalization - Page 26

Display WCHAR Strings in Xcode Debugger

- by Nicholaz

I'd like to preview WCHAR strings in the variable display of the Xcode 3.2 debugger. Basically, if I have WCHAR wtext[128]; wcscpy(wtext, L"Hello World"); I'd like to see "Hello World" for wtext when tracing into the function.

Read the article

String searching algorithm for Chinese characters.

- by Jack Low

Python code is available for existing string-searching algorithms, e.g. the Boyer-Moore algorithm. I would like to use it on Chinese characters, but the same implementation does not seem to work. What do I need to do to make the algorithm work on Chinese characters? I am referring to this: http://en.literateprograms.org/Boyer-Moore_string_search_algorithm_(Python)#References
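
One possible direction, sketched here under the assumption that the text is held as Unicode strings (Python 3 `str`) rather than raw bytes: a bad-character table keyed by a dictionary instead of a 256-entry array works for any code point. The sketch uses the simpler Horspool variant of Boyer-Moore, not the implementation from the linked page, and the function name `horspool_search` is hypothetical.

```python
def horspool_search(text: str, pattern: str) -> int:
    """Return the index of the first match of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Bad-character table keyed by character, so it is not limited to 256 entries.
    # Later (rightmost) occurrences overwrite earlier ones.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos + m <= n:
        if text[pos:pos + m] == pattern:
            return pos
        # Shift according to the text character aligned with the pattern's last position.
        pos += shift.get(text[pos + m - 1], m)
    return -1


index = horspool_search("我喜欢中文搜索算法", "搜索")
```

Because the table is a dictionary, characters that never occur in the pattern simply fall back to the full pattern length.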

Read the article

PHP: Cyrillic characters not displayed correctly

- by user295502

Recently I switched hosting from one provider to the other and I have problems displaying Cyrillic characters. The characters which are read from the database are displayed correctly, but characters which are hardcoded in the php file aren't (they are displayed as question marks). The files which contain the php source code are saved in utf-8 form. Help anybody?

Read the article

Amazon SQS invalid binary character in message body

- by letronje

I have a web app that sends messages to an Amazon SQS queue. The Amazon SQS library throws an 'AmazonSQSException' because the message contains an invalid binary character. The message is the referrer obtained from an incoming HTTP request. This is what it looks like: http://ads.vrx.adbrite.com/adserver/display_iab_ads.php?sid=1220459&title_color=0000FF&text_color=000000&background_color=FFFFFF&border_color=CCCCCC&url_color=008000&newwin=0&zs=3330305f323530&width=300&height=250&url=http%3A%2F%2Funblockorkutproxy.com%2Fsearch.php%2FOi8vZG93%2FbmxvYWRz%2FLnppZGR1%2FLmNvbS9k%2Fb3dubG9h%2FZGZpbGUv%2FNTY5MTQ3%2FNi9NeUN1%2FdGVHaXJs%2FZnJpZW5k%2FWmFoaXJh%2FLndtdi5o%2FdG1s%2Fb0%2F^FÃ´}ÃºÃ<99Ã«)j Looks like the characters in bold are the invalid characters. Is there an easy way to filter out characters that are not accepted by Amazon? Here are the characters allowed by Amazon in the message body. I am not sure what regex I should use to replace invalid characters with ''.
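
A minimal sketch of such a filter, assuming the allowed set is the XML 1.0 character range that the SQS documentation lists (tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF) and that the referrer has already been decoded to a Unicode string. The names `_INVALID` and `strip_invalid` are hypothetical.

```python
import re

# Assumption: anything outside these ranges is rejected by the queue.
_INVALID = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid(text: str) -> str:
    """Remove every character outside the assumed allowed ranges."""
    return _INVALID.sub("", text)


cleaned = strip_invalid("b0/\x06tail")  # \x06 is the control character shown as ^F
```

If the referrer arrives as raw bytes that are not valid UTF-8, it would need to be decoded (for example with `errors="replace"`) before a filter like this can be applied.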

Read the article

Python.expat can't parse XML file with bad symbols. How to go around?

- by culebrón

I'm trying to parse an XML file with expat, and here's the line where I get a bad token exception: <tag k="name" v="???????????????????????????????????????????????????????????????????" /> xml.parsers.expat.ExpatError: not well-formed (invalid token): line 610127, column 37 The symbols in hex look like: \xd1? It seems like someone wrote this string (Russian alphabet) hitting backspace a few times. I set parser.returns_unicode = True, but this didn't help. The first line is <?xml version="1.0" encoding="UTF-8"?>. I work with a bz2 file (bz2.BZ2File). How can I parse the file?

Read the article

A UnicodeDecodeError that occurs with json in python on Windows, but not Mac.

- by ventolin

On windows, I have the following problem: >>> string = "Don´t Forget To Breathe" >>> import json,os,codecs >>> f = codecs.open("C:\\temp.txt","w","UTF-8") >>> json.dump(string,f) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "C:\Python26\lib\json\__init__.py", line 180, in dump for chunk in iterable: File "C:\Python26\lib\json\encoder.py", line 294, in _iterencode yield encoder(o) UnicodeDecodeError: 'utf8' codec can't decode bytes in position 3-5: invalid data (Notice the non-ascii apostrophe in the string.) However, my friend, on his mac (also using python2.6), can run through this like a breeze: > string = "Don´t Forget To Breathe" > import json,os,codecs > f = codecs.open("/tmp/temp.txt","w","UTF-8") > json.dump(string,f) > f.close(); open('/tmp/temp.txt').read() '"Don\\u00b4t Forget To Breathe"' Why is this? I've also tried using UTF-16 and UTF-32 with json and codecs, but to no avail.
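
A likely explanation, offered as an assumption rather than a confirmed diagnosis: in Python 2 the literal is a byte string in the console's encoding, and `json` then tries to decode those bytes as UTF-8. The sketch below reproduces the effect in Python 3 with explicit bytes; the choice of cp1252 as the legacy Windows code page is an assumption for illustration.

```python
import json

# Bytes as a legacy Windows code page might produce them (assumed: cp1252).
raw_windows = "Don´t Forget To Breathe".encode("cp1252")

try:
    raw_windows.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    # The single non-ASCII byte is not valid UTF-8 on its own.
    decoded_as_utf8 = False

# Decoding with the encoding the bytes were actually written in works,
# and json can then escape the character.
text = raw_windows.decode("cp1252")
encoded = json.dumps(text)
```

On a terminal that already uses UTF-8, as is common on the Mac, the same bytes would decode cleanly, which would match the difference described above.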

Read the article

Clickatell API - special characters

- by pepernik

What do I have to do to receive SMS messages with special characters like dšžcc? Example URL: http://api.clickatell.com/http/sendmsg?session_id=XXXXXXX&text=My characters dšžcc&to=XXXXXXXXX Urlencode does not help. Thanks.

Read the article

How can I use ToUnicode without breaking dead key support?

- by Cypherjb

A similar question has already been asked, so I'm not going to waste time re-explaining it; an existing discussion can be found here: http://stackoverflow.com/questions/1964614/toascii-tounicode-in-a-keyboard-hook-destroys-dead-keys The reason I'm posting a new question, however, is that I seem to have come across a 'solution', but I'm not quite sure how to implement it. This blog post seems to propose a solution to the problem of ToUnicode killing dead-key support: http://blogs.msdn.com/michkap/archive/2005/01/19/355870.aspx However, I'm not sure how to implement the suggested solution. A push in the right direction would be greatly appreciated. To be clear, the part I'm referring to is this: "There are two ways to work around this: 1) You can keep calling ToUnicode with the same info until it is cleared out and then call it one more time to put the state back where it was if you had never typed anything, or 2) You can load all of the keyboard info ahead of time and then when they type information you can look up in your own info cache what the keystrokes mean, without having to call APIs later." I'm not quite sure how to do either of those things (keyboards and internationalization are far from my strong point), so any help would be greatly appreciated. Thanks

Read the article

When uploading Arabic files in Spring, filename ends up with XML entities instead of Arabic glyphs

- by sword101

I am using Spring upload to upload files. When uploading a file with an Arabic name and getting the original file name in the controller, I get something like: &#1575;&#1604;&#1605;&#1594;&#1601;&#1604;&#1610;&#1606;.png I expect it to be: المغفلين.png Any ideas why this problem occurs?

Read the article

python read utf8 text file problem

- by cpps

I have a problem with Python reading and printing a UTF-8 text file. I have a test.txt in UTF-8 encoding without a BOM. This file has two characters in it: ?声 The first character "?" is Chinese and the second, "声" (U+58F0), is Japanese. Now, when I use Ulipad (a Python editor) to run the following code to read the file and print these two characters: import codecs infile = "C:\\test.txt" f = codecs.open(infile, "r", "utf-8") s = f.read() print(s) I get this error: "UnicodeEncodeError: 'cp950' codec can't encode character '\u58f0' in position 1: illegal multibyte sequence" I found it is caused by the second character, "声". But when I run the same code in IDLE, Python's default GUI, it prints the two characters with no error. So, how can I fix the problem? My environment is Python 3.1 on Windows XP (Traditional Chinese).

Read the article

"É" not getting converted to two bytes correctly.

- by ChrisF

Further to this question I've got a supplementary problem. I've found a track with an "É" in the title. My code: var playList = new StreamWriter(playlist, false, Encoding.UTF8); - private static void WriteUTF8(StreamWriter playList, string output) { byte[] byteArray = Encoding.UTF8.GetBytes(output); foreach (byte b in byteArray) { playList.Write(Convert.ToChar(b)); } } converts this to the following bytes: 195 137 which is being output as Ã followed by a square (which is a character that can't be printed in the current font). I've exported the same file to a playlist in Media Monkey and it writes the "É" as "Ã‰", which I'm assuming is correct (as KennyTM pointed out). My question is, how do I get the "‰" symbol output? Do I need to select a different font and, if so, which one? UPDATE People seem to be missing the point. I can get the "É" written to the file using playList.WriteLine("É"); that's not the problem. The problem is that Media Monkey requires the file to be in the following format: #EXTINFUTF8:140,Yann Tiersen - Comptine D'Un Autre Ã‰tÃ©: L'AprÃ¨s Midi #EXTINF:140,Yann Tiersen - Comptine D'Un Autre Été: L'Après Midi #UTF8:04-Comptine D'Un Autre Ã‰tÃ©- L'AprÃ¨s Midi.mp3 04-Comptine D'Un Autre Été- L'Après Midi.mp3 Where all the "high-ascii" characters (for want of a better term) are written out as a pair of characters.
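
The two-character form looks like the UTF-8 bytes of each character read back as Windows-1252. The following Python sketch illustrates that mapping only; it is not the C# fix, and the helper name `utf8_as_cp1252` is hypothetical.

```python
def utf8_as_cp1252(text: str) -> str:
    """Encode as UTF-8, then reinterpret each byte as a Windows-1252 character."""
    # Note: a few bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) are undefined in
    # Python's cp1252 codec, so some inputs would raise UnicodeDecodeError.
    return text.encode("utf-8").decode("cp1252")


pair = utf8_as_cp1252("É")      # bytes 195 137 read as two cp1252 characters
title = utf8_as_cp1252("Été")

# Reading byte 137 as Latin-1 instead yields the unprintable control
# character U+0089, which would explain the square.
latin1_pair = "É".encode("utf-8").decode("latin-1")
```

Under this reading, the font is not the issue: byte 137 maps to "‰" only in Windows-1252, while a one-to-one byte-to-char conversion produces U+0089.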

Read the article

Flex TextField won't accept "ü" and other "German" characters

- by erikcw

I'm having problems with Flex (3.5) auto converting "ü" into a "u". As soon as I paste the character in, it transforms. Is there something I need to turn on to enable these other character sets? I thought Flex supported UTF-8? Thanks!

Read the article

java: can I convert strings to byte arrays, without a BOM?

- by Cheeso

Suppose I have this code: String encoding = "UTF-16"; String text = "[Hello StackOverflow]"; byte[] message= text.getBytes(encoding); If I display the byte array in message, the result is: 0000 FE FF 00 5B 00 48 00 65 00 6C 00 6C 00 6F 00 20 ...[.H.e.l.l.o. 0010 00 53 00 74 00 61 00 63 00 6B 00 4F 00 76 00 65 .S.t.a.c.k.O.v.e 0020 00 72 00 66 00 6C 00 6F 00 77 00 5D .r.f.l.o.w.] As you can see, there's a BOM in the beginning. How can I: generate a UTF-16 byte array that lacks a BOM, from a string? convert from a byte array that contains UTF-16 chars but lacks a BOM, back to a string?
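
For illustration, the same distinction exists in Python's codecs: the generic "utf-16" name writes a BOM, while a name with an explicit byte order does not. This is a sketch in Python rather than Java; Java has similarly named charsets ("UTF-16BE", "UTF-16LE"), which is the approach this sketch is meant to mirror.

```python
text = "[Hello StackOverflow]"

with_bom = text.encode("utf-16")     # BOM followed by native byte order
no_bom = text.encode("utf-16-be")    # explicit big-endian, no BOM

# Decoding BOM-less bytes requires naming the byte order explicitly.
round_trip = no_bom.decode("utf-16-be")
```

The BOM-less array starts directly with 00 5B 00 48, matching the dump above minus its first two bytes.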

Read the article

Code to strip diacritical marks using ICU

- by Paul J. Lucas

Can somebody please provide some sample code to strip diacritical marks (i.e., replace characters having accents, umlauts, etc., with their unaccented, unumlauted, etc., character equivalents, e.g., every accented é would become a plain ASCII e) from a UnicodeString using the ICU library in C++? E.g.: UnicodeString strip_diacritics( UnicodeString const &s ) { UnicodeString result; // ... return result; } Assume that s has already been normalized. Thanks.
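
The usual technique is to decompose the string (NFD) and drop the nonspacing marks. The sketch below shows that idea with Python's standard `unicodedata` module instead of ICU, so it illustrates the algorithm rather than the requested C++ API.

```python
import unicodedata


def strip_diacritics(s: str) -> str:
    """Decompose to NFD, then drop combining marks (general category Mn)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


plain = strip_diacritics("Été über")
```

Letters that have no canonical decomposition, such as "ø" or "ł", pass through unchanged with this approach.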

Read the article

python unichr problem

- by jacob

I've got some problem with unichr() on my server. Please see below: On my server (Ubuntu 9.04): >>> print unichr(255) Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 0: ordinal not in range(128) On my desktop (Ubuntu 9.10): >>> print unichr(255) ÿ I'm fairly new to python so I don't know how to solve this. Anyone care to help? Thanks.
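
The character itself is created fine on both machines; the failure happens when `print` encodes it for the terminal. A plausible cause, stated as an assumption, is that the server's locale makes Python pick ASCII for stdout. The sketch below (Python 3, where `chr` replaces `unichr`) separates the two steps.

```python
ch = chr(255)  # Python 3 spelling of unichr(255)

ascii_error = None
try:
    ch.encode("ascii")  # what print effectively does with an ASCII stdout
except UnicodeEncodeError as exc:
    ascii_error = str(exc)

# Encoding explicitly to UTF-8 always works for this character.
utf8_bytes = ch.encode("utf-8")
```

Comparing the `LANG` and `LC_ALL` environment variables on the two machines would be one way to check the locale assumption.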

Read the article

to escape or not to escape: well formed XHTML with diacritics

- by andresmh

Say that you have an XHTML document in English but it has accented characters (e.g. meta name="author" content="José"). Let's say you have no control over the HTTP headers. Should the characters be replaced with their corresponding named entities (e.g. &aacute;, etc.)? Should the doc type and the xml:lang attribute be set to English? I know I can check the W3C recommendation, but I am asking more from a practical point of view.

Read the article

How do I create JavaScript escape sequences in PHP?

- by ordinarytoucan

I'm looking for a way to create valid UTF-16 JavaScript escape sequence characters (including surrogate pairs) from within PHP. I'm using the code below to get the UTF-32 code points (from a UTF-8 encoded character). This works as JavaScript escape characters (e.g. '\u00E1' for 'á') - until you get into the upper ranges where you get surrogate pairs (e.g. '𝜕' comes out as '\u1D715' but should be '\uD835\uDF15')... function toOrdinal($chr) { if (ord($chr{0}) >= 0 && ord($chr{0}) <= 127) { return ord($chr{0}); } elseif (ord($chr{0}) >= 192 && ord($chr{0}) <= 223) { return (ord($chr{0}) - 192) * 64 + (ord($chr{1}) - 128); } elseif (ord($chr{0}) >= 224 && ord($chr{0}) <= 239) { return (ord($chr{0}) - 224) * 4096 + (ord($chr{1}) - 128) * 64 + (ord($chr{2}) - 128); } elseif (ord($chr{0}) >= 240 && ord($chr{0}) <= 247) { return (ord($chr{0}) - 240) * 262144 + (ord($chr{1}) - 128) * 4096 + (ord($chr{2}) - 128) * 64 + (ord($chr{3}) - 128); } elseif (ord($chr{0}) >= 248 && ord($chr{0}) <= 251) { return (ord($chr{0}) - 248) * 16777216 + (ord($chr{1}) - 128) * 262144 + (ord($chr{2}) - 128) * 4096 + (ord($chr{3}) - 128) * 64 + (ord($chr{4}) - 128); } elseif (ord($chr{0}) >= 252 && ord($chr{0}) <= 253) { return (ord($chr{0}) - 252) * 1073741824 + (ord($chr{1}) - 128) * 16777216 + (ord($chr{2}) - 128) * 262144 + (ord($chr{3}) - 128) * 4096 + (ord($chr{4}) - 128) * 64 + (ord($chr{5}) - 128); } } How do I adapt this code to give me proper UTF-16 code points? Thanks!
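
The missing step is the standard UTF-16 surrogate calculation: for code points of 0x10000 and above, subtract 0x10000 and split the remaining 20 bits into two 10-bit halves. The sketch below shows the arithmetic in Python; it only uses integer operations, so it should port to PHP directly. The function name `js_escape` is hypothetical.

```python
def js_escape(ch: str) -> str:
    """Return the JavaScript \\uXXXX escape for one character."""
    cp = ord(ch)
    if cp < 0x10000:
        return "\\u%04X" % cp
    cp -= 0x10000
    high = 0xD800 + (cp >> 10)    # top 10 bits select the high surrogate
    low = 0xDC00 + (cp & 0x3FF)   # bottom 10 bits select the low surrogate
    return "\\u%04X\\u%04X" % (high, low)


bmp = js_escape("á")
astral = js_escape(chr(0x1D715))
```

For U+1D715 this gives the pair quoted in the question, \uD835\uDF15.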

Read the article

Ruby character encoding problems in NetBeans and command window

- by salgo60

I use NetBeans as my development IDE and run the application from cmd, but I have problems displaying ISO 8859-1 characters like åäö correctly, both in the cmd window and when I run the application from NetBeans. Question: what is the best practice for setting this up? Right now I do @output.puts indent + "V" + 132.chr + "lkommen till Ruby Camping!" to get ä. My environment: chcp 65001 Active code page: 65001 ruby main.rb Source encoding: <Encoding:US-ASCII> Default external: #<Encoding:UTF-8> Default internal: nil Locale charmap: "CP65001" where I have in the code: def self.printEncoding puts "Source encoding: #{__ENCODING__.inspect}" if defined? __ENCODING__ if defined? Environment::Encoding puts "Default external: #{Encoding.default_external.inspect}" puts "Default internal: #{Encoding.default_internal.inspect}" puts "Locale charmap: #{ Encoding.locale_charmap.inspect}" end puts "LANG environment variable: #{ENV['LANG'].inspect}" unless ENV['LANG'].nil? end ruby -v ruby 1.9.1p378 (2010-01-10 revision 26273) [i386-mingw32]

Read the article

Why isn't wchar_t widely used in code for Linux / related platforms?

- by Ninefingers

This intrigues me, so I'm going to ask - for what reason is wchar_t not used so widely on Linux/Linux-like systems as it is on Windows? Specifically, the Windows API uses wchar_t internally whereas I believe Linux does not and this is reflected in a number of open source packages using char types. My understanding is that given a character c which requires multiple bytes to represent it, then in a char[] form c is split over several parts of char* whereas it forms a single unit in wchar_t[]. Is it not easier, then, to use wchar_t always? Have I missed a technical reason that negates this difference? Or is it just an adoption problem?

Read the article

How to generate pdf files _with_ utf-8 multibyte characters using Zend Framework

- by Sejanus

Hello, I've got a "little" problem with the Zend Framework Zend_Pdf class. Multibyte characters are stripped from generated PDF files. E.g. when I write aabccdee it becomes abcd, with the Lithuanian letters stripped. I'm not sure whether it's specifically a Zend_Pdf problem or a PHP problem in general. The source text is encoded in UTF-8, as is the PHP source file that does the job. Thank you in advance for your help ;) P.S. I run Zend Framework v. 1.6 and I use the FONT_TIMES_BOLD font. FONT_TIMES_ROMAN does work

Read the article

Chinese/Japanese characters in a search box and form.

- by alex

Why is it that when I use Firefox to enter 漢, the GET will transform to: q=%E6%BC%A2&start=0 However, when I use IE8 and type the same Chinese character, the GET is: q=?&start=0 It turns it into a question mark.
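
The Firefox form is the percent-encoded UTF-8 of the character, which can be checked with Python's `urllib.parse`. The second half of the sketch shows one way a literal "?" can arise, namely encoding into a charset that lacks the character; whether that is what IE8 does here is an assumption.

```python
from urllib.parse import quote, unquote

encoded = quote(chr(0x6F22))   # percent-encode the UTF-8 bytes of U+6F22
decoded = unquote("%E6%BC%A2")

# A charset without the character substitutes "?" when errors are replaced.
lossy = chr(0x6F22).encode("latin-1", errors="replace")
```

If the substitution explanation holds, declaring UTF-8 for the page or form (for example with accept-charset) is a common way to make browsers submit the same bytes.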

Read the article

Convert UTF-16 to UTF-8 under Windows and Linux, in C

- by DooriBar

I was wondering if there is a recommended cross-platform (Windows and Linux) method for converting strings from UTF-16LE to UTF-8, or should one use different methods for each environment? I've managed to google a few references to 'iconv', but for some reason I can't find samples of basic conversions, such as converting a wchar_t UTF-16 string to UTF-8. Can anybody recommend a method that would be cross-platform? If you know of references or a guide with samples, I would very much appreciate it. Thanks, Doori Bar

Read the article

Space-saving character encoding for Japanese?

- by Constantin

In my opinion this is a common problem: character encoding in combination with a bitmap font. Most multi-language encodings have huge gaps between different character types and a lot of unused code points. So if I want to use them, I waste a lot of memory (not only for storing multi-byte text; I mean especially for the gaps in my bitmap font), and VRAM is usually really valuable. So the only reasonable thing seems to be using a custom mapping on my texture for, e.g., UTF-8 characters (so that no space is wasted). But this seems to take the same effort as using my own proprietary character encoding (and so my own ordering of characters in my texture). In my particular case I have texture space for 4096 different characters and need characters to display Latin-script languages as well as Japanese (it's a mess with UTF-8, which only supports general CJK code pages). Has anybody ever had a similar problem (I would be surprised if not)? Is there already an approach? Edit: The same problem is described here http://www.tonypottier.info/Unicode_And_Japanese_Kanji/ but it doesn't provide a real solution for how to store these bitmap-font mappings for UTF-8 space-efficiently. So any further help is welcome!

Read the article

tchar safe functions -- count parameter for UTF-8 constants

- by Dustin Getz

I'm porting a library from char to TCHAR. The count parameter of this fragment, according to MSDN, is the number of multibyte characters, not the number of bytes. So, did I get this right? _tcsncmp(access, TEXT("ftp"), 3); //or do I want _tcsnccmp? "Supported on Windows platforms only, _mbsncmp and _mbsnbcmp are multibyte versions of strncmp. _mbsncmp will compare at most count multibyte characters and _mbsnbcmp will compare at most count bytes. They both use the current multibyte code page. _tcsnccmp and _tcsncmp are the corresponding Generic functions for _mbsncmp and _mbsnbcmp, respectively. _tccmp is equivalent to _tcsnccmp."

Search Results

Search found 1649 results on 66 pages for 'unicode normalization'.
