Search Results

Search found 1649 results on 66 pages for 'unicode normalization'.

Page 19/66 | < Previous Page | 15 16 17 18 19 20 21 22 23 24 25 26 | Next Page >

python raw_input odd behavior with accents containing strings

- by Ryan

I'm writing a program that asks the user for input that contains accents. The user input string is tested to see if it matches a string declared in the program. As you can see below, my code is not working: code # -*- coding: utf-8 -*- testList = ['má'] myInput = raw_input('enter something here: ') print myInput, repr(myInput) print testList[0], repr(testList[0]) print myInput in testList output in eclipse with pydev enter something here: má mv° 'm\xe2\x88\x9a\xc2\xb0' má 'm\xc3\xa1' False output in IDLE enter something here: má má u'm\xe1' má 'm\xc3\xa1' Warning (from warnings module): File "/Users/ryanculkin/Desktop/delete.py", line 8 print myInput in testList UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal False How can I get my code to print True when comparing the two strings? Additionally, I note that the result of running this code on the same input is different depending on whether I use eclipse or IDLE. Why is this? My eventual goal is to put my program on the web; is there anything that I need to be aware of, since the result seems to be so volatile?

Read the article
Is there any need for me to use wstring in the following case

- by Yan Cheng CHEOK

Currently, I am developing an app for a China customer. China customer are mostly switch to GB2312 language in their OS encoding. I need to write a text file, which will be encoded using GB2312. I use std::ofstream file I compile my application under MBCS mode, not unicode. I use the following code, to convert CString to std::string, and write it to file using ofstream std::string Utils::ToString(CString& cString) { /* Will not work correctly, if we are compiled under unicode mode. */ return (LPCTSTR)cString; } To my surprise. It just works. I thought I need to at least make use of wstring. I try to do some investigation. Here is the MBCS.txt generated. I try to print a single character named ? (its value is 0xBDC5) When I use CString to carry this character, its length is 2. When I use Utils::ToString to perform conversion to std::string, the returned string length is 2. I write to file using std::ofstream My question is : When I exam MBCS.txt using a hex editor, the value is displayed as BD (LSB) and C5 (MSB). But I am using little endian machine. Isn't hex editor should show me C5 (LSB) and BD (MSB)? I check from wikipedia. GB2312 seems doesn't specific endianness. It seems that using std::string + CString just work fine for my case. May I know in what case, the above methodology will not work? and when I should start to use wstring?

Read the article
Fastest way to copy a set (100+) of related SQLAlchemy objects and change attribute on each one

- by rebus

I am developing an app that keeps track of items going in and out of factory. For example, lets say you have 3 kinds of plastic coming in, they are mixed in various ratios and then sent out as a new product. So to keep track of this I've created following database structure: This is very simplified overview of my SQLAlchemy models: IN <- RATIO <- OUT <- REPORT ITEMS -> REPORT IN are products coming in, RATIO is various information on measurements, and OUT is a final product. REPORT is basically a header model which has a lot of REPORT ITEMS attached to it, which in turn relate it to OUT products. This would all work perfectly, but IN and RATION values can change. These changes ultimately change the OUT product which would mean the REPORT values would change. So in order to change an attribute on IN object for example I should copy that object with that attribute changed. I would think this is basically a question about database normalization, because i didn't want to duplicate all the IN, RATIO and OUT information by writing it in REPORT ITEMS table for example, but I've came across this problem (well not really a problem but rather a feature I'd like for a user to have). When the attribute on IN object is changed I want related objects (RATIO and OUT) automatically copied and related to a new IN object. So I was thinking something like: Take an existing instance of model IN that needs to change (call it old_in) Create a new one out of it with some attributes changed (call it new_in) Collect all the RATIO objects that are related to old_in Copy each RATIO and relate them to a new_in Collect all the OUT objects that are related to old RATIO Copy each OUT and relate them to a new RATIO Few questions pop to mind when i look at this problem: Should i just duplicate the data, does all this copying even make sense? If it does, should i rather do it in plain SQL? If no what would be the best approach to do it with Python and SQLAlchemy? Any general answer would suffice really, at least a pointer in right direction. I really want to free then end user for hassle of having create new ratios and out products.

Read the article
Perl LWP::UserAgent mishandling UTF-8 response

- by RedGrittyBrick

When I use LWP::UserAgent to retrieve content encoded in UTF-8 it seems LWP::UserAgent doesn't handle the encoding correctly. Here's the output after setting the Command Prompt window to Unicode by the command chcp 65001 Note that this initially gives the appearance that all is well, but I think it's just the shell reassembling bytes and decoding UTF-8, From the other output you can see that perl itself is not handling wide characters correctly. C:\perl getutf8.pl ====================================================================== HTTP/1.1 200 OK Connection: close Date: Fri, 31 Dec 2010 19:24:04 GMT Accept-Ranges: bytes Server: Apache/2.2.8 (Win32) PHP/5.2.6 Content-Length: 75 Content-Type: application/xml; charset=utf-8 Last-Modified: Fri, 31 Dec 2010 19:20:18 GMT Client-Date: Fri, 31 Dec 2010 19:24:04 GMT Client-Peer: 127.0.0.1:80 Client-Response-Num: 1 <?xml version="1.0" encoding="UTF-8"? <nameBudejovický Budvar</name ====================================================================== response content length is 33 ....v....1....v....2....v....3....v....4 <nameBudejovický Budvar</name . . . . v . . . . 1 . . . . v . . . . 2 . . . . v . . . . 3 . . . . 3c6e616d653e427564c49b6a6f7669636bc3bd204275647661723c2f6e616d653e < n a m e B u d ? ? j o v i c k ? ? B u d v a r < / n a m e Above you can see the payload length is 31 characters but Perl thinks it is 33. For confirmation, in the hex, we can see that the UTF-8 sequences c49b and c3bd are being interpreted as four separate characters and not as two Unicode characters. Here's the code #!perl use strict; use warnings; use LWP::UserAgent; my $ua = LWP::UserAgent-new(); my $response = $ua-get('http://localhost/Bud.xml'); if (! $response-is_success) { die $response-status_line; } print '='x70,"\n",$response-as_string(), '='x70,"\n"; my $r = $response-decoded_content((charset = 'UTF-8')); $/ = "\x0d\x0a"; # seems to be \x0a otherwise! chomp($r); # Remove any xml prologue $r =~ s/^<\?.*\?\x0d\x0a//; print "Response content length is ", length($r), "\n\n"; print "....v....1....v....2....v....3....v....4\n"; print $r,"\n"; print ". . . . v . . . . 1 . . . . v . . . . 2 . . . . v . . . . 3 . . . . \n"; print unpack("H*", $r), "\n"; print join(" ", split("", $r)), "\n"; Note that Bud.xml is UTF-8 encoded without a BOM. How can I persuade LWP::UserAgent to do the right thing? P.S. Ultimately I want to translate the Unicode data into an ASCII encoding, even if it means replacing each non-ASCII character with one question mark or other marker. I have accepted Ysth's "upgrade" answer - because I know it is the right thing to do when possible. However I am going to use a work-around (which may depress Tom further): $r = encode("cp437", decode("utf8", $r));

Read the article
Can URIs have non-ASCII characters?

- by Cheeso

I tried to find this in the relevant RFC, IETF RFC 3986, but couldn't figure it. Do URIs for HTTP allow Unicode, or non-ASCII of any kind? Can you please cite the section and the RFC that supports your answer. NB: For those who might think this is not programming related - it is. It's related to an ISAPI filter I'm building.

Read the article
Square Brackets in XSL-FO

- by Igman

I am attempting to create a list in XSL-FO using a square bracket. I have been able to get it working using the standard unicode bullet character (•) but I just can't seem to get it working for square brackets. I have tried using ■, but that does not seem to work. It is important that i can get the square bullets working because I am matching an existing file format.Any help in getting this working would be greatly appreciated.

Read the article
How do I obtain a code point integer from a 1 to 4 byte UTF-8 encoded sequence in Windows?

- by Patrick Niedzielski

Hello, I am Patrick Niedzielski, a programmer for the Free Software 3D adventure game Humm and Strumm. I'm working on a minimal Unicode character class in C++. I currently have an array of four bytes representing a UTF-8 sequence. On GNU/Linux, I can just convert to UTF-32 with iconv(), but on Windows, I cannot do this. Is it possible to convert the array to a single code point? Thanks, Patrick

Read the article
Removing non-breaking spaces from strings using Python

- by dontsaythekidsname

Hello: I am having some trouble with a very basic string issue in Python (that I can't figure out). Basically, I am trying to do the following: '# read file into a string myString = file.read() '# Attempt to remove non breaking spaces myString = myString.replace("\u00A0"," ") '# however, when I print my string to output to console, I get: Foo **<C2><A0>** Bar I thought that the "\u00A0" was the escape code for unicode non breaking spaces, but apparently I am not doing this properly. Any ideas on what I am doing wrong?

Read the article
llvm-clang; function/variable names containing unicdoe charactrs

- by anon

Hi! I'm interested in using unicode characters (like \apha) in function/varaible names in my c++ program which I will compile with clang++ on linux. Does anyone know of a good guide / list of rules to go by for making sure that everything ends up compiling fine / aoiding linking errors / ... Thanks!

Read the article
The confusion on python encoding

- by zhangzhong

I retrieved the data encoded in big5 from database,and I want to send the data as email of html content, the code is like this: html += """<tr><td>""" html += """unicode(rs[0], 'big5')""" # rs[0] is data encoded in big5 I run the script, but the error raised: UnicodeDecodeError: 'ascii' codec can't decode byte...... However, I tried the code in interactive python command line, there are no errors raised, could you give me the clue?

Read the article
BindingSource.Filter Problem [C# 2.0 on vs2008]

- by mr.gobbolino

BindingSource.Filter property is doesn't work while i filter with the unicode characters. any soluctoion for that? or how to implement custom BindingSource to use correct filter property.

Read the article
UTF-8 GET using Indy 10.5.8.0 and Delphi XE2

- by Bogdan Botezatu

I'm writing my first Unicode application with Delphi XE2 and I've stumbled upon an issue with GET requests to an Unicode URL. Shortly put, it's a routine in a MP3 tagging application that takes a track title and an artist and queries Last.FM for the corresponding album, track no and genre. I have the following code: function GetMP3Info(artist, track: string) : TMP3Data //<---(This is a record) var TrackTitle, ArtistTitle : WideString; webquery : WideString; [....] WebQuery := UTF8Encode('http://ws.audioscrobbler.com/2.0/?method=track.getcorrection&api_key=' + apikey + '&artist=' + artist + '&track=' + track); //[processing the result in the web query, getting the correction for the artist and title] // eg: for artist := Bucovina and track := Mestecanis, the corrected values are //ArtistTitle := Bucovina; // TrackTitle := Mestecani?; //Now here is the tricky part: webquery := UTF8Encode('http://ws.audioscrobbler.com/2.0/?method=track.getInfo&api_key=' + apikey + '&artist=' + unescape(ArtistTitle) + '&track=' + unescape(TrackTitle)); //the unescape function replaces spaces (' ') with '+' to comply with the last.fm requests [some more processing] end; The webquery looks in a TMemo just right (http://ws.audioscrobbler.com/2.0/?method=track.getInfo&api_key=e5565002840xxxxxxxxxxxxxx23b98ad&artist=Bucovina&track=Mestecani?) Yet, when I try to send a GET() to the webquery using IdHTTP (with the ContentEncoding property set to 'UTF-8'), I see in Wireshark that the component is GET-ing the data to the ANSI value '/2.0/?method=track.getInfo&api_key=e5565002840xxxxxxxxxxxxxx23b98ad&artist=Bucovina&track=Mestec?ni?' Here is the full headers for the GET requests and responses: GET /2.0/?method=track.getInfo&api_key=e5565002840xxxxxxxxxxxxxx23b98ad&artist=Bucovina&track=Mestec?ni? HTTP/1.1 Content-Encoding: UTF-8 Host: ws.audioscrobbler.com Accept: text/html, */* Accept-Encoding: identity User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.23) Gecko/20110920 Firefox/3.6.23 SearchToolbar/1.22011-10-16 20:20:07 HTTP/1.0 400 Bad Request Date: Tue, 09 Oct 2012 20:46:31 GMT Server: Apache/2.2.22 (Unix) X-Web-Node: www204 Access-Control-Allow-Origin: * Access-Control-Allow-Methods: POST, GET, OPTIONS Access-Control-Max-Age: 86400 Cache-Control: max-age=10 Expires: Tue, 09 Oct 2012 20:46:42 GMT Content-Length: 114 Connection: close Content-Type: text/xml; charset=utf-8; <?xml version="1.0" encoding="utf-8"?> <lfm status="failed"> <error code="6"> Track not found </error> </lfm> The question that puzzles me is am I overseeing anything related to setting the property of the tidhttp control? How can I stop the well-formated URL i'm composing in the application from getting wrongfully sent to the server? Thanks.

Read the article
Difference between Big Endian and little Endian Byte order

- by BALAMURUGAN

what is the difference between Big Endian byte order and little Endian Byte order. These both are related to Unicode and UTF16 where we use this?

Read the article
What is winansi?

- by stevebot

I can't find a wikipage or anthing :(. It's an encoding like unicode right? So it has it's own mapping of code points to characters?

Read the article
Accented characters in matplotlib

- by OldJim

Does anyone know a way to get matplotlib to render accented chars (é,ã,â,etc)? For instance i'm trying to use accented chars on set_yticklabels() and matplot renders squares instead, and when i use unicode() it renders the wrong chars. Is there a way to make this work? Thanks in advance, Jim.

Read the article
UCA + Natural Sorting

- by Alix Axel

I recently learnt that PHP already supports the Unicode Collation Algorithm via the intl extension: $array = array ( 'al', 'be', 'Alpha', 'Beta', 'Álpha', 'Àlpha', 'Älpha', '????', 'img10.png', 'img12.png', 'img1.png', 'img2.png', ); if (extension_loaded('intl') === true) { collator_asort(collator_create('root'), $array); } Array ( [0] => al [2] => Alpha [4] => Álpha [5] => Àlpha [6] => Älpha [1] => be [3] => Beta [11] => img1.png [9] => img10.png [8] => img12.png [10] => img2.png [7] => ???? ) As you can see this seems to work perfectly, even with mixed case strings! The only drawback I've encountered so far is that there is no support for natural sorting and I'm wondering what would be the best way to work around that, so that I can merge the best of the two worlds. I've tried to specify the Collator::SORT_NUMERIC sort flag but the result is way messier: collator_asort(collator_create('root'), $array, Collator::SORT_NUMERIC); Array ( [8] => img12.png [7] => ???? [9] => img10.png [10] => img2.png [11] => img1.png [6] => Älpha [5] => Àlpha [1] => be [2] => Alpha [3] => Beta [4] => Álpha [0] => al ) However, if I run the same test with only the img*.png values I get the ideal output: Array ( [3] => img1.png [2] => img2.png [1] => img10.png [0] => img12.png ) Can anyone think of a way to preserve the Unicode sorting while adding natural sorting capabilities?

Read the article
Reading Email using Pop3 in C#

- by Eldila

I am looking for a method of reading emails using Pop3 in C# 2.0. Currently, I am using code found in CodeProject. However, this solution is less than ideal. The biggest problem is that it doesn't support emails written in unicode.

Read the article
Send parameters to Web Service Persian ?

- by user362813

Display information in Farsi, but I have a problem when my site for web services can be sent a character "?" are displayed. pages are saved with Unicode(utf-8 with signature)codepage 65001 and the following tags in my master page : <'html xmlns="http://www.w3.org/1999/xhtml" lang="fa" xml:lang="fa" <'meta http-equiv="Content-Type" content="text/xml; charset=utf-8" / <'meta http-equiv="Content-Language" content="fa" / <'body lang="fa"-- and in web.confing : <'globalization fileEncoding="utf-8" requestEncoding="utf-8" responseEncoding="utf-8" /

Read the article
SQLite/iPhone read copyright symbol

- by Marco A

Hi All, I am having problems reading the copyright symbol from a sqlite db that I have for my App that I am developing. I import the information manually, ie, from an excel sheet. I have tried two ways of doing it and failed with both: 1) Tried replacing the copyright symbol with "\u00ae" (unicode combination) within excel and then importing the modified file. - Result: I get the combination of \u00ae as a part of the string, it doesnt detect the unicode combination. 2) Tried leaving as it is. Importing the excel with the copyright symbol. - Result: I get a symbol that is different from the copyright, its something like an AE put together.looks like this: Æ Heres my code how I read from DB: -(void) readCategoriesFromDatabase:(NSString *) rest_input { // Init the products Array categories = [[NSMutableArray alloc] init]; // Open the database from the users filessytem rest_input = [rest_input stringByAppendingString:@"'"]; NSString *newString; newString = [@"select distinct category from food where restaurant='" stringByAppendingString:rest_input]; const char *cat_sqlStatement = [newString UTF8String]; sqlite3_stmt *cat_compiledStatement; if(sqlite3_prepare_v2(database, cat_sqlStatement, -1, &cat_compiledStatement, NULL) == SQLITE_OK) { // Loop through the results and add them to the feeds array while(sqlite3_step(cat_compiledStatement) == SQLITE_ROW) { NSString *catName = [NSString stringWithUTF8String:(char *)sqlite3_column_text(cat_compiledStatement,0)]; // Create a new product object with the data from the database Product *category = [[Product alloc] initWithName:catName]; // Add the product object to the respective Array [categories addObject:category]; [category release]; } sqlite3_finalize(cat_compiledStatement); } NSLog(@"Finished Accessing Database to gather Categories...."); } I open the DB with this function: -(void) checkAndCreateDatabase{ NSLog(@"Checking/Creating Database...."); NSFileManager *fileManager = [NSFileManager defaultManager]; success = [fileManager fileExistsAtPath:databasePath]; [fileManager removeFileAtPath:databasePath handler:nil]; NSString *databasePathFromApp = [[[NSBundle mainBundle] resourcePath] stringByAppendingPathComponent:databaseName]; [fileManager copyItemAtPath:databasePathFromApp toPath:databasePath error:nil]; [fileManager release]; if (sqlite3_open([databasePath UTF8String], &database) != SQLITE_OK) { sqlite3_close(database); database = nil; } NSLog(@"Finished Checking/Creating Database...."); } Thanks to anything that can help me out.

Read the article
Filtering illegal XML characters in Java

- by GrzegorzOledzki

XML spec defines a subset of Unicode characters which are allowed in XML documents: http://www.w3.org/TR/REC-xml/#charsets. How do I filter out these characters from a String in Java? simple test case: Assert.equals("", filterIllegalXML(""+Character.valueOf((char) 2)))

Read the article
Really fast C++ html parser

- by Alessandro

Hello to all, I'm doing a html text feature extractor in C++; the program need to be REALLY fast: i need to extract a this features in ms per html page and the memory usage needs to be good and finally unicode encoding well be nice. I know how difficult is to have all of this things, but i want a parser close to these things at least. Somebody have a suggestion?

Read the article
Convertion strings like \\uXXXX in python

- by Gregory Lo

I have a strings like \uXXXX (representation) and I need to convert it into unicode. I recieve it from 3rd party service so python interpreter don't convert it and I need convertion in my code. How do I do it in Python? >>> s u'\\u0e4f\\u032f\\u0361\\u0e4f'

Read the article
How do you print raw UTF-8 characters from their numbers? [PHP]

- by Xeoncross

Say I wanted to print a ÿ (latin small y with diaeresis) from it's Unicode/UTF-8 number of "U+00FF" or hex of "c3 bf". How can I do that in PHP? Note: In order for it show correctly in a browser I know that the first step is header('Content-Type: text/html; charset=utf-8');

Read the article
Changing text appearence in vim

- by anon

Suppose I have a file, whose entire contents is: \u1234 and suppose 1234 is the code for \alpha is there a way to, in vim, have the "\1234" show up as a single \alpha symbol (and be treated as an \alpha symbol) ? Thanks! [This problem arises since I want to to use unicode names in g++]

Read the article
C++: Chr() and unichr() equivalent?

- by alex

I could have sworn I used a chr() function 40 minutes ago but can't find the file. I know it can go up to 256 so I use this: std::string chars = ""; chars += (char) 42; //etc So that's alright, but I really want to access unicode characters. Can I do (w_char) 512? Or maybe something just like the unichr() function in python, I just can't find a way to access any of those characters.

Read the article

< Previous Page | 15 16 17 18 19 20 21 22 23 24 25 26 | Next Page >