unicode normalization - Page 10

Does normalization really hurt performance in high traffic sites?

- by Luke101

I am designing a database and I would like to normalize the database. I one query I will joining about 30-40 tables. Will this hurt the website performance if it ever becomes extremely popular? This will be the main query and it will be getting called 50% of the time. The other queries I will be joining about 2 tables. I have a choice right now to normalize or not to normalize but if the normalization becomes a problem in the future i may have to rewrite 40% of the software and it may take me a long time. Does normalization really hurt in this case? Should I denormalize now while I have the time?

Read the article

Should UTF-16 be considered harmful?

- by Artyom

I'm going to ask what is probably quite a controversial question: "Should one of the most popular encodings, UTF-16, be considered harmful?" Why do I ask this question? How many programmers are aware of the fact that UTF-16 is actually a variable length encoding? By this I mean that there are code points that, represented as surrogate pairs, take more than one element. I know; lots of applications, frameworks and APIs use UTF-16, such as Java's String, C#'s String, Win32 APIs, Qt GUI libraries, the ICU Unicode library, etc. However, with all of that, there are lots of basic bugs in the processing of characters out of BMP (characters that should be encoded using two UTF-16 elements). For example, try to edit one of these characters: 𝄞 (U+1D11E) MUSICAL SYMBOL G CLEF 𝕥 (U+1D565) MATHEMATICAL DOUBLE-STRUCK SMALL T 𝟶 (U+1D7F6) MATHEMATICAL MONOSPACE DIGIT ZERO 𠂊 (U+2008A) Han Character You may miss some, depending on what fonts you have installed. These characters are all outside of the BMP (Basic Multilingual Plane). If you cannot see these characters, you can also try looking at them in the Unicode Character reference. For example, try to create file names in Windows that include these characters; try to delete these characters with a "backspace" to see how they behave in different applications that use UTF-16. I did some tests and the results are quite bad: Opera has problem with editing them (delete required 2 presses on backspace) Notepad can't deal with them correctly (delete required 2 presses on backspace) File names editing in Window dialogs in broken (delete required 2 presses on backspace) All QT3 applications can't deal with them - show two empty squares instead of one symbol. Python encodes such characters incorrectly when used directly u'X'!=unicode('X','utf-16') on some platforms when X in character outside of BMP. Python 2.5 unicodedata fails to get properties on such characters when python compiled with UTF-16 Unicode strings. StackOverflow seems to remove these characters from the text if edited directly in as Unicode characters (these characters are shown using HTML Unicode escapes). WinForms TextBox may generate invalid string when limited with MaxLength. It seems that such bugs are extremely easy to find in many applications that use UTF-16. So... Do you think that UTF-16 should be considered harmful?

Read the article

How can I check if PHP was compiled with the UNICODE version of the Win32 API?

- by Wesley Murch

This is related to this Stack Overflow post: glob() can't find file names with multibyte characters on Windows? I'm having issues with PHP and files that have multibyte characters on Windows. Here's my test case: print_r(scandir('./uploads/')); print_r(glob('./uploads/*')); Correct Output on remote UNIX server: Array ( [0] => . [1] => .. [2] => filename-äöü.jpg [3] => filename.jpg [4] => test?test.jpg [5] => ??? ?????.jpg [6] => ?????????.jpg [7] => ???.jpg ) Array ( [0] => ./uploads/filename-äöü.jpg [1] => ./uploads/filename.jpg [2] => ./uploads/test?test.jpg [3] => ./uploads/??? ?????.jpg [4] => ./uploads/?????????.jpg [5] => ./uploads/???.jpg ) Incorrect Output locally on Windows: Array ( [0] => . [1] => .. [2] => ??? ?????.jpg [3] => ???.jpg [4] => ?????????.jpg [5] => filename-äöü.jpg [6] => filename.jpg [7] => test?test.jpg ) Array ( [0] => ./uploads/filename-äöü.jpg [1] => ./uploads/filename.jpg ) Here's a relevant excerpt from the answer I chose to accept (which actually is a quote from an article that was posted online over 2 years ago): From the comments on this article: http://www.rooftopsolutions.nl/blog/filesystem-encoding-and-php The output from your PHP installation on Windows is easy to explain : you installed the wrong version of PHP, and used a version not compiled to use the Unicode version of the Win32 API. For this reason, the filesystem calls used by PHP will use the legacy "ANSI" API and so the C/C++ libraries linked with this version of PHP will first try to convert yout UTF-8-encoded PHP string into the local "ANSI" codepage selected in the running environment (see the CHCP command before starting PHP from a command line window) Your version of Windows is MOST PROBABLY NOT responsible of this weird thing. Actually, this is YOUR version of PHP which is not compiled correctly, and that uses the legacy ANSI version of the Win32 API (for compatibility with the legacy 16-bit versions of Windows 95/98 whose filesystem support in the kernel actually had no direct support for Unicode, but used an internal conversion layer to convert Unicode to the local ANSI codepage before using the actual ANSI version of the API). Recompile PHP using the compiler option to use the UNICODE version of the Win32 API (which should be the default today, and anyway always the default for PHP installed on a server that will NEVER be Windows 95 or Windows 98...) I can't confirm whether this is my problem or not. I used phpinfo() and did not find anything interesting, but I wasn't sure what to look for. I've been using XAMPP for easy installations, so I'm really not sure exactly how it was installed. I'm using Windows 7, 64 bit - so forgive my ignorance, but I'm not even sure if "Win32" is relevant here. How can I check if my current version of PHP was compiled with the configuration mentioned above? PHP Version: 5.3.8 System: Windows NT WES-PC 6.1 build 7601 (Windows 7 Home Premium Edition Service Pack 1) i586 Build Date: Aug 23 2011 11:47:20 Compiler: MSVC9 (Visual C++ 2008) Architecture: x86 Configure Command: cscript /nologo configure.js "--enable-snapshot-build" "--disable-isapi" "--enable-debug-pack" "--disable-isapi" "--without-mssql" "--without-pdo-mssql" "--without-pi3web" "--with-pdo-oci=D:\php-sdk\oracle\instantclient10\sdk,shared" "--with-oci8=D:\php-sdk\oracle\instantclient10\sdk,shared" "--with-oci8-11g=D:\php-sdk\oracle\instantclient11\sdk,shared" "--enable-object-out-dir=../obj/" "--enable-com-dotnet" "--with-mcrypt=static" "--disable-static-analyze"

Read the article

Unicode To ASCII Conversion [closed]

- by Yuvaraj

Hi all, i creating an small application in Delphi 2009. here i got problem that when i run my application in WindowsXP its working but it is not working in Windows95. i know the problem that 95 will not support Unicode. if anyone knows the solution please tell me. and also i have one more idea that converting Unicode to ASCII. is it possible please tell how to do that. Thanks in Advance Worm Regards, Yuvaraj

Read the article

Unicode in PostgreSQL 8.4

- by user8382

I installed the "postgresql-8.4" package with default options. Everything worked fine, however I can't seem to manage to create unicode databases: -- This doesn't work createdb test1 --encoding UNICODE -- This works createdb test2 The error message "createdb: database creation failed: ERROR: new encoding (UTF8) is incompatible with the encoding of the template database (SQL_ASCII)" is a bit puzzling because (afaik) I don't use a template for creating the new db, or is it implicitely referring to the default "postgres" database for some reason ? Or maybe I'm missing a setting in a .conf file ?

Read the article

Vector normalization gives very imprecise results

- by Kipras

When I normalize vectors I receive very strange results. The lengths of the normalized vectors range from 1.0 to almost 1.5. The functions are all written by me, but I just can't find a mistake in my algorithm. When I normalize I just divide all components of the vector by the vector's length. public double length(){ return Math.sqrt(x*x + y*y); } public void normalize(){ if(length() > 0){ x /= length(); y /= length(); } } Is this supposed to happen? I mean I can see the length ranging from 0.9 to 1.1 at worst, but this is just overwhelming. Cheers

Read the article

PHP PDF library with unicode support. anybody??

- by soden

I tried dompdf. Its a lot easier than the other libraries but it doesnt have unicode support. Well it has unicode support but requires another library calld PDFLib ($1k version). so just wondering if anybodies ever stumbled upon or used any PHP pdf library which is easy to use and has unicode support. thanks,

Read the article

How do I best remove the unicode characters that XHTML regards as non-valid using php?

- by Andrew Stacey

I run a forum designed to support an international mathematics group. I've recently switched it to unicode for better support of international characters. In debugging this conversion, I've discovered that not all unicode characters are considered as valid XHTML (the relevant website appears to be http://www.w3.org/TR/unicode-xml/). One of the steps that the forum software goes through before presenting the posts to the browser is an XHTML validation/sanitisation step. It seems a reasonable idea that at that stage it should remove any unicode characters that XHTML doesn't like. So my question is: Is there a standard (or best) way of doing this in PHP? (The forum is written in PHP, by the way.) I guess that the failsafe would be a simple str_replace (if that's also the best, do I need to do anything extra to make sure it works properly with unicode?) but that would involve me having to go through the XHTML DTD (or the above-referenced W3 page) carefully to figure out what characters to list in the search part of str_replace, so if this is the best way, has someone already done that so that I can steal, err, copy, it? (Incidentally, the character that caused the problem was U+000C, the 'formfeed', which (according to the W3 page) is valid HTML but invalid XHTML!)

Read the article

Is the Unicode prefix N still needed in SQL Compact Edition?

- by Dave

At least in previous versions of SQL Server, you had to prefix Unicode string constants with an "N" to make them be treated as Unicode. Thus, select foo from bar where fizz = N'buzz' (See "Server-Side Programming with Unicode" for SQL Server 2005 "from the horse's mouth" documentation.) We have an application that is using SQL Compact Edition and I am wondering if that is still necessary. From the testing I am doing, it appears to be unneeded. That is, the following SQL statements both behave identically in SQL CE, but the second one fails in SQL Server 2005: select foo from bar where foo=N'???' select foo from bar where foo='???' (I hope I'm not swearing in some language I don't know about...) I'm wondering if that is because all strings are treated as Unicode in SQL CE, or if perhaps the default code page is now Unicode-aware. If anyone has seen any official documentation, either yea or nay, I'd appreciate it. I know I could go the safe route and just add the "N"'s, but there's a lot of code that will need changed, and if I don't need to, I don't want to! Thanks for your help!

Read the article

Why does Python sometimes upgrade a string to unicode and sometimes not?

- by samtregar

I'm confused. Consider this code working the way I expect: >>> foo = u'Émilie and Juañ are turncoats.' >>> bar = "foo is %s" % foo >>> bar u'foo is \xc3\x89milie and Jua\xc3\xb1 are turncoats.' And this code not at all working the way I expect: >>> try: ... raise Exception(foo) ... except Exception as e: ... foo2 = e ... >>> bar = "foo2 is %s" % foo2 ------------------------------------------------------------ Traceback (most recent call last): File "<ipython console>", line 1, in <module> UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128) Can someone explain what's going on here? Why does it matter whether the unicode data is in a plain unicode string or stored in an Exception object? And why does this fix it: >>> bar = u"foo2 is %s" % foo2 >>> bar u'foo2 is \xc3\x89milie and Jua\xc3\xb1 are turncoats.' I am quite confused! Thanks for the help! UPDATE: My coding buddy Randall has added to my confusion in an attempt to help me! Send in the reinforcements to explain how this is supposed to make sense: >>> class A: ... def __str__(self): return "string" ... def __unicode__(self): return "unicode" ... >>> "%s %s" % (u'niño', A()) u'ni\xc3\xb1o unicode' >>> "%s %s" % (A(), u'niño') u'string ni\xc3\xb1o' Note that the order of the arguments here determines which method is called!

Read the article

How do I fix this unicode/cPickle error in Python?

- by alex

ids = cPickle.loads(gem.value) loads() argument 1 must be string, not unicode

Read the article

What is the best way to use Unicode in C++ on iPhone?

- by Olli Wang

Hi, I want to create my C++ libraries with Unicode support so they can be reused on other platforms. I have found the ICU (International Components for Unicode) project but I also found a discuss about Apple rejecting for using ICU (see http://tinyurl.com/y86phfb). So how do you guys use Unicode in C++ on iPhone? Thanks.

Read the article

copy text (Indian language- GUjarati) from word document to web page text area problem.

- by Avinash

Hi all, I am developing one site in Indian language (Gujarati). My problem is as below: My client wants that they able to copy Gujarati text from word document and paste into the Text area. But when i copy text from word doc and paste into text area the its get converted to the English letters. http://www.chanakyanipothi.com/gujchanakya/Gopika.ttf Above is the link of fonts which I am using. I can provide you the demo code for you to make some work on it. Is there any special thing which I am missing. Hope I am clear to you. I am running in PHP and apache. Thanks Avinash

Read the article

How to get number of bytes read from QTextStream

- by user261882

The following code I am using to find the number of read bytes from QFile. With some files it gives the correct file size, but with some files it gives me a value that is approximatively fileCSV.size()/2. I am sending two files that have same number of characters in it, but have different file sizes link text. Should i use some other objects for reading the QFile? QFile fileCSV("someFile.txt"); if ( !fileCSV.open(QIODevice::ReadOnly | QIODevice::Text)) emit errorOccurredReadingCSV(this); QTextStream textStreamCSV( &fileCSV ); // use a text stream int fileCSVSize = fileCSV.size()); qint64 reconstructedCSVFileSize = 0; while ( !textStreamCSV.atEnd() ) { QString line = textStreamCSV.readLine(); // line of text excluding '\n' if (!line.isEmpty()) { reconstructedCSVFileSize += line.size(); //this doesn't work always reconstructedCSVFileSize += 2; } else reconstructedCSVFileSize += 2; } I know that reading the size of QString is wrong, give me some other solutions if you can. Thank you.

Read the article

Jython 'unknown enocoding ms932' on japanese system

- by Houtman

i've written a program in Jython 2.5.1 which works fine on my Windows 7 machine, but on a japanese machine it throws an Exception saying "unknown encoding 'ms932'" i found that codecs.java is the only module printing the unknown encoding 'xyz' message this file loads aliases.py which does contain # cp932 codec '932' : 'cp932', 'ms932' : 'cp932', 'mskanji' : 'cp932', 'ms_kanji' : 'cp932', The file cp932.py contains import _codecs_jp, codecs But.. _codecs_jp does not exist as is also discussed in this page Does anyone have a clue where to go from here ? http://web.archiveorange.com/archive/v/8tc1Zc2rV3qiUcy9zPlA

Read the article

EF4 Code First Control Unicode and Decimal Precision, Scale with Attributes

- by Dane Morgridge

There are several attributes available when using code first with the Entity Framework 4 CTP5 Code First option. When working with strings you can use [MaxLength(length)] to control the length and [Required] will work on all properties. But there are a few things missing. By default all string will be created using unicode so you will get nvarchar instead of varchar. You can change this using the fluent API or you can create an attribute to make the change. If you have a lot of properties, the attribute will be much easier and require less code. You will need to add two classes to your project to create the attribute itself: 1: public class UnicodeAttribute : Attribute 2: { 3: bool _isUnicode; 4: 5: public UnicodeAttribute(bool isUnicode) 6: { 7: _isUnicode = isUnicode; 8: } 9: 10: public bool IsUnicode { get { return _isUnicode; } } 11: } 12: 13: public class UnicodeAttributeConvention : AttributeConfigurationConvention<PropertyInfo, StringPropertyConfiguration, UnicodeAttribute> 14: { 15: public override void Apply(PropertyInfo memberInfo, StringPropertyConfiguration configuration, UnicodeAttribute attribute) 16: { 17: configuration.IsUnicode = attribute.IsUnicode; 18: } 19: } The UnicodeAttribue class gives you a [Unicode] attribute that you can use on your properties and the UnicodeAttributeConvention will tell EF how to handle the attribute. You will need to add a line to the OnModelCreating method inside your context for EF to recognize the attribute: 1: protected override void OnModelCreating(System.Data.Entity.ModelConfiguration.ModelBuilder modelBuilder) 2: { 3: modelBuilder.Conventions.Add(new UnicodeAttributeConvention()); 4: base.OnModelCreating(modelBuilder); 5: } Once you have this done, you can use the attribute in your classes to make sure that you get database types of varchar instead of nvarchar: 1: [Unicode(false)] 2: public string Name { get; set; } Another option that is missing is the ability to set the precision and scale on a decimal. By default decimals get created as (18,0). If you need decimals to be something like (9,2) then you can once again use the fluent API or create a custom attribute. As with the unicode attribute, you will need to add two classes to your project: 1: public class DecimalPrecisionAttribute : Attribute 2: { 3: int _precision; 4: private int _scale; 5: 6: public DecimalPrecisionAttribute(int precision, int scale) 7: { 8: _precision = precision; 9: _scale = scale; 10: } 11: 12: public int Precision { get { return _precision; } } 13: public int Scale { get { return _scale; } } 14: } 15: 16: public class DecimalPrecisionAttributeConvention : AttributeConfigurationConvention<PropertyInfo, DecimalPropertyConfiguration, DecimalPrecisionAttribute> 17: { 18: public override void Apply(PropertyInfo memberInfo, DecimalPropertyConfiguration configuration, DecimalPrecisionAttribute attribute) 19: { 20: configuration.Precision = Convert.ToByte(attribute.Precision); 21: configuration.Scale = Convert.ToByte(attribute.Scale); 22: 23: } 24: } Add your line to the OnModelCreating: 1: protected override void OnModelCreating(System.Data.Entity.ModelConfiguration.ModelBuilder modelBuilder) 2: { 3: modelBuilder.Conventions.Add(new UnicodeAttributeConvention()); 4: modelBuilder.Conventions.Add(new DecimalPrecisionAttributeConvention()); 5: base.OnModelCreating(modelBuilder); 6: } Now you can use the following on your properties: 1: [DecimalPrecision(9,2)] 2: public decimal Cost { get; set; } Both these options use the same concepts so if there are other attributes that you want to use, you can create them quite simply. The key to it all is the PropertyConfiguration classes. If there is a class for the datatype, then you should be able to write an attribute to set almost everything you need. You could also create a single attribute to encapsulate all of the possible string combinations instead of having multiple attributes on each property. All in all, I am loving code first and having attributes to control database generation instead of using the fluent API is huge and saves me a great deal of time.

Read the article

Can GNU sed (for Windows) handle Unicode? If so, is it a code-page/locale issue, or a switch?

- by Peter.O

I've been using GNU SED on and off for a couple of years now. It spins me out a bit sometimes, but it does a good job... for single-byte char sets! I now and then notice references to GNU SED being Unicode-aware, but the closest I've seen of this is its "binary" mode.. and binary is not Unicode. Can GSED process a Unicode text file at CodePoint resolution, including and especially \r\n (Windows)... and if it can, does it expect UTF-8, UTF-16, or what? and how does SED detect the encoding?

Read the article

Oracle Unicode problem when using NLS_CHARACTERSET is WE8ISO8859P1 and NLS_NCHAR_CHARACTERSET is AL16UTF16, and ColdFusion as programming language

- by tsurahman

I have 2 Oracle 10g database, XE and Enterprise XE Enterprise and this are the data type I've use in the test table and then I tried to test to insert some Unicode char from http://www.sustainablegis.com/unicode/ and the results are XE Enterprise for this test, I use ColdFusion 9 developer edition <cfprocessingDirective pageencoding="utf-8"> <cfset setEncoding("form","utf-8")> <form action="" method="post"> Unicode : <br> <textarea name="txaUnicode" id="txaUnicode" cols="50" rows="10"></textarea> <br><br> Language : <br> <input type="Text" name="txtLanguage" id="txtLanguage"> <br><br> <input type="Submit"> </form> <cfset dsn = "theDSN"> <cfif StructKeyExists(FORM, "FIELDNAMES")> <cfquery name="qryInsert" datasource="#dsn#"> INSERT INTO UNICODE ( C_VARCHAR2, C_CHAR, C_CLOB, C_NVARCHAR2, LANGUAGE ) VALUES ( <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#FORM.TXAUNICODE#">, <cfqueryparam cfsqltype="CF_SQL_CHAR" value="#FORM.TXAUNICODE#">, <cfqueryparam cfsqltype="CF_SQL_LONGVARCHAR" value="#FORM.TXAUNICODE#">, <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#FORM.TXAUNICODE#">, <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#FORM.TXTLANGUAGE#"> ) </cfquery> </cfif> <cfquery name="qryUnicode" datasource="#dsn#"> SELECT * FROM UNICODE ORDER BY LANGUAGE </cfquery> <table border="1"> <thead> <tr> <th>LANGUAGE</th> <th>C_VARCHAR2</th> <th>C_CHAR</th> <th>C_CLOB</th> <th>C_NVARCHAR2</th> </tr> </thead> <tbody> <cfoutput query="qryUnicode"> <tr> <td>#qryUnicode.LANGUAGE#</td> <td>#qryUnicode.C_VARCHAR2#</td> <td>#qryUnicode.C_CHAR#</td> <td>#qryUnicode.C_CLOB#</td> <td>#qryUnicode.C_NVARCHAR2#</td> </tr> </cfoutput> </tbody> </table> from this guide http://www.stanford.edu/dept/itss/docs/oracle/10g/server.101/b10749/ch6unicode.htm#i1007297 I think for my Enterprise database it should produce same thing as XE (at least for NVARCHAR2 column) since the typical solution from that guide said: Use NCHAR and NVARCHAR2 datatypes to store Unicode characters Keep WE8ISO8859P1 as the database character set Use AL16UTF16 as the national character set So, how to make it works too in my Enterprise database? Thank you :)

Read the article

doublechecking: no db-wide 'unicode switch' for sql server in the foreseeable future, i.e. like Orac

- by user72150

Hi all, I believe I know the answer to this question, but wanted to confirm: Question Does Sql server (or will it in the foreseeable future), offer a database-wide "unicode switch" which says "store all characters in unicode (UTF-16, UCS-2, etc)", i.e. like Oracle. The Context Our application has provided "CJK" (Chinese-Japanese-Korean) support for years--using Oracle as the db store. Recently folks have been asking for the same support in sql server. We store our db schema definition in xml and generate the vendor-specific definitions (oracle, sql server) using vendor-specific xsl. We can make the change easily. The problem is for upgrades. Generated scripts would need to change the column types for 100+ columns from varchar to nvarchar, varchar(max) to nvarchar(max), etc. These changes require dropping and recreating indexes and foreign keys if the any indexes/fk's exist on the column. Non-trivial. Risky. DB-wide character encodings for us would eliminate programming changes. (I.e. we would not to change the column types from varchar to nvarchar; sql server would correctly store unicode data in varchar columns). I had thought that eventually sql server would "see the light" and allow storing unicode in varchar/clob columns. Evidently not yet. Recap So just to triple check: does mssql offer a database-wide switch for character encoding? Will it in SQL2008R3? or 2010? thanks, bill

Read the article

How to convert Unicode strings (\u00e2, etc) into NSString for display?

- by karlbecker_com

I am trying to support arbitrary unicode from a variety of international users. They have already put a bunch of data into sqlite databases on their iPhones, and now I want to capture the data into a database, then send it back to their device. Right now I am using a php page that is sending data back to from an internet mysql database. The data is saved in the mysql database properly, but when it's sent back it comes out as unicode text, such as Frank\u00e2\u0080\u0099s iPad instead of just Frank's iPad where the apostrophe should really be a curly apostrophe. The answer posted to another question indicates that there is no built-in Cocoa methods to convert the "\u00e2\u0080\u0099" portion of the unicode string from the webserver to an NSString object. Is this correct? That seems really surprising (and scarily disappointing), since Cocoa definitely allows input from many different Unicode characters, and I need to support any arbitrary language that I have never heard of, and all of the possible characters. I save them to and from the local sqlite database just fine now, but once I send it to a web server, then perhaps pull down different data, I want to ensure the data pulled from the web server is correctly formatted.

Read the article

Why can't I display a unicode character in the Python Interpreter on Mac OS X Terminal.app?

- by apphacker

If I try to paste a unicode character such as the middle dot: · in my python interpreter it does nothing. I'm using Terminal.app on Mac OS X and when I'm simply in in bash I have no trouble: :~$ · But in the interpreter: :~$ python Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) [GCC 4.2.1 (Apple Inc. build 5646)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> ^^ I get nothing, it just ignores that I just pasted the character. If I use the escape \xNN\xNN representation of the middle dot '\xc2\xb7', and try to convert to unicode, trying to show the dot causes the interpreter to throw an error: >>> unicode('\xc2\xb7') Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128) I have setup 'utf-8' as my default encoding in sitecustomize.py so: >>> sys.getdefaultencoding() 'utf-8' What gives? It's not the Terminal. It's not Python, what am I doing wrong?! This question is not related to this question, as that indivdiual is able to paste unicode into his Terminal.

Read the article

/etc/init.d Character Encoding Issue

- by Ryan Rosario

I have a script in /etc/init.d on an EC2 image that, on machine startup, pulls in source code via SVN, builds it, and then runs it using Ant. The source code is Java. Within this code is a call to the Weka library which writes a file to disk. On most Ubuntu AMIs, and my home machines' versions of Ubuntu, there is no issue. The problem is that with certain versions/AMIs of Ubuntu, Unicode characters in the file are replaced with question marks ('?'). If I run the job manually on the trouble instance, Unicode is output to file correctly, but not when run from /etc/init.d. What might be causing this problem and how can I fix it so that Unicode characters appear correctly in files written from /etc/init.d processes?

Read the article

mySQL Efficiency Issue - How to find the right balance of normalization...?

- by Foo

I'm fairly new to working with relational databases, but have read a few books and know the basics of good design. I'm facing a design decision, and I'm not sure how to continue. Here's a very over simplified version of what I'm building: People can rate photos 1-5, and I need to display the average votes on the picture while keeping track of the individual votes. For example, 12 people voted 1, 7 people voted 2, etc. etc. The normalization freak of me initially designed the table structure like this: Table pictures id* | picture | userID | Table ratings id* | pictureID | userID | rating With all the foreign key constraints and everything set as they shoudl be. Every time someone rates a picture, I just insert a new record into ratings and be done with it. To find the average rating of a picture, I'd just run something like this: SELECT AVG(rating) FROM ratings WHERE pictureID = '5' GROUP by pictureID Having it setup this way lets me run my fancy statistics to. I can easily find who rated a certain picture a 3, and what not. Now I'm thinking if there's a crapload of ratings (which is very possible in what I'm really designing), finding the average will became very expensive and painful. Using a non-normalized version would seem to be more efficient. e.g.: Table picture id | picture | userID | ratingOne | ratingTwo | ratingThree | ratingFour | ratingFive To calculate the average, I'd just have to select a single row. It seems so much more efficient, but so much more uglier. Can someone point me in the right direction of what to do? My initial research shows that I have to "find the right balance", but how do I go about finding that balance? Any articles or additional reading information would be appreciated as well. Thanks.

Read the article

What is the best way to remove accents in a python unicode string?

- by MiniQuark

I have a unicode string in python, and I would like to remove all the accents (diacritics). I found on the Web an elegant way to do this in Java: convert the unicode string to its long normalized form (with a separate character for letters and diacritics) remove all the characters whose unicode type is "diacritic". Do I need to install a library such as pyICU or is this possible with just the python standard library? And what about in python 3.0? Important note: I would like to avoid code with an explicit mapping from accented characters to their non-accented counterpart. Thanks for your help.

Read the article

C++ project type: unicode vs multi-byte; pros and cons

- by Stefan Valianu

I'm wondering what the Stack Overflow community thinks when it comes to creating a project (thinking primarily c++ here) with a unicode or a multi-byte character set. Are there pros to going Unicode straight from the start, implying all your strings will be in wide format? Are there performance issues / larger memory requirements because of a standard use of a larger character? Is there an advantage to this method? Do some processor architectures handle wide characters better? Are there any reasons to make your project Unicode if you don't plan on supporting additional languages? What reasons would one have for creating a project with a multi-byte character set? How do all of the factors above collide in a high performance environment (such as a modern video game) ?

Search Results

Search found 1649 results on 66 pages for 'unicode normalization'.

Page 10/66 | < Previous Page | 6 7 8 9 10 11 12 13 14 15 16 17 | Next Page >

- by Luke101

- by Artyom

- by Wesley Murch

- by Yuvaraj

- by user8382

- by Kipras

- by soden

- by Andrew Stacey

- by Dave

- by samtregar

- by alex

- by Olli Wang

- by Avinash

- by user261882

- by Houtman

- by Dane Morgridge

- by Peter.O

- by tsurahman

- by user72150

- by karlbecker_com

- by apphacker

- by Ryan Rosario

- by Foo

- by MiniQuark

- by Stefan Valianu

< Previous Page | 6 7 8 9 10 11 12 13 14 15 16 17 | Next Page >