unicode - Page 10 - Developer IT

Can GNU sed (for Windows) handle Unicode? If so, is it a code-page/locale issue, or a switch?

- by Peter.O

I've been using GNU sed on and off for a couple of years now. It spins me out a bit sometimes, but it does a good job... for single-byte character sets! Every now and then I notice references to GNU sed being Unicode-aware, but the closest I've seen to this is its "binary" mode, and binary is not Unicode. Can GNU sed process a Unicode text file at code-point resolution, including (and especially) \r\n line endings (Windows)? If it can, does it expect UTF-8, UTF-16, or something else? And how does sed detect the encoding?
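To make "code-point resolution" concrete, here is a minimal Python sketch (not sed) of what a Unicode-aware substitution involves. The bytes must first be decoded with some encoding, and only then does a pattern like `.` match one code point rather than one byte. The helper names `sniff_encoding` and `substitute` are hypothetical, and checking for a byte-order mark is just one possible detection strategy: a file without a BOM simply falls back to an assumed default (UTF-8 here).

```python
import codecs
import re


def sniff_encoding(data: bytes, default: str = "utf-8") -> str:
    """Pick a codec from a byte-order mark (BOM), else fall back to `default`.

    UTF-32 BOMs are not handled in this sketch.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes UTF-8 and drops the BOM
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec reads the BOM to choose the byte order
    return default


def substitute(data: bytes, pattern: str, repl: str) -> str:
    """Decode `data`, then run a regex substitution over code points."""
    text = data.decode(sniff_encoding(data))
    # On a decoded str, a pattern matches whole code points; \r\n stays as
    # two separate code points, U+000D and U+000A.
    return re.sub(pattern, repl, text)


utf8 = "café\r\n".encode("utf-8")
utf16 = "café\r\n".encode("utf-16")  # Python prepends a BOM here
print(repr(substitute(utf8, "é", "e")))   # 'cafe\r\n'
print(repr(substitute(utf16, "é", "e")))  # 'cafe\r\n'
# Byte-level vs code-point-level matching of the same character:
print(len(re.findall(b".", "é".encode("utf-8"))), len(re.findall(".", "é")))  # 2 1
```

The last line shows the difference the question is really about: a byte-oriented pattern sees "é" as two units, a decoded one sees it as one.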

Oracle Unicode problem when NLS_CHARACTERSET is WE8ISO8859P1 and NLS_NCHAR_CHARACTERSET is AL16UTF16, with ColdFusion as the programming language

- by tsurahman

I have two Oracle 10g databases, XE and Enterprise (screenshots for XE and Enterprise not preserved). The test table uses the column types that appear in the INSERT below (table definition screenshot not preserved). I then tried inserting some Unicode characters from http://www.sustainablegis.com/unicode/, and the results are shown in the XE and Enterprise screenshots (not preserved). For this test I use ColdFusion 9 Developer Edition:

    <cfprocessingDirective pageencoding="utf-8">
    <cfset setEncoding("form","utf-8")>

    <form action="" method="post">
        Unicode : <br>
        <textarea name="txaUnicode" id="txaUnicode" cols="50" rows="10"></textarea>
        <br><br>
        Language : <br>
        <input type="Text" name="txtLanguage" id="txtLanguage">
        <br><br>
        <input type="Submit">
    </form>

    <cfset dsn = "theDSN">

    <cfif StructKeyExists(FORM, "FIELDNAMES")>
        <cfquery name="qryInsert" datasource="#dsn#">
            INSERT INTO UNICODE (
                C_VARCHAR2, C_CHAR, C_CLOB, C_NVARCHAR2, LANGUAGE
            ) VALUES (
                <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#FORM.TXAUNICODE#">,
                <cfqueryparam cfsqltype="CF_SQL_CHAR" value="#FORM.TXAUNICODE#">,
                <cfqueryparam cfsqltype="CF_SQL_LONGVARCHAR" value="#FORM.TXAUNICODE#">,
                <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#FORM.TXAUNICODE#">,
                <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#FORM.TXTLANGUAGE#">
            )
        </cfquery>
    </cfif>

    <cfquery name="qryUnicode" datasource="#dsn#">
        SELECT * FROM UNICODE ORDER BY LANGUAGE
    </cfquery>

    <table border="1">
        <thead>
            <tr>
                <th>LANGUAGE</th>
                <th>C_VARCHAR2</th>
                <th>C_CHAR</th>
                <th>C_CLOB</th>
                <th>C_NVARCHAR2</th>
            </tr>
        </thead>
        <tbody>
            <cfoutput query="qryUnicode">
                <tr>
                    <td>#qryUnicode.LANGUAGE#</td>
                    <td>#qryUnicode.C_VARCHAR2#</td>
                    <td>#qryUnicode.C_CHAR#</td>
                    <td>#qryUnicode.C_CLOB#</td>
                    <td>#qryUnicode.C_NVARCHAR2#</td>
                </tr>
            </cfoutput>
        </tbody>
    </table>

Based on this guide, http://www.stanford.edu/dept/itss/docs/oracle/10g/server.101/b10749/ch6unicode.htm#i1007297, I think my Enterprise database should produce the same result as XE (at least for the NVARCHAR2 column), since the typical solution in that guide says: use NCHAR and NVARCHAR2 datatypes to store Unicode characters; keep WE8ISO8859P1 as the database character set; use AL16UTF16 as the national character set. So, how can I make it work in my Enterprise database too? Thank you :)
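Why the non-N columns lose characters can be illustrated with a Python analogy (this is not Oracle itself). WE8ISO8859P1 is Oracle's name for ISO-8859-1 (Latin-1), which has no code for characters such as Chinese ideographs, so converting them into that character set forces a substitution. Python's `errors="replace"` handler stands in for Oracle's conversion here; the replacement character Oracle actually uses may differ.

```python
# Text a user might paste into the form: Latin-1 letters plus Chinese.
sample = "Grüße 中文"

# VARCHAR2/CHAR/CLOB analogy: store in the database character set (Latin-1).
latin1_bytes = sample.encode("latin-1", errors="replace")

# NVARCHAR2 analogy: AL16UTF16 is a UTF-16 encoding, which can represent
# every character (the byte order chosen here is arbitrary for the sketch).
utf16_bytes = sample.encode("utf-16-be")

print(latin1_bytes)                               # b'Gr\xfc\xdfe ??'
print(utf16_bytes.decode("utf-16-be") == sample)  # True
```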

Double-checking: no database-wide 'Unicode switch' for SQL Server in the foreseeable future, i.e. like Oracle's?

- by user72150

Hi all, I believe I know the answer to this question, but wanted to confirm. Question: Does SQL Server offer (or will it in the foreseeable future) a database-wide "Unicode switch" that says "store all characters in Unicode (UTF-16, UCS-2, etc.)", as Oracle does? Context: Our application has provided CJK (Chinese-Japanese-Korean) support for years, using Oracle as the database store. Recently people have been asking for the same support in SQL Server. We store our database schema definition in XML and generate the vendor-specific definitions (Oracle, SQL Server) using vendor-specific XSL, so we can make the change easily. The problem is upgrades. Generated scripts would need to change the column types of 100+ columns from varchar to nvarchar, varchar(max) to nvarchar(max), etc. These changes require dropping and recreating indexes and foreign keys if any indexes/FKs exist on the column. Non-trivial. Risky. A database-wide character encoding would eliminate these programming changes for us (i.e. we would not need to change the column types from varchar to nvarchar; SQL Server would correctly store Unicode data in varchar columns). I had thought that eventually SQL Server would "see the light" and allow storing Unicode in varchar/CLOB columns. Evidently not yet. Recap: So, just to triple-check: does MSSQL offer a database-wide switch for character encoding? Will it in SQL 2008 R3? Or 2010? Thanks, Bill

How to convert Unicode strings (\u00e2, etc) into NSString for display?

- by karlbecker_com

I am trying to support arbitrary Unicode from a variety of international users. They have already put a bunch of data into SQLite databases on their iPhones, and now I want to capture the data into a database, then send it back to their devices. Right now I am using a PHP page that sends data back from an internet MySQL database. The data is saved in the MySQL database properly, but when it's sent back it comes out as Unicode escape text, such as Frank\u00e2\u0080\u0099s iPad instead of just Frank's iPad, where the apostrophe should really be a curly apostrophe. The answer posted to another question indicates that there are no built-in Cocoa methods to convert the "\u00e2\u0080\u0099" portion of the string from the web server into an NSString object. Is this correct? That seems really surprising (and scarily disappointing), since Cocoa definitely allows input of many different Unicode characters, and I need to support any arbitrary language, including ones I have never heard of, and all of their possible characters. I save them to and from the local SQLite database just fine now, but once I send the data to a web server and then perhaps pull down different data, I want to ensure the data pulled from the web server is correctly formatted.
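The escapes `\u00e2\u0080\u0099` are worth decoding by hand: U+00E2, U+0080 and U+0099 are exactly the three UTF-8 bytes (E2 80 99) of U+2019 RIGHT SINGLE QUOTATION MARK, each misread as a separate Latin-1 character. That points to double encoding on the server side rather than a missing Cocoa method. The Python sketch below is an illustration, not Cocoa code; it assumes the PHP page emits JSON (whose `\uXXXX` escapes a JSON parser decodes) and then reverses the Latin-1 misreading. Fixing the server so it does not double-encode would be the cleaner remedy.

```python
import json

# The body as received; assumed here to be a JSON string literal.
payload = '"Frank\\u00e2\\u0080\\u0099s iPad"'

# Step 1: a JSON parser turns each \uXXXX escape into one code point.
decoded = json.loads(payload)
print([hex(ord(c)) for c in decoded[5:8]])  # ['0xe2', '0x80', '0x99']

# Step 2: those code points are really UTF-8 bytes that were read as Latin-1,
# so map them back to bytes and decode those bytes as UTF-8.
repaired = decoded.encode("latin-1").decode("utf-8")
print(repaired == "Frank\u2019s iPad")  # True
```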

Why can't I display a unicode character in the Python Interpreter on Mac OS X Terminal.app?

- by apphacker

If I try to paste a Unicode character such as the middle dot (·) into my Python interpreter, it does nothing. I'm using Terminal.app on Mac OS X, and when I'm simply in bash I have no trouble:

    :~$ ·

But in the interpreter:

    :~$ python
    Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
    [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>>

^^ I get nothing; it just ignores the character I pasted. If I use the escaped \xNN\xNN representation of the middle dot, '\xc2\xb7', and try to convert it to Unicode, trying to show the dot makes the interpreter throw an error:

    >>> unicode('\xc2\xb7')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

I have set 'utf-8' as my default encoding in sitecustomize.py, so:

    >>> sys.getdefaultencoding()
    'utf-8'

What gives? It's not the Terminal. It's not Python. What am I doing wrong?! This question is not related to this other question, as that individual is able to paste Unicode into his Terminal.
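On the error itself: `unicode('\xc2\xb7')` was called without an encoding argument and, as the traceback shows, fell back to the ASCII codec, which cannot handle byte 0xC2. Naming the codec explicitly avoids depending on the default encoding at all. A minimal sketch in Python 3 syntax (in Python 2 the equivalent is `'\xc2\xb7'.decode('utf-8')`):

```python
import unicodedata

raw = b"\xc2\xb7"            # UTF-8 bytes for the middle dot
dot = raw.decode("utf-8")    # name the codec instead of relying on a default
print(hex(ord(dot)), unicodedata.name(dot))  # 0xb7 MIDDLE DOT

try:
    raw.decode("ascii")      # reproduces the error from the question
except UnicodeDecodeError as exc:
    print(exc.reason)        # ordinal not in range(128)
```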

/etc/init.d Character Encoding Issue

- by Ryan Rosario

I have a script in /etc/init.d on an EC2 image that, on machine startup, pulls in source code via SVN, builds it, and then runs it using Ant. The source code is Java. Within this code is a call to the Weka library which writes a file to disk. On most Ubuntu AMIs, and my home machines' versions of Ubuntu, there is no issue. The problem is that with certain versions/AMIs of Ubuntu, Unicode characters in the file are replaced with question marks ('?'). If I run the job manually on the trouble instance, Unicode is output to file correctly, but not when run from /etc/init.d. What might be causing this problem and how can I fix it so that Unicode characters appear correctly in files written from /etc/init.d processes?
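One commonly reported cause (an assumption here, not something the question confirms) is that init scripts run with a minimal environment lacking LANG or LC_ALL, so a runtime that derives its default file encoding from the locale falls back to ASCII and writes unmappable characters as '?'. For the JVM, exporting a UTF-8 locale in the init script or passing `-Dfile.encoding=UTF-8` are commonly used remedies; naming the encoding wherever the file is written removes the dependency entirely. The idea in Python, with a hypothetical output file name:

```python
import locale
import os
import tempfile

text = "naïve café"

# What an unqualified open() would use; under a bare C/POSIX locale this
# can be ASCII, which cannot represent the accented letters.
print("locale default:", locale.getpreferredencoding(False))

# The lossy path made explicit: ASCII with replacement yields '?'.
print(text.encode("ascii", errors="replace"))  # b'na?ve caf?'

# Robust path: name the encoding, so the environment no longer matters.
path = os.path.join(tempfile.mkdtemp(), "output.txt")  # hypothetical file
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, "rb") as f:
    print(f.read() == text.encode("utf-8"))  # True
```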

What is the best way to remove accents in a python unicode string?

- by MiniQuark

I have a Unicode string in Python, and I would like to remove all the accents (diacritics). I found on the Web an elegant way to do this in Java: convert the Unicode string to its long normalized form (with separate characters for letters and diacritics), then remove all the characters whose Unicode type is "diacritic". Do I need to install a library such as PyICU, or is this possible with just the Python standard library? And what about in Python 3.0? Important note: I would like to avoid code with an explicit mapping from accented characters to their non-accented counterparts. Thanks for your help.
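The Java approach translates directly to the standard library's `unicodedata` module. A sketch in Python 3 with a hypothetical function name: it uses NFD (canonical decomposition) and treats "diacritic" as "combining character", which is an approximation. Letters such as ø, ł or ß have no decomposition and pass through unchanged.

```python
import unicodedata


def remove_accents(text: str) -> str:
    # NFD splits "é" into "e" + U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every code point with a nonzero canonical combining class.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever decomposed sequences remain.
    return unicodedata.normalize("NFC", stripped)


print(remove_accents("Crème Brûlée"))  # Creme Brulee
```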

C++ project type: unicode vs multi-byte; pros and cons

- by Stefan Valianu

I'm wondering what the Stack Overflow community thinks when it comes to creating a project (thinking primarily C++ here) with a Unicode or a multi-byte character set. Are there pros to going Unicode straight from the start, implying all your strings will be in wide format? Are there performance issues or larger memory requirements because of the standard use of a larger character? Is there an advantage to this method? Do some processor architectures handle wide characters better? Are there any reasons to make your project Unicode if you don't plan on supporting additional languages? What reasons would one have for creating a project with a multi-byte character set? How do all of the factors above collide in a high-performance environment (such as a modern video game)?

Which database and language is better at handling Unicode?

- by user187809

Which database should I use if my application is going to be in multiple languages (including Chinese, Japanese, etc.)? In other words, is MySQL better or worse than Postgres at handling Unicode? (These are the only two databases my hosting company offers.) Also, which language is better for handling Unicode: PHP or Ruby/Rails?

MFC: How to check if a character input is Unicode?

- by Owen

Hi all, I have an input text field which accepts a certain maximum number of characters. This maximum should change, though, if the characters entered are Unicode. Question: Is there a way to check whether a character input is Unicode or not?
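Strictly speaking, in a Unicode build every character an edit control receives is already Unicode; what usually matters is whether it falls outside ASCII, or how many bytes it will occupy once encoded. Assuming the limit is really a byte budget in some storage encoding (an assumption, as are UTF-8 as the example encoding and the helper names), the check can be sketched in Python as:

```python
def is_ascii(text: str) -> bool:
    # True when every code point is below 128.
    return all(ord(ch) < 128 for ch in text)


def fits_byte_budget(text: str, max_bytes: int, encoding: str = "utf-8") -> bool:
    # Measure the encoded size rather than the character count.
    return len(text.encode(encoding)) <= max_bytes


print(is_ascii("hello"), is_ascii("héllo"))                        # True False
print(fits_byte_budget("hello", 5), fits_byte_budget("héllo", 5))  # True False
```

Counting encoded bytes rather than characters keeps the limit correct for any mix of scripts.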

Are there standard ways and guidelines on extending Unicode with custom user-defined character sets?

- by Ivan

In a linguistic project of mine I need to use symbols not contained in Unicode. I can draw the font, but would like to avoid overwriting characters defined by the standard. Are there any standard solutions for such cases, such as reserved ranges? I'd need common Unicode-aware software to handle it seamlessly.
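The standard does reserve ranges for this: the Private Use Areas, U+E000 to U+F8FF in the Basic Multilingual Plane, plus planes 15 and 16 (U+F0000 to U+FFFFD and U+100000 to U+10FFFD). Unicode will never assign characters there, and their general category is Co. A Python sketch of allocating custom symbols from the BMP range; the symbol names and the `allocate` helper are hypothetical:

```python
import unicodedata

PUA_START, PUA_END = 0xE000, 0xF8FF  # BMP Private Use Area


def allocate(names, start=PUA_START):
    """Map each custom symbol name to consecutive private-use characters."""
    table = {}
    for offset, name in enumerate(names):
        cp = start + offset
        if cp > PUA_END:
            raise ValueError("BMP Private Use Area exhausted")
        table[name] = chr(cp)
    return table


symbols = allocate(["glottal-click", "long-tone"])  # hypothetical symbols
for name, ch in symbols.items():
    print(name, f"U+{ord(ch):04X}", unicodedata.category(ch))
# glottal-click U+E000 Co
# long-tone U+E001 Co
```

Because other fonts and projects use the same ranges, the meaning of these code points lives only in your font and documentation.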

How to compute a Unicode string whose bidirectional representation is specified?

- by valdo

Hello, fellows. I have a rather perverse question; please forgive me :) There's an official algorithm that describes how bidirectional Unicode text should be presented: http://www.unicode.org/reports/tr9/tr9-15.html. I receive a string (from a third-party source) that contains Latin/Hebrew characters, as well as digits, whitespace, punctuation symbols, etc. The problem is that the string I receive is already in presentation form, i.e. the sequence of characters I receive should simply be displayed from left to right. My goal is to find the Unicode string whose presentation is exactly the same. That is, I need to pass that string to another entity, which will render it according to the official algorithm, and the result should be identical. Assume the following: the default text direction (of the rendering entity) is RTL, and I don't want to inject "special Unicode characters" that explicitly override the text direction (such as RLO, RLE, etc.). I suspect there may be several solutions. If so, I'd like to preserve the RTL look of the string as much as possible. The string usually consists mostly of Hebrew words; I'd like to preserve the correct order of those words and of the characters inside them, whereas other character sequences may (and should) be transposed. One naive way to solve this is to reverse the whole string (which takes care of the Hebrew words) and then reverse each sequence of non-Hebrew characters inside it. However, this doesn't always produce correct results, because the actual presentation rules are rather complex. The only comprehensive algorithm I see so far is a brute-force check. The string can be divided into sequences of same-class characters. Those sequences may be joined in any order, and any of them may be reversed. I can check all those combinations to obtain the correct result, and this technique can be optimized. For instance, the order of the Hebrew words is known, so we only have to check different combinations of their "joining" sequences. Any better ideas? If you have an idea, not necessarily a whole solution, that's OK; I'll appreciate any idea. Thanks in advance.
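The first step of the brute-force idea, dividing the string into sequences of same-class characters, can use the Bidi_Class property that Python exposes as `unicodedata.bidirectional()`. A minimal sketch with a hypothetical helper name; it only segments the string and does not implement the rest of the bidirectional algorithm:

```python
import unicodedata
from itertools import groupby


def bidi_runs(text: str):
    """Split text into (bidi_class, substring) runs of identical class."""
    return [
        (cls, "".join(chars))
        for cls, chars in groupby(text, key=unicodedata.bidirectional)
    ]


for cls, run in bidi_runs("abc \u05e9\u05dc\u05d5\u05dd 123"):
    print(cls, ascii(run))
# L 'abc'
# WS ' '
# R '\u05e9\u05dc\u05d5\u05dd'
# WS ' '
# EN '123'
```

Each run (L for Latin letters, R for Hebrew letters, EN for European digits, WS for whitespace) is then a unit the search can reorder or reverse.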

Why does it matter whether a character is 8-bit, 16-bit or 32-bit?

- by vin

Well, I am reading Programming Windows with MFC, and I came across Unicode and ASCII characters. I understood the point of using Unicode over ASCII, but what I do not get is how and why it is important to use 8-bit/16-bit/32-bit characters. What good does it do for the system? How does the operating system's processing differ for characters of different bit widths? My question here is: what does it mean for a character to be an x-bit character?
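One way to see what "x-bit character" means in practice is to encode a few characters and count the 8-, 16- and 32-bit code units each encoding needs. In this Python sketch (the helper name is hypothetical), UTF-32 uses one 32-bit unit per character, whereas UTF-8 and UTF-16 are variable-width: ASCII fits in a single byte, and a character outside the Basic Multilingual Plane needs two 16-bit units (a surrogate pair).

```python
def code_units(ch: str) -> dict:
    """Count the code units `ch` needs in UTF-8, UTF-16 and UTF-32."""
    return {
        "utf-8": len(ch.encode("utf-8")),            # 8-bit units
        "utf-16": len(ch.encode("utf-16-le")) // 2,  # 16-bit units
        "utf-32": len(ch.encode("utf-32-le")) // 4,  # 32-bit units
    }


for ch in ["A", "\u00e9", "\u20ac", "\U0001d11e"]:  # A, é, €, musical G clef
    print(f"U+{ord(ch):04X}", code_units(ch))
# U+0041 {'utf-8': 1, 'utf-16': 1, 'utf-32': 1}
# U+00E9 {'utf-8': 2, 'utf-16': 1, 'utf-32': 1}
# U+20AC {'utf-8': 3, 'utf-16': 1, 'utf-32': 1}
# U+1D11E {'utf-8': 4, 'utf-16': 2, 'utf-32': 1}
```

The trade-off follows directly: wider units make indexing simpler but multiply the memory used by mostly-ASCII text.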

How to deal with Unicode strings in C/C++ in a cross-platform friendly way?

- by Sorin Sbarnea

On platforms other than Windows you can easily use char* strings and treat them as UTF-8. The problem is that on Windows you are required to accept and send messages using wchar_t* strings (the W functions). If you use the ANSI functions (A), you will not support Unicode. So if you want to write a truly portable application, you need to compile it as Unicode on Windows. Now, in order to keep the code clean, I would like to see the recommended way of dealing with strings, one that minimizes ugliness in the code. Types of strings you may need: std::string, std::wstring, std::tstring, char*, wchar_t*, TCHAR*, CString (the ATL one). Issues you may encounter: cout/cerr/cin and their Unicode variants wcout/wcerr/wcin; all the renamed wide-string functions and their TCHAR macros, like strcmp, wcscmp and _tcscmp; constant strings inside code, since with TCHAR you have to fill your code with _T() macros. What approach do you see as best? (Examples are welcome.) Personally I would go for a std::tstring approach, but I would like to see how you would do the conversions where they are necessary.

Create Multilingual Web Sites with Windows Unicode Fonts

To input text in languages other than your keyboard default on Windows platforms, you'll need to do some tweaking. Learn how to enable various International Unicode Keyboards in Windows XP.

Multiple vulnerabilities in International Components for Unicode (ICU)

- by chandan

- CVE-2011-2791: Improper Restriction of Operations within the Bounds of a Memory Buffer vulnerability; CVSSv2 base score 7.5.
- CVE-2011-4599: Improper Restriction of Operations within the Bounds of a Memory Buffer vulnerability; CVSSv2 base score 7.5.
- Component: International Components for Unicode (ICU). Product and resolution: Solaris 10 (SPARC: 119810-07, X86: 119811-07); Solaris 11 11/11 SRU 11.4.

This notification describes vulnerabilities fixed in third-party components that are included in Oracle's product distributions. Information about vulnerabilities affecting Oracle products can be found on the Oracle Critical Patch Updates and Security Alerts page.

[C#] How to convert a string encoded in Windows-1250 to Unicode?

- by Deveti Putnik

Hi! I am receiving strings in the Windows-1250 code page from a DLL (a wrapper for an external data source), and I would like to insert them correctly (as Unicode) into a table in a SQL Server database. Since the column that should hold that data is of type NVARCHAR, I only needed to convert the string to Unicode in my C# code and pass it as a parameter. Everything is well and nice, but I stumbled on the conversion step. I tried the following, but it doesn't work:

    private static String getUnicodeValue(string string2Encode) //
    {
        Encoding srcEncoding = Encoding.GetEncoding("Windows-1250");
        UnicodeEncoding dstEncoding = new UnicodeEncoding();
        byte[] srcBytes = srcEncoding.GetBytes(string2Encode);
        byte[] dstBytes = dstEncoding.GetBytes(string2Encode);
        return dstEncoding.GetString(dstBytes);
    }

When I insert the returned string into the table, I don't get correct letters like Đ, đ, Č, č, Ć or ć. Please help! :)
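A point worth checking: a .NET `string`, like a Python `str`, already holds Unicode text, so re-encoding it as in the code above cannot repair anything. The Windows-1250 interpretation has to happen when the raw bytes are turned into a string (in .NET Framework, `Encoding.GetEncoding(1250).GetString(bytes)`); whether the DLL exposes those raw bytes is an assumption. The same step in Python, using the six letters from the question as sample bytes:

```python
# Windows-1250 byte values for Č č Ć ć Đ đ (sample data for the sketch).
raw = bytes([0xC8, 0xE8, 0xC6, 0xE6, 0xD0, 0xF0])

text = raw.decode("cp1250")  # interpret the bytes with the right code page
print([f"U+{ord(c):04X}" for c in text])
# ['U+010C', 'U+010D', 'U+0106', 'U+0107', 'U+0110', 'U+0111']

# Decoding the same bytes with the wrong code page gives different letters.
print(raw.decode("latin-1") == text)  # False
```

With Latin-1, byte 0xD0 becomes Ð (U+00D0) instead of Đ, the same kind of substitution the question describes.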

C#: How to print a unicode string to console?

- by Lopper

How do I print the value of a Unicode string in C# to the console?

    byte[] unicodeBytes = new byte[] {0x61, 0x70, 0x70, 0x6C, 0x69, 0x63, 0x61, 0x74, 0x69, 0x6F, 0x6E, 0x2F, 0x70, 0x63, 0x61, 0x70};
    string unicodeString = Encoding.Unicode.GetString(unicodeBytes);
    Console.WriteLine(unicodeString);

What I get for the above is "?????????". However, in the Autos and Locals windows in debug mode I see the following value for unicodeString, which is what I wanted to display: "??????????". How do I print the correct result to the console, as shown in the Autos and Locals windows while debugging?
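The sample bytes themselves are worth a look: they are the ASCII codes for "application/pcap", so `Encoding.Unicode` (UTF-16 little-endian) pairs them into eight unrelated characters, mostly CJK ideographs, which a console using an 8-bit code page typically can only show as '?'. Whether ASCII text was the intent is an assumption; this Python sketch just compares the two interpretations of the same bytes:

```python
data = bytes([0x61, 0x70, 0x70, 0x6C, 0x69, 0x63, 0x61, 0x74,
              0x69, 0x6F, 0x6E, 0x2F, 0x70, 0x63, 0x61, 0x70])

as_ascii = data.decode("ascii")      # one byte per character
as_utf16 = data.decode("utf-16-le")  # what Encoding.Unicode does: two bytes per char

print(as_ascii)                                            # application/pcap
print(len(as_utf16), [hex(ord(c)) for c in as_utf16[:2]])  # 8 ['0x7061', '0x6c70']
```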

How to use ORDER BY, LOWER .. in SQL SERVER 2008 with non-unicode data

- by hgulyan

Hi, the question is about Armenian. I'm using SQL Server 2005 with collation SQL_Latin1_General_CP1_CI_AS; the data is mostly in Armenian and we can't use Unicode. I tested on MS SQL 2008 with a Windows collation for the Armenian language (Cyrillic_General_100_), which I found here (http://msdn.microsoft.com/en-us/library/ms188046.aspx), but it didn't help. I have a function that orders by hex values and a lower function that takes each char in each string and converts it to its lowercase form, but that's not an acceptable solution: it is really slow when those functions are called on every column of a huge table. Is there any solution for this issue that doesn't use Unicode and doesn't require working with hex values manually?

How do I read Unicode characters from an MS Access 2007 database through Java?

- by Peter

In Java, I have written a program that reads a UTF-8 text file. The text file contains a SQL query of the SELECT kind. The program then executes the query on the Microsoft Access 2007 database and writes all fields of the first row to a UTF-8 text file. The problem I have is when a row is returned that contains Unicode characters, such as "?". These characters show up as "?" in the text file. I know that the text files are read and written correctly, because a dummy UTF-8 character ("?") is read from the text file containing the SQL query and written to the text file containing the resulting row. The UTF-8 character looks correct when the written text file is opened in Notepad, so the reading and writing of the text files are not part of the problem. This is how I connect to the database and how I execute the SQL query:

    Connection c = DriverManager.getConnection("jdbc:odbc:Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:/database.accdb;Pwd=temp");
    ResultSet r = c.createStatement().executeQuery(sql);

I have tried adding a charSet property to the Connection, but it makes no difference:

    Properties p = new Properties();
    p.put("charSet", "utf-8");
    p.put("lc_ctype", "utf-8");
    p.put("encoding", "utf-8");
    Connection c = DriverManager.getConnection("...", p);

I tried "utf8", "UTF8" and "UTF-8", with no difference. If I enter "UTF-16" I get the following exception: "java.lang.IllegalArgumentException: Illegal replacement". I've been searching for hours with no results and now turn my hope to you. Please help! I also accept workaround suggestions. =) What I want is to be able to make a Unicode query (for example, one that searches for posts containing the "?" character) and to have results with Unicode characters received and saved correctly. Thank you!

How to use ORDER BY, LOWER .. in SQL SERVER 2008 with non-unicode languages

- by hgulyan

Hi, the question is about Armenian. I'm using SQL Server 2005 with collation SQL_Latin1_General_CP1_CI_AS; the data is mostly in Armenian and we can't use Unicode. I tested on MS SQL 2008 with a Windows collation for the Armenian language (Cyrillic_General_100_), which I found here (http://msdn.microsoft.com/en-us/library/ms188046.aspx), but it didn't help. I have a function that orders by hex values and a lower function that takes each char in a string and converts it to its lowercase form, but that's not an acceptable solution: it is really slow when those functions are called on every column of a huge table. Is there any solution for this issue that doesn't use Unicode and doesn't require working with hex values manually?

Python file input string: how to handle escaped unicode characters?

- by Michi

In a text file (test.txt), my string looks like this:

    Gro\u00DFbritannien

Reading it, Python escapes the backslash:

    >>> file = open('test.txt', 'r')
    >>> input = file.readline()
    >>> input
    'Gro\\u00DFbritannien'

How can I have this interpreted as Unicode? decode() and unicode() won't do the job. The following code writes Gro\u00DFbritannien back to the file, but I want it to be Großbritannien:

    >>> input.decode('latin-1')
    u'Gro\\u00DFbritannien'
    >>> out = codecs.open('out.txt', 'w', 'utf-8')
    >>> out.write(input)
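Python ships a codec for exactly this escape syntax, `unicode_escape` (in Python 2, `input.decode('unicode-escape')`). A Python 3 sketch; note the caveat in the comments, which matters if the file also contains literal non-ASCII characters:

```python
import codecs

line = "Gro\\u00DFbritannien"  # what readline() returned: a literal backslash

# unicode_escape works on bytes and treats them as Latin-1 apart from the
# escapes, so encode to ASCII first; a line that also holds real non-ASCII
# text would need a different approach.
decoded = codecs.decode(line.encode("ascii"), "unicode_escape")
print(decoded == "Gro\u00dfbritannien")  # True
```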

Is there a way to enable Unicode characters in all browsers on Windows XP?

- by Daniel Pietzsch

I'd like to use Unicode symbols on my website (especially Dingbats). Is there any way to enable them in all (or at least some) browsers on Windows XP, without the user having to adjust any of his settings? I use the HTML5 doctype with the charset configured to UTF-8:

    <!DOCTYPE html>
    <html>
    <head>
    <meta charset="utf-8" />
    </head>
    <body></body>
    </html>

The browsers recognize the charset correctly (even IE7), but no special characters are displayed; I only see an empty square box. This is the case for all of the following browsers: IE7, Safari 4, Firefox 3.5, Chrome 4.1, Opera 10.51. So, is there any way to enable all (or most) Unicode characters for browsers running on Windows XP?

How could I catch a "Unicode non-character" warning?

- by sid_com

How could I catch the "Unicode non-character 0xffff is illegal for interchange" warning?

    #!/usr/bin/env perl
    use warnings;
    use 5.012;
    use Try::Tiny;
    use warnings FATAL => qw(all);

    my $character;
    try {
        $character = "\x{ffff}";
    } catch {
        die "---------- caught error ----------\n";
    };

    say "something";

Output:

    # Unicode non-character 0xffff is illegal for interchange at ./perl1.pl line 11.
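For reference, the set of code points this warning is about is small and fixed: Unicode defines 66 noncharacters, U+FDD0 to U+FDEF plus the last two code points of every plane (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, and so on up to U+10FFFF). If the goal is simply to detect them before they cause trouble, the check is easy to write; sketched here in Python with a hypothetical function name:

```python
def is_noncharacter(cp: int) -> bool:
    """True for the 66 Unicode noncharacter code points."""
    # U+FDD0..U+FDEF, or any code point whose low 16 bits are FFFE or FFFF.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


print(is_noncharacter(0xFFFF), is_noncharacter(0x10FFFE), is_noncharacter(0x41))
# True True False
print(sum(is_noncharacter(cp) for cp in range(0x110000)))  # 66
```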

Why don't scripting languages output Unicode to the Windows console?

- by hippietrail

The Windows console has been Unicode-aware for at least a decade, and perhaps as far back as Windows NT. However, for some reason the major cross-platform scripting languages, including Perl and Python, only ever output various 8-bit encodings, requiring much trouble to work around. Perl gives a "wide character in print" warning; Python gives a charmap error and quits. Why on earth, after all these years, do they not simply call the Win32 -W APIs that output UTF-16 Unicode instead of forcing everything through the ANSI/code-page bottleneck? Is it just that cross-platform performance is a low priority? Is it that the languages use UTF-8 internally and find it too much bother to output UTF-16? Or are the -W APIs inherently broken to such a degree that they can't be used as-is?

Search Results

Search found 1474 results on 59 pages for 'unicode'.
