Search Results

Search found 1649 results on 66 pages for 'unicode normalization'.

Page 8/66 | < Previous Page | 4 5 6 7 8 9 10 11 12 13 14 15  | Next Page >

  • Regex and unicode

    - by dbr
    I have a script that parses the filenames of TV episodes (show.name.s01e02.avi for example), grabs the episode name (from the www.thetvdb.com API) and automatically renames them into something nicer (Show Name - [01x02].avi) The script works fine, that is until you try and use it on files that have Unicode show-names (something I never really thought about, since all the files I have are English, so mostly pretty-much all fall within [a-zA-Z0-9'\-]) How can I allow the regular expressions to match accented characters and the likes? Currently the regex's config section looks like.. config['valid_filename_chars'] = """0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!@£$%^&*()_+=-[]{}"'.,<>`~? """ config['valid_filename_chars_regex'] = re.escape(config['valid_filename_chars']) config['name_parse'] = [ # foo_[s01]_[e01] re.compile('''^([%s]+?)[ \._\-]\[[Ss]([0-9]+?)\]_\[[Ee]([0-9]+?)\]?[^\\/]*$'''% (config['valid_filename_chars_regex'])), # foo.1x09* re.compile('''^([%s]+?)[ \._\-]\[?([0-9]+)x([0-9]+)[^\\/]*$''' % (config['valid_filename_chars_regex'])), # foo.s01.e01, foo.s01_e01 re.compile('''^([%s]+?)[ \._\-][Ss]([0-9]+)[\.\- ]?[Ee]([0-9]+)[^\\/]*$''' % (config['valid_filename_chars_regex'])), # foo.103* re.compile('''^([%s]+)[ \._\-]([0-9]{1})([0-9]{2})[\._ -][^\\/]*$''' % (config['valid_filename_chars_regex'])), # foo.0103* re.compile('''^([%s]+)[ \._\-]([0-9]{2})([0-9]{2,3})[\._ -][^\\/]*$''' % (config['valid_filename_chars_regex'])), ]
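
    A minimal sketch of one way to sidestep the whitelist entirely, assuming Python 2 as in the original script: let \w with re.UNICODE cover letters (accented or otherwise) instead of enumerating every valid character. The pattern and the test filename below are illustrative only, not the script's actual config.

        # -*- coding: utf-8 -*-
        import re

        # Hypothetical replacement for one of the config['name_parse'] patterns:
        # \w with re.UNICODE matches digits, underscores and letters in any
        # script, so accented show names no longer fall outside the class.
        name_parse_1x09 = re.compile(
            r"^([\w\s'.,!-]+?)[ ._-]\[?(\d+)x(\d+)[^\\/]*$",
            re.UNICODE)

        m = name_parse_1x09.match(u"Les Revenants.1x03.avi")
        if m:
            print m.groups()   # (u'Les Revenants', u'1', u'03')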

    Read the article

  • C++ unicode UTF-16 encoding

    - by Dan
    Hi all, I have a wide char string L"hao123--我的上网主页", and it must be encoded to "hao123--\u6211\u7684\u4E0A\u7F51\u4E3B\u9875". I was told that the encoded string is a special “%uNNNN” format for encoding Unicode UTF-16 code points. This website (http://rishida.net/tools/conversion/) tells me these are JavaScript escapes, but I don't know how to produce them in C++. Is there any library that does this, or can you give me some tips? Thanks, my friends!
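
    A note and a rough sketch: the question mixes the “%uNNNN” and “\uNNNN” notations, and the sample output uses the latter. As a reference for what a C++ routine would need to produce, here is an illustrative Python version of that escaping rule (ASCII passes through, everything else becomes a \uNNNN escape of its UTF-16 code unit; BMP only, surrogate pairs would need extra handling).

        # -*- coding: utf-8 -*-
        def js_escape(s):
            # ASCII passes through; everything else becomes \uNNNN of its
            # UTF-16 code unit (BMP only in this sketch).
            out = []
            for ch in s:
                if ord(ch) < 0x80:
                    out.append(ch)
                else:
                    out.append(u"\\u%04X" % ord(ch))
            return u"".join(out)

        print js_escape(u"hao123--\u6211\u7684\u4E0A\u7F51\u4E3B\u9875")
        # prints: hao123--\u6211\u7684\u4E0A\u7F51\u4E3B\u9875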

    Read the article

  • Convert char array to UNICODE in MFC C++

    - by chathuradd
    I'm using the following code to read files from a folder in Windows. However, since this is an MFC application I have to convert the char array to UNICODE. For example, if I hard-code the path as "C:\images3\test\" as shown below, the code works. WIN32_FIND_DATA FindFileData; HANDLE hFind = INVALID_HANDLE_VALUE; hFind = FindFirstFile(_T("C:\images3\test\"), &FindFileData); What I want is to get this working as follows: char* pathOfFileType; hFind = FindFirstFile(_T(pathOfFileType), &FindFileData); Can anyone tell me how to fix this problem? Thanks

    Read the article

  • Using Python simplejson for transmitting JSON to another server results in unicode encoding problems

    - by Mark
    Hi there, I'm encoding a string with Python's simplejson library with special characters: hello testing spécißl characters plusses: +++++ special chars :œ∑´®†¥¨ˆøπ“ß∂ƒ©˙∆˚¬Ω≈ç√∫˜µ≤≥ However, when I encode it and transmit it to the other machine (using POST), it turns out like this: {'message': ['{"body": "hello testing sp\\u00e9ci\\u00dfl characters\\n\\nplusses: \\n\\nspecial chars :\\u0153\\u2211\\u00b4\\u00ae\\u2020\\u00a5\\u00a8\\u02c6\\u00f8\\u03c0\\u201c\\u00df\\u2202\\u0192\\u00a9\\u02d9\\u2206\\u02da\\u00ac\\u03a9\\u2248\\u00e7\\u221a\\u222b\\u02dc\\u00b5\\u2264\\u2265"}']} The + signs are completely stripped and the rest are in this unicode(?) format. My code for this is: data = {'body': data_string} data_encoded = json.dumps(data) Any ideas? Thanks! Edit: I've tried using json.dumps(data, ensure_ascii=False) but it results in a UnicodeError ordinal not in range error.
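
    For what it's worth, the \uXXXX escapes shown above are valid JSON and decode back to the original characters on the receiving end; the stripped + signs are more likely a URL-encoding issue in the POST than a JSON one (an unescaped + in form data decodes to a space). A small sketch of both points, assuming Python 2 and the standard json/simplejson API:

        # -*- coding: utf-8 -*-
        import json, urllib

        data = {'body': u'hello +++ spécißl œ∑´®†'}

        # ensure_ascii=True (the default) produces pure-ASCII JSON with
        # \uXXXX escapes; json.loads() on the other machine restores the
        # original text unchanged.
        payload = json.dumps(data)
        assert json.loads(payload) == data

        # When POSTing, URL-encode the body so '+' survives form decoding.
        encoded = urllib.urlencode({'message': payload})

        # If unescaped output is really wanted, ensure_ascii=False returns a
        # unicode object that must be encoded explicitly before sending:
        raw = json.dumps(data, ensure_ascii=False).encode('utf-8')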

    Read the article

  • How to diagnose, and reverse (not prevent) Unicode mangling

    - by Steve Bennett
    Somewhere upstream of me, "something" happened that looks like unicode mangling. One symptom is that a lowercase u umlaut (ü) gets converted to "Ã¼" (i.e., character FC gets converted to C3 BC). Assuming that I have no control over this upstream process, how can I reverse-engineer what's going on? And if that is possible, can I crank the sausage machine backwards and get the original text back? (If it helps to understand this case, the text I received was in the form of a MySQL dump. I think somewhere in the dump/transport process it got mangled.)
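
    If the corruption really is "bytes were UTF-8 but got reinterpreted as Latin-1/CP1252", it is usually reversible as long as nothing was lost along the way. A sketch in Python of both one round of the suspected mangling and its reversal, assuming a single round occurred:

        # -*- coding: utf-8 -*-
        # One round of the suspected mangling: ü (0xFC in Latin-1) is written
        # out as UTF-8 (0xC3 0xBC) and then read back as if it were Latin-1.
        original = u"Gemütlichkeit"
        mangled = original.encode("utf-8").decode("latin-1")   # u'GemÃ¼tlichkeit'

        # Reversal: undo the wrong decode, then decode correctly.
        recovered = mangled.encode("latin-1").decode("utf-8")
        assert recovered == original

        # Diagnosis hint: "Ã" followed by another odd character is the classic
        # signature of UTF-8 read as Latin-1; text mangled more than once may
        # need the round trip applied more than once.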

    Read the article

  • c++ unicode writing is not working

    - by Jugal Kishore
    I am trying to write some Russian unicode text in file by wfstream. Following piece of code has been used for it. wfstream myfile; locale AvailLocale("Russian"); myfile.imbue(AvailLocale); myfile.open(L"d:\\example.txt",ios::out); if (myfile.is_open()) { myfile << L"?????? ????" <<endl; } myfile.flush(); myfile.close(); Something unrecognizable is written to the file by executing this code, I am using VS 2008.

    Read the article

  • Using Unicode in fancyvrb’s VerbatimOut

    - by Konrad Rudolph
    Problem VerbatimOut from the “fancyvrb” package doesn’t play nicely with UTF-8 characters. Minimal working example: \documentclass{minimal} \usepackage[utf8]{inputenc} \usepackage[T1]{fontenc} \usepackage{fancyvrb} \begin{document} \begin{VerbatimOut}{\jobname.test} é \end{VerbatimOut} \input{\jobname.test} \end{document} Error message When compiled using pdflatex mini, this gives the error File ended while scanning use of \UTFviii@three@octets. A different error occurs when the sole occurrence of é above is replaced by something else, e.g. é */: Package inputenc Error: Unicode char \u8:### not set up for use with LaTeX. – indicating that in this case, LaTeX succeeds in reading a multi-byte UTF-8 character, but not knowing what to do with it (i.e. it’s the wrong character). In fact, when I open the produced .test file manually, it contains the character é, but in Latin-1 encoding! Proof: when I open the files in a hex editor, I get the following: Original file: C3 A9 (corresponds to LATIN SMALL LETTER E WITH ACUTE in UTF-8) Written file: E9 (corresponds to é in Latin-1) Question How to set VerbatimOut up correctly? filecontents* (from “filecontents”) shows that it can work. Unfortunately, I don’t understand either code so I cannot fix fancyvrb’s code by replicating the logic from filecontents manually. I also cannot use filecontents* instead of VerbatimOut because the former doesn’t work within a \newenvironment, while the latter does. (Oh, by the way: vanilla Verbatim instead of VerbatimOut also works as expected. The error seems to occur when writing the file, not when reading the verbatim input)

    Read the article

  • Normalizing (webdav) unicode paths

    - by Evert
    Hi guys, I'm working on a WebDAV implementation for PHP. In order to make it easier for Windows and other operating systems to work together, I need to jump through some character encoding hoops. Windows uses ISO-8859-1 in its HTTP requests, while most other clients encode anything beyond ASCII as UTF-8. My first approach was to ignore this altogether, but I quickly ran into issues when returning urls. I then figured it's probably best to normalize all urls. Using u¨ as an example: OS/X will send this over the wire as u%CC%88 (this is codepoint U+0308), while Windows sends it as %FC (latin1). But doing a utf8_encode on %FC, I get %C3%BC (this is codepoint U+00FC). Should I treat %C3%BC and u%CC%88 as the same thing? If so, how? Not touching it seems to work OK for Windows. It somehow understands that it's a unicode character, but updating the same file throws an error (for no particular reason). I'd be happy to provide more information.
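
    A sketch of how the two forms can be reconciled, shown here with Python's unicodedata module: decode each client's bytes with the right codec first (UTF-8 for OS X, Latin-1 for the Windows client described above; knowing which client sent which encoding is assumed), then normalize everything to a single form such as NFC before comparing or storing paths.

        # -*- coding: utf-8 -*-
        import unicodedata
        import urllib

        def canonical_path(raw_path, charset):
            """Percent-decode a request path, decode it with the client's
            charset, and normalize to NFC so that u + combining diaeresis
            compares equal to the precomposed u-umlaut."""
            raw = urllib.unquote(raw_path)          # bytes, still encoded
            text = raw.decode(charset)              # unicode
            return unicodedata.normalize('NFC', text)

        # OS/X sends the decomposed UTF-8 form, Windows sends Latin-1:
        mac = canonical_path('u%CC%88', 'utf-8')        # u + U+0308
        win = canonical_path('%FC', 'latin-1')          # U+00FC
        assert mac == win == u'\u00fc'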

    Read the article

  • Unicode strings in my C# App are shown with question marks

    - by mrbamboo
    Hi, I have a header file in a C++/CLR project which contains some strings in different languages: Arabic, English, German, Chinese, French, Japanese, etc. I have a second project written in C#. Here I access the strings stored in the header file of the C++/CLR project. The encoding of the header file is Unicode - Codepage 1200 or UTF-8; the Visual Studio editor is able to display the strings correctly. At runtime I access these strings and assign them to a local String variable. Here I noticed that many strings are not shown correctly; it doesn't matter whether I assign them or not. Accessing the original place (while debugging) shows me all the foreign strings with question marks. Especially Chinese, just question marks. Example: "So?e St?ange ?ext in Ch?n?se" (this is not the best example, I know). What is the problem? I read that C# is UTF-16 by default, and my header file containing the strings is UTF-16 or UTF-8. I must be able to handle strings in different languages. What am I doing wrong?

    Read the article

  • Accessing Unicode telugu text from Ms-Access Database in Java

    - by Ravi Chandra
    I have an MS-Access database (an English-Telugu dictionary database) which contains a table storing English words and Telugu meanings. I am writing a dictionary program in Java which queries the database for a keyword entered by the user and displays the Telugu meaning. My program works fine until I get the data from the database, but when I display it on a component like JTextArea/JEditorPane etc., the Telugu text is displayed as '????'. Why is this happening? I have seen the solution for "1467412/reading-unicode-data-from-an-access-database-using-jdbc", which provides a workaround for the Hebrew language, but it is not working for Telugu; i.e., I included setCharset("UTF8") before querying the database, but I still get all '?'s. As soon as I get the data from the ResultSet I check the individual characters: all the Telugu characters come back as 63 (the code for '?'). This is my observation. I guess this must be some type of encoding problem. I would be very glad if somebody provides a solution for this. Thanks in advance.

    Read the article

  • Python unicode problem

    - by Somebody still uses you MS-DOS
    I'm receiving some data from a ZODB (Zope Object Database). I receive a mybrains object. Then I do: o = mybrains.getObject() and I receive a "Person" object in my project. Then I can do b = o.name, and doing print b in my class I get: José Carlos and print b.__class__ gives <type 'unicode'>. I have a lot of "Person" objects. They are added to a list: names = [o.nome, o1.nome, o2.nome] Then I try to create a text file with this data: delimiter = ';' all = delimiter.join(names) + '\n' No problem. Now, when I do a print all I have: José Carlos;Jonas;Natália Juan;John But when I try to create a file from it: f = open("/tmp/test.txt", "w") f.write(all) I get an error like this (the positions aren't exactly the same, since I changed the names): UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 84: ordinal not in range(128) If I can already print it in the "correct" form, why can't I write it to a file? Which encode/decode method should I use to write a file with this data? I'm using Python 2.4.5 (can't upgrade it)
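
    Roughly speaking, print appears to work because the terminal's encoding is applied implicitly when stdout is a terminal, while file.write() on a plain file falls back to ASCII. A minimal sketch of two ways to make the encoding explicit, both available on Python 2.4:

        # -*- coding: utf-8 -*-
        import codecs

        names = [u'Jos\xe9 Carlos', u'Jonas', u'Nat\xe1lia']
        all = ';'.join(names) + '\n'

        # Option 1: encode explicitly before writing to an ordinary file.
        f = open('/tmp/test.txt', 'w')
        f.write(all.encode('utf-8'))
        f.close()

        # Option 2: let a codecs wrapper encode on every write.
        f = codecs.open('/tmp/test.txt', 'w', encoding='utf-8')
        f.write(all)
        f.close()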

    Read the article

  • GUI toolkit for Unicode text app?

    - by wrp
    In developing a tool for processing text in exotic scripts, I'm having trouble choosing a GUI toolkit. The main part of the interface is to be a text editor, not much more elaborate than Notepad, but with its own input method editor. It is to be extensible in a scripting language so that non-programmers can develop their own input methods and display routines. It will be assumed that all files are UTF-8. More elaborate support like regexes is not needed. The main sticking points are: characters beyond the Basic Multilingual Plane right-to-left and bi-directional text extension in a scripting language cross-platform Linux/Windows/OS X My first choice was Tcl/Tk, but it lacks bidi and going beyond the BMP seems dodgy. At the other extreme, I've considered Qt with embedded ECMAScript, but that might be heavier and less malleable than I would like. I'm even thinking about making it browser based, but I'm concerned that the IM for large scripts would be too heavy for client-side processing. I've also looked at a few similar projects in Java, but the quality of the font rendering in SWING has been unacceptable. What are your experiences in handling Unicode with various toolkits? Are there other serious issues I haven't considered? What would you recommend for doing this in the lightest way?

    Read the article

  • Access 2007 and Special/Unicode Characters in SQL

    - by blockcipher
    I have a small Access 2007 database that I need to be able to import data from an existing spreadsheet and put it into our new relational model. For the most part this seems to work pretty well. Part of the process is attempting to see if a record already exists in a target table using SQL. For example, if I extract book information out of the current row in the spreadsheet, it may contain a title and abstract. I use SQL to get the ID of a matching record, if it exists. This works fine except when I have data that's in a non-English language. In this case, it seems that there is some punctuation that is causing me problems. At least I think it's punctuation as I do have some fields that do not have punctuation and are non-English that do not give me any problems. Is there a built-in function that can escape these characters? Currently I have a small function that will escape the single quote character, but that isn't enough. Or, is there a list of Unicode characters that can interfere with how SQL wants data quoted? Thanks in advance.

    Read the article

  • Using unicodedata.normalize in Python 2.7

    - by dpitch40
    Once again, I am very confused with a unicode question. I can't figure out how to successfully use unicodedata.normalize to convert non-ASCII characters as expected. For instance, I want to convert the string u"Cœur" To u"Coeur" I am pretty sure that unicodedata.normalize is the way to do this, but I can't get it to work. It just leaves the string unchanged. >>> s = u"Cœur" >>> unicodedata.normalize('NFKD', s) == s True What am I doing wrong?
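
    The string comes back unchanged because œ (U+0153) has no decomposition mapping in Unicode, so NFKD leaves it alone; NFKD only expands characters that do have one, such as é or the ﬁ ligature. A sketch of one way to get "Coeur", combining NFKD for the characters it does handle with an explicit table for the ligatures it does not (the table below is illustrative, not exhaustive):

        # -*- coding: utf-8 -*-
        import unicodedata

        # Ligatures and letters that NFKD will not break apart for us.
        EXTRA = {u'\u0153': u'oe', u'\u0152': u'OE',   # œ, Œ
                 u'\u00e6': u'ae', u'\u00c6': u'AE',   # æ, Æ
                 u'\u00df': u'ss'}                     # ß

        def to_ascii(s):
            # Substitute the hard cases first, then strip the combining marks
            # left over from decomposing the rest.
            for src, dst in EXTRA.items():
                s = s.replace(src, dst)
            decomposed = unicodedata.normalize('NFKD', s)
            return u''.join(c for c in decomposed
                            if not unicodedata.combining(c))

        print to_ascii(u'C\u0153ur')      # Coeur
        print to_ascii(u'Caf\xe9')        # Cafe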

    Read the article

  • Where can I find a useful Unicode fallback font for Mac OS X?

    - by Stephen Jennings
    On every browser I've tried (Firefox, Safari, Chrome, and Omniweb), when I go to a web page containing somewhat less-common characters, I can't see the glyphs. For example, on the Wikipedia page for the Bengali Language, the very first line contains a string of squares; on Windows, I can see the Bengali writing. On Windows, as long as I have the Arial Unicode MS font installed, these characters fall back to that font and display properly. Mac OS X doesn't seem to ship with a font containing these Unicode characters (it has Arial Unicode MS, but it must be a subset of the Windows version because Bengali doesn't display in that font). I checked on my Snow Leopard DVD and I installed "Additional Fonts" from the Optional Installs package, but I'm still missing many languages. Is there any good, free font that contains a large collection of languages? I know creating fonts is difficult and time-consuming, but it seems like including at least one font like this with operating systems should be standard by now.

    Read the article

  • Detect Unicode Usage in SQL Column

    One optimization you can make to a SQL table that is overly large is to change from nvarchar (or nchar) to varchar (or char).  Doing so will cut the size used by the data in half, from 2 bytes per character (+ 2 bytes of overhead for varchar) to only 1 byte per character.  However, you will lose the ability to store Unicode characters, such as those used by many non-English alphabets.  If the tables are storing user input, and your application is or might one day be used internationally, it's likely that using Unicode for your characters is a good thing.  However, if instead the data is being generated by your application itself or your development team (such as lookup data), and you can be certain that Unicode character sets are not required, then switching such columns to varchar/char can be an easy improvement to make. Avoid Premature Optimization If you are working with a lookup table that has a small number of rows, and is only ever referenced in the application by its numeric ID column, then you won't see any benefit to using varchar vs. nvarchar.  More generally, for small tables, you won't see any significant benefit.  Thus, if you have a general policy in place to use nvarchar/nchar because it offers more flexibility, do not take this post as a recommendation to go against this policy anywhere you can.  You really only want to act on measurable evidence that suggests that using Unicode is resulting in a problem, and that you won't lose anything by switching to varchar/char. Obviously the main reason to make this change is to reduce the amount of space required by each row.  This in turn affects how many rows SQL Server can page through at a time, and can also impact index size and how much disk I/O is required to respond to queries, etc.  If for example you have a table with 100 million records in it and this table has a column of type nchar(5), this column will use 5 * 2 = 10 bytes per row, and with 100M rows that works out to 10 bytes * 100 million = 1000 MBytes or 1GB.  If it turns out that this column only ever stores ASCII characters, then changing it to char(5) would reduce this to 5*1 = 5 bytes per row, and only 500MB.  Of course, if it turns out that it only ever stores the values true and false then you could go further and replace it with a bit data type which uses only 1 byte per row (100MB total). Detecting Whether Unicode Is In Use So by now you think that you have a problem and that it might be alleviated by switching some columns from nvarchar/nchar to varchar/char, but you're not sure whether you're currently using Unicode in these columns.  By definition, you should only be thinking about this for a column that has a lot of rows in it, since the benefits just aren't there for a small table, so you can't just eyeball it and look for any non-ASCII characters.  Instead, you need a query.  It's actually very simple: SELECT DISTINCT(CategoryName) FROM Categories WHERE CategoryName <> CONVERT(varchar, CategoryName) Summary: thanks to Gregg Stark for the tip.

    Read the article

  • database is normalized but the following is a problem please help

    - by user287745
    The problem is that there are relationships which are so huge that, after normalizing, they have something like 20 primary keys (composite keys) which are really foreign keys but have to be declared as primary keys to identify the relationship uniquely. So please help: is this correct? And I apologize to the expert community for not accepting answers; I was not aware that accepting was possible, the TICK MARK is that visible :-)

    Read the article

  • Can I turn off implicit Python unicode conversions to find my mixed-strings bugs?

    - by Tal Weiss
    When profiling our code I was surprised to find millions of calls to C:\Python26\lib\encodings\utf_8.py:15(decode) I started debugging and found that across our code base there are many small bugs, usually comparing a string to a unicode or adding a string and a unicode. Python graciously decodes the strings and performs the following operations in unicode. How kind. But expensive! I am fluent in unicode, having read Joel Spolsky and Dive Into Python... I try to keep our code internals in unicode only. My question - can I turn off this pythonic nice-guy behavior? At least until I find all these bugs and fix them (usually by adding a u'u')? Some of them are extremely hard to find (a variable that is sometimes a string...). Python 2.6.5 (and I can't switch to 3.x).
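
    One known (if heavy-handed) Python 2 trick is to point the interpreter's default encoding at the 'undefined' codec, which raises on any implicit str/unicode coercion instead of silently decoding as ASCII. A sketch, with the usual caveats: sys.setdefaultencoding is deliberately hidden by site.py, and this belongs in a debugging run only, never in production.

        # -*- coding: utf-8 -*-
        # Debugging aid only: make implicit str<->unicode coercions fail loudly
        # so the places where the two types get mixed can be found.
        import sys
        reload(sys)                          # site.py removes setdefaultencoding
        sys.setdefaultencoding('undefined')  # the 'undefined' codec raises on use

        u'abc' + u'def'                      # fine: no coercion needed
        try:
            'abc' + u'def'                   # implicit decode of 'abc' now fails
        except UnicodeError as e:
            print 'mixed str/unicode use:', e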

    Read the article

  • Python unicode Decode Error SUDs

    - by PylonsN00b
    OK so I have # -*- coding: utf-8 -*- at the top of my script and it worked for being able to pull data from the database that had funny chars(Ñ ,Õ,é,—,–,’,…) in it and store that data into variables...but I have run into other problems, see I pull my data, organize it, and then dump it into a variables like so: title = product[1] Where product[1] is from my database result set Then I load it up for Suds like so: array_of_inventory_item_submit = ca_client_inventory.factory.create('ArrayOfInventoryItemSubmit') for product in products: inventory_item_submit = ca_client_inventory.factory.create('InventoryItemSubmit') inventory_item_list = get_item_list(product) inventory_item_submit = [inventory_item_list] array_of_inventory_item_submit.InventoryItemSubmit.append(inventory_item_submit) #Call that service baby! ca_client_inventory.service.SynchInventoryItemList(accountID, array_of_inventory_item_submit) Where get_item_list sets product[1] to title and (including a whole bunch of other nodes): inventory_item_submit.Title = title So everything runs fine until I call ca_client_inventory.service.SynchInventoryItemList that contains array_of_inventory_item_submit which contains the title w/ the funky char...here is the error: Traceback (most recent call last): File "upload_all_inventory_ebay.py", line 421, in <module> ca_client_inventory.service.SynchInventoryItemList(accountID, array_of_inventory_item_submit) File "build/bdist.macosx-10.6-i386/egg/suds/client.py", line 539, in __call__ File "build/bdist.macosx-10.6-i386/egg/suds/client.py", line 592, in invoke File "build/bdist.macosx-10.6-i386/egg/suds/bindings/binding.py", line 118, in get_message File "build/bdist.macosx-10.6-i386/egg/suds/bindings/document.py", line 63, in bodycontent File "build/bdist.macosx-10.6-i386/egg/suds/bindings/document.py", line 105, in mkparam File "build/bdist.macosx-10.6-i386/egg/suds/bindings/binding.py", line 260, in mkparam File "build/bdist.macosx-10.6-i386/egg/suds/mx/core.py", line 62, in process File "build/bdist.macosx-10.6-i386/egg/suds/mx/core.py", line 75, in append File "build/bdist.macosx-10.6-i386/egg/suds/mx/appender.py", line 102, in append File "build/bdist.macosx-10.6-i386/egg/suds/mx/appender.py", line 243, in append File "build/bdist.macosx-10.6-i386/egg/suds/mx/appender.py", line 182, in append File "build/bdist.macosx-10.6-i386/egg/suds/mx/core.py", line 75, in append File "build/bdist.macosx-10.6-i386/egg/suds/mx/appender.py", line 102, in append File "build/bdist.macosx-10.6-i386/egg/suds/mx/appender.py", line 298, in append File "build/bdist.macosx-10.6-i386/egg/suds/mx/appender.py", line 182, in append File "build/bdist.macosx-10.6-i386/egg/suds/mx/core.py", line 75, in append File "build/bdist.macosx-10.6-i386/egg/suds/mx/appender.py", line 102, in append File "build/bdist.macosx-10.6-i386/egg/suds/mx/appender.py", line 298, in append File "build/bdist.macosx-10.6-i386/egg/suds/mx/appender.py", line 182, in append File "build/bdist.macosx-10.6-i386/egg/suds/mx/core.py", line 75, in append File "build/bdist.macosx-10.6-i386/egg/suds/mx/appender.py", line 102, in append File "build/bdist.macosx-10.6-i386/egg/suds/mx/appender.py", line 243, in append File "build/bdist.macosx-10.6-i386/egg/suds/mx/appender.py", line 182, in append File "build/bdist.macosx-10.6-i386/egg/suds/mx/core.py", line 75, in append File "build/bdist.macosx-10.6-i386/egg/suds/mx/appender.py", line 102, in append File "build/bdist.macosx-10.6-i386/egg/suds/mx/appender.py", line 198, in append File 
"build/bdist.macosx-10.6-i386/egg/suds/sax/element.py", line 251, in setText File "build/bdist.macosx-10.6-i386/egg/suds/sax/text.py", line 43, in __new__ UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 116: ordinal not in range(128) Now what? My guess is my script can take in these funky chars because I have # -*- coding: utf-8 -*- at the top but Suds does NOT have that at the top of its files. Do I really want to go and change the Suds files...we all know this is the least desired last possible solution...what can I do?

    Read the article

  • PHP function to convert unicode to special characters?

    - by inktri
    Is there a php function to handle the encodings below? .replaceAll("\u00c3\u0080", "&Agrave;") .replaceAll("\u00c3\u0081", "&Aacute;") .replaceAll("\u00c3\u0082", "&Acirc;") .replaceAll("\u00c3\u0083", "&Atilde;") .replaceAll("\u00c3\u0084", "&Auml;") .replaceAll("\u00c3\u0085", "&Aring;") .replaceAll("\u00c3\u0086", "&AElig;") .replaceAll("\u00c3\u00a0", "&agrave;") .replaceAll("\u00c3\u00a1", "&aacute;") .replaceAll("\u00c3\u00a2", "&acirc;") .replaceAll("\u00c3\u00a3", "&atilde;") .replaceAll("\u00c3\u00a4", "&auml;") .replaceAll("\u00c3\u00a5", "&aring;") .replaceAll("\u00c3\u00a6", "&aelig;") .replaceAll("\u00c3\u0087", "&Ccedil;") .replaceAll("\u00c3\u00a7", "&ccedil;") .replaceAll("\u00c3\u0090", "&ETH;") .replaceAll("\u00c3\u00b0", "&eth;") .replaceAll("\u00c3\u0088", "&Egrave;") .replaceAll("\u00c3\u0089", "&Eacute;") .replaceAll("\u00c3\u008a", "&Ecirc;") .replaceAll("\u00c3\u008b", "&Euml;") .replaceAll("\u00c3\u00a8", "&egrave;") .replaceAll("\u00c3\u00a9", "&eacute;") .replaceAll("\u00c3\u00aa", "&ecirc;") .replaceAll("\u00c3\u00ab", "&euml;") .replaceAll("\u00c3\u008c", "&Igrave;") .replaceAll("\u00c3\u008d", "&Iacute;") .replaceAll("\u00c3\u008e", "&Icirc;") .replaceAll("\u00c3\u008f", "&Iuml;") .replaceAll("\u00c3\u00ac", "&igrave;") .replaceAll("\u00c3\u00ad", "&iacute;") .replaceAll("\u00c3\u00ae", "&icirc;") .replaceAll("\u00c3\u00af", "&iuml;") .replaceAll("\u00c3\u0091", "&Ntilde;") .replaceAll("\u00c3\u00b1", "&ntilde;") .replaceAll("\u00c3\u0092", "&Ograve;") .replaceAll("\u00c3\u0093", "&Oacute;") .replaceAll("\u00c3\u0094", "&Ocirc;") .replaceAll("\u00c3\u0095", "&Otilde;") .replaceAll("\u00c3\u0096", "&Ouml;") .replaceAll("\u00c3\u0098", "&Oslash;") .replaceAll("\u00c5\u0092", "&OElig;") .replaceAll("\u00c3\u00b2", "&ograve;") .replaceAll("\u00c3\u00b3", "&oacute;") .replaceAll("\u00c3\u00b4", "&ocirc;") .replaceAll("\u00c3\u00b5", "&otilde;") .replaceAll("\u00c3\u00b6", "&ouml;") .replaceAll("\u00c3\u00b8", "&oslash;") .replaceAll("\u00c5\u0093", "&oelig;") .replaceAll("\u00c3\u0099", "&Ugrave;") .replaceAll("\u00c3\u009a", "&Uacute;") .replaceAll("\u00c3\u009b", "&Ucirc;") .replaceAll("\u00c3\u009c", "&Uuml;") .replaceAll("\u00c3\u00b9", "&ugrave;") .replaceAll("\u00c3\u00ba", "&uacute;") .replaceAll("\u00c3\u00bb", "&ucirc;") .replaceAll("\u00c3\u00bc", "&uuml;") .replaceAll("\u00c3\u009d", "&Yacute;") .replaceAll("\u00c5\u00b8", "&Yuml;") .replaceAll("\u00c3\u00bd", "&yacute;") .replaceAll("\u00c3\u00bf", "&yuml;");
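
    Pairs like \u00c3\u0080 are just the two bytes of a UTF-8 encoded character written out as separate code points, so the whole table collapses into "reinterpret as UTF-8, then emit named HTML entities". In PHP, converting the string back to proper UTF-8 and then calling htmlentities() with the UTF-8 charset should cover the table; here is a sketch of the same two steps in Python, to show the idea rather than the exact PHP call:

        # -*- coding: utf-8 -*-
        import htmlentitydefs

        def fix_and_entify(s):
            # The input has each UTF-8 byte inflated to its own code point
            # (e.g. u'\u00c3\u0080' for À); squash it back, decode properly,
            # then emit named entities for anything beyond ASCII.
            text = s.encode('latin-1').decode('utf-8')
            out = []
            for ch in text:
                name = htmlentitydefs.codepoint2name.get(ord(ch))
                out.append(u'&%s;' % name if name and ord(ch) > 127 else ch)
            return u''.join(out)

        print fix_and_entify(u'\u00c3\u0080 bient\u00c3\u00b4t')  # &Agrave; bient&ocirc;t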

    Read the article

  • Django: Unicode Filenames with ASCII headers?

    - by TheLizardKing
    I have a list of strangely encoded files: 02 - Charlie, Woody and You/Study #22.mp3 which I suppose isn't so bad but there are a few particular characters which Django OR nginx seem to be snagging on. >>> test = u'02 - Charlie, Woody and You/Study #22.mp3' >>> test u'02 - Charlie, Woody and You\uff0fStudy #22.mp3' I am using nginx as a reverse proxy to connect to django's built in webserver (still in development stages) and postgresql for my database. My database and tables are all en_US.UTF-8 and I am using pgadmin3 to view my tables outside of django. My issue goes a little beyond my title, firstly how should I be saving possibly whacky filenames in my database? My current method is 'path': smart_unicode(path.lstrip(MUSIC_PATH)), 'filename': smart_unicode(file) and when I pprint out the values they do show u'whateverthecrap' I am not sure if that is how I should be doing it but assuming it is now I have issues trying to spit out the download. My download view looks something like this: def song_download(request, song_id): song = get_object_or_404(Song, pk=song_id) url = u'/static_music/%s/%s' % (song.path, song.filename) print url response = HttpResponse() response['X-Accel-Redirect'] = url response['Content-Type'] = 'audio/mpeg' response['Content-Disposition'] = "attachment; filename=test.mp3" return response and most files will download but when I get to 02 - Charlie, Woody and You/Study #22.mp3 I receive this from django: 'ascii' codec can't encode character u'\uff0f' in position 118: ordinal not in range(128), HTTP response headers must be in US-ASCII format. How can I use an ASCII acceptable string if my filename is out of bounds? 02 - Charlie, Woody and You\uff0fStudy #22.mp3 doesn't seem to work... EDIT 1 I am using Ubuntu for my OS.
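
    HTTP header values have to be byte strings, so one approach is to percent-escape the UTF-8 bytes of the path before putting it into X-Accel-Redirect (whether nginx unescapes that value on its side depends on the nginx version, so this needs testing against the internal location). A minimal sketch under that assumption, mirroring the view above; the model import path is hypothetical:

        # -*- coding: utf-8 -*-
        from django.http import HttpResponse
        from django.shortcuts import get_object_or_404
        from django.utils.http import urlquote    # UTF-8-aware urllib.quote wrapper
        from myapp.models import Song             # hypothetical; use the real app path

        def song_download(request, song_id):
            song = get_object_or_404(Song, pk=song_id)
            url = u'/static_music/%s/%s' % (song.path, song.filename)
            response = HttpResponse()
            # Percent-escape so the header value is pure ASCII; '/' stays
            # unescaped so the path structure survives.
            response['X-Accel-Redirect'] = urlquote(url, safe='/')
            response['Content-Type'] = 'audio/mpeg'
            response['Content-Disposition'] = 'attachment; filename=test.mp3'
            return response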

    Read the article

  • delphi 2010 variant to unicode problem

    - by Crudler
    Please advise how I can achieve this. I am working in a DLL in Delphi 2010. This DLL has an exported procedure that receives an array of variants. I want to be able to take one of these variants and convert it into a string, but I keep getting ????? I cannot change the input variable - it HAS to be an array of variants. The host app that calls the DLL cannot be changed. It is written in Delphi 2006. The sample DLL code is: Procedure TestArr(ArrUID : array of variant);stdcall; var i : integer; s:string; begin s:= string(String(Arruid[0])); showmessage(s); end; Obviously in D2006 my DLL works fine. I have tried using VarToStr - no luck. When I test the VarType I am getting a varString. Any suggestions?

    Read the article
