Search Results

Search found 1649 results on 66 pages for 'unicode normalization'.

Page 17/66 | < Previous Page | 13 14 15 16 17 18 19 20 21 22 23 24  | Next Page >

  • Dealing with wacky encodings in Python

    - by Tyson
    I have a Python script that pulls in data from many sources (databases, files, etc.). Supposedly, all the strings are unicode, but what I end up getting is any variation on the following theme (as returned by repr()): u'D\\xc3\\xa9cor' u'D\xc3\xa9cor' 'D\\xc3\\xa9cor' 'D\xc3\xa9cor' Is there a reliable way to take any four of the above strings and return the proper unicode string? u'D\xe9cor' # --> Décor The only way I can think of right now uses eval(), replace(), and a deep, burning shame that will never wash away.

    Read the article

  • How can you print a string using raw_unicode_escape encoding in python 3?

    - by Sorin Sbarnea
    The following code with fail in Python 3.x with TypeError: must be str, not bytes because now encode() returns bytes and print() expects only str. #!/usr/bin/python from __future__ import print_function str2 = "some unicode text" print(str2.encode('raw_unicode_escape')) How can you print a Unicode string escaped representation using print()? I'm looking for a solution that will work with Python 2.6 or newer, including 3.x

    Read the article

  • How web server choose between unicode and utf-8 for accentued characters?

    - by jacques
    I have a web server with my ISP which replaces in the urls the accentued characters by their unicode values: for instance é (eacute) is translated to %e9 (dec 233). For testing I use locally Easyphp which translate those characters by their utf-8 equivalence: é is then replaced by the well known sequence %c3%a9 (é)... Browsers served by Easyphp don't decode unicode values but they do if running locally (utf-8 and non converted accent also)... I have been unable to find where this behavior is configured in the server. This is a problem as some urls are built by my application using the php rawurlencode() which seems to always encode with unicode values on both servers. Any idea? Thanks in advance.

    Read the article

  • How does the web server choose between unicode and utf-8 for accented characters?

    - by jacques
    I have a web server with my ISP which replaces accented characters in URLs with their unicode values: for instance é (eacute) is translated to %e9 (dec 233). For testing locally I use EasyPHP which translates those characters by their utf-8 equivalence: é is then replaced by the well known sequence %c3%a9 (é)... Browsers served by EasyPHP don't decode unicode values but they do if running locally (utf-8 and non converted accent also)... I have been unable to find where this behavior is configured in the server. This is a problem as some urls are built by my application using the php rawurlencode() which seems to always encode with unicode values on both servers. Any idea?

    Read the article

  • how to mod rewrite unicode byte sequence for the multibyte hyphen character

    - by ChickenFur
    We have case where some adobe pdf files format the hyphen character as %E2%80%90. See http://forums.adobe.com/message/2807241 this is caused by the Calibri font I guess. So these pdf files have been released and the links don't work So I thought mod rewrite would come to the rescue. I followed this post here mod_ReWrite to remove part of a URL but I can't seem to search for the % characters according to this question. Is there anything else I can do? Here is the rewrite rule I want to use: RewriteRule ^foo%(.+)bar /foo-bar [L,R=301] I also tried this and it doesn't work RewriteRule ^foo%E2%80%90bar /foo-bar [L,R=301] Any Ideas?

    Read the article

  • How do I detect unicode characters in a Java string to resolve sax parser exception

    - by Madhumita
    Suppose I have a string that contains '¿'. How would I find all those unicode characters? Should I test for their code? How would I do that? I want to detect it to avoid sax parser exception which I am getting it while parsing the xml saved as a clob in oracle 10g database. Exception javax.servlet.ServletException: org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.

    Read the article

  • Looking for a good Delphi unicode string library

    - by volvox
    Any suggestion for a good unicode string library for Delphi 2010? Such thing as class that would contain a collection of independent functions, basically an encapsulation of functions that manipulate strings (ex: Trimlike, Character removal, Positional, Sub-string, Compare, Informational, Case, Replacement, Manipulation functions etc. ). Thanks

    Read the article

  • any practices ,samples for ERD?

    - by just_name
    Q: I wanna any web sites , any books just for training on ERD and normalization ,, i wanna a lot of samples ,practices,and case studies with recommended answers, to strength myself in database design.and avoid the poor data base design i made . note:i don't need books to explain the concepts , what i need is practices ,examples,case studies with recommended answers. Thanks in advance.

    Read the article

  • Problem initialing a unicode string

    - by Simon
    Hey All. Atm im working with native API calls and i have to get RtlInitUnicodeString to work. The way i use: const WCHAR wcMutex[] = L"String1"; UNICODE_STRING unicodeMutexBuffer; RtlInitUnicodeString(&unicodeMutexBuffer,wcMutex); now my problem the project doesnt compile , i get this error: Error argument of type "UNICODE_STRING*" is incompatible with type of "PUNICODE_STRING" but in my old Driver kit , i used same way to initialize the unicode string struct

    Read the article

  • Display ñ on a C# .NET application

    - by mmr
    I have a localization issue. One of my industrious coworkers has replaced all the strings throughout our application with constants that are contained in a dictionary. That dictionary gets various strings placed in it once the user selects a language (English by default, but target languages are German, Spanish, French, Portuguese, Mandarin, and Thai). For our test of this functionality, we wanted to change a button to include text which has a ñ character, which appears both in Spanish and in the Arial Unicode MS font (which we're using throughout the application). Problem is, the ñ is appearing as a square block, as if the program did not know how to display it. When I debug into that particular string being read from disk, the debugger reports that character as a square block as well. So where is the failure? I think it could be in a few places: 1) Notepad may not be unicode aware, so the ñ displayed there is not the same as what vs2008 expects, and so the program interprets the character as a square (EDIT: notepad shows the same characters as vs; ie, they both show the ñ. In the same place.). 2) vs2008 can't handle ñ. I find that very, very hard to believe. 3) The text is read in properly, but the default font for vs2008 can't display it, which is why the debugger shows a square. 4) The text is not read in properly, and I should use something other than a regular StreamReader to get strings. 5) The text is read in properly, but the default String class in C# doesn't handle ñ well. I find that very, very hard to believe. 6) The version of Arial Unicode MS I have doesn't have ñ, despite it being listed as one of the 50k characters by http://www.fileinfo.info. Anything else I could have left out? Thanks for any help!

    Read the article

  • How to set UCS2 in numpy?

    - by mindcorrosive
    I'm trying to build numpy 1.2.1 as a module for a third-party python interpreter (custom-built, py2.4 linux x86_64) so that I can make calls to numpy from within it. Let's call this one interpreter A. The thing is, the system-wide python interpreter (also py2.4, let's call it B) from the vendor is built with --enable-unicode=ucs4, while the custom one is with UCS2. Needless to say, when I try to build a module with B, I get an error when I try to import numpy in A -- it complains about undefined symbol _PyUnicodeUCS4_IsWhiteSpace. I've searched around and apparently there's no way around this but to compile a custom Python interpreter -- which I did (let's call it interpreter C), properly specifying the unicode string length (verifiable through sys.maxunicode). I managed to build numpy with C as well, surprisingly enough, but still the problem persists when I try to import it in interpreter C. Previously, when I built numpy using B, there were no problems when importing it in B, but A would complain. Perhaps there's an option when building numpy to specify the length of Unicode strings to be used, as when configuring Python builds? Or am I doing something else wrong? A few notes: Upgrading to newer versions of python and/or numpy is not an option - interpreter A will stay on this version of the grammar for the foreseeable future. Also, it is not possible to start the interpreter A in standalone mode to build numpy with it, as it needs some other libraries preloaded I know that this whole thing is a mess, but I'd appreciate any help I can get to make this work. If you need more information, please let me know, I'd be happy to oblige. Thanks to everybody for their time in advance.

    Read the article

  • Should I strip the XML declaration from suds output before parsing with lxml?

    - by mikl
    I’m trying to implement a SOAP webservice in Python 2.6 using the suds library. That is working well, but I’ve run into a problem when trying to parse the output with lxml. Suds returns a suds.sax.text.Text object with the reply from the SOAP service. The suds.sax.text.Text class is a subclass of the Python built-in Unicode class. In essence, it would be comparable with this Python statement: u'<?xml version="1.0" encoding="utf-8" ?><root><lotsofelements \></root>' Which is incongrous, since if the XML declaration is correct, the contents are UTF-8 encoded, and thus not a Python Unicode object (because those are stored in some internal encoding like UCS4). lxml will refuse to parse this, as documented, since there is no clear answer to what encoding it should be interpreted as. As I see it, there are two ways out of this bind: Strip the <?xml> declaration, including the encoding. Convert the output from Suds into a bytestring, using the specified encoding. Currently, the data I’m receiving from the webservice is within the ASCII-range, so either way will work, but both feels very much like ugly hacks to me, and I’m not quite sure what would happen, if I start to receive data that would need a wider range of Unicode characters. Any good ideas? I can’t imagine I’m the first one in this position…

    Read the article

  • Weird symbols on Mac

    - by Rich Bradshaw
    Since I've had my mac, I keep seeing this weird symbol. Till today, it had been only in the place of bullet points in OpenOffice.org. The first pictures shows this in a .doc file created on a Windows system. I thought nothing of it - just an annoyance. It appears no matter what the font. Real bullets appear if I delete the text and insert a bulletted list using the toolbar. Then, today I noticed in in iTunes - which seemed strange. Image 3 is a zoom of the character. It says on it: Private Use E000 F8FF. What is it (unicode related?), and how do I get the bullets working properly? Edit: The plot thickens... If I boot in Safe Mode, the symbols look like little snap boards like you'd have at the beginning of filming a scene in a film...

    Read the article

  • Is there a MySQL performance benchmark to measure the impact of utf8_unicode_ci versus utf8_general_ci?

    - by MiniQuark
    I read here and there that using the utf8_unicode_ci collation ensures a better treatment of unicode text (for example, it knowns how to expand characters such as 'œ' into 'oe' for searching and ordering) compared to the default utf8_general_ci which basically just strips diacritics. Unfortunately, both sources indicate that utf8_unicode_ci is slightly slower than utf8_general_ci. So my question is: what does "slightly slower" mean? Has anyone run benchmarks? Are we talking about a -0.01% performance impact or rather something like -25%? Thanks for your help.

    Read the article

  • How to enter Devanagari half-R character in Emacs MULE?

    - by journeyer
    I am trying the Emacs MULE ITRANS input method to enter Devanagari unicode text. I am looking to enter a key sequence for Devanagari letter "Half-R" (??) i.e. U0931 U094d (which should be mapped to R according to the ITRANS Wikipedia page). While all other keys in the map work fine, this particular one does not! I know I can use M-x ucs-insert (or CTRL-x-8 RET) to enter this sequence, but it is getting tiresome. How do I fix this problem ?

    Read the article

< Previous Page | 13 14 15 16 17 18 19 20 21 22 23 24  | Next Page >