Search Results

Search found 1649 results on 66 pages for 'unicode normalization'.

Page 28/66

Encoding gives "'ascii' codec can't encode character … ordinal not in range(128)"

- by user140314

I am working through the Django RSS reader project here. The RSS feed reads something like "OKLAHOMA CITY (AP) — James Harden let". The feed declares encoding="UTF-8", so I believe I am passing UTF-8 to markdown. The em dash is where it chokes. I get the Django error "'ascii' codec can't encode character u'\u2014' in position 109: ordinal not in range(128)", which is a UnicodeEncodeError. In the variables being passed I see "OKLAHOMA CITY (AP) \u2014 James Harden". The line that fails is:

    content = content.encode(parsed_feed.encoding, "xmlcharrefreplace")

I am using markdown 2.0, Django 1.1, and Python 2.4. What is the magic sequence of encoding and decoding that I need to do to make this work? Thanks.
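A minimal sketch of what the failing line is trying to do, assuming Python 3 (where text and bytes are separate types) rather than the Python 2.4 from the question. The sample string is a shortened copy of the feed text; the variable names are made up for illustration.

```python
# Sample text containing the em dash (U+2014) from the feed.
text = "OKLAHOMA CITY (AP) \u2014 James Harden"

# A plain ASCII encode fails, because U+2014 has no ASCII byte.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# UTF-8 can represent the dash directly (three bytes).
as_utf8 = text.encode("utf-8")

# "xmlcharrefreplace" only kicks in for characters the target codec lacks.
as_ascii_refs = text.encode("ascii", "xmlcharrefreplace")
```

The error message in the question names the ascii codec, so some step other than the explicit UTF-8 encode is presumably converting the text to ASCII implicitly.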

How to enable reading non-ascii characters in Servlets

- by Daziplqa

How can I make a servlet accept non-ASCII (Arabic, Chinese, etc.) characters passed from JSPs? I've tried adding the following to the top of the JSPs:

    <%@page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>

and adding the following to each post/get method in the servlet:

    request.setCharacterEncoding("UTF-8");
    response.setCharacterEncoding("UTF-8");

I've also tried adding a Filter that executes the above two statements instead of doing it in the servlet. To be quite honest, this was working in the past, but now it doesn't work anymore. I am using Tomcat 5.0.28/6.x.x on JDK 1.6 on both Windows and Linux boxes.

Send javax.mail.internet.MimeMessage to a recipient with non-ASCII name?

- by phyzome

I am writing a piece of Java code that needs to send mail to users with non-ASCII names. I have figured out how to use UTF-8 for the body, subject line, and generic headers, but I am still stuck on the recipients. Here's what I'd like in the "To:" field: "ウィキペディアにようこそ" <foo@example.com>. This lives (for our purposes today) in a String called recip.

    msg.addRecipients(MimeMessage.RecipientType.TO, recip)

gives "?????S]" <foo@example.com>, and

    msg.addHeader("To", MimeUtility.encodeText(recip, "utf-8", "B"))

throws AddressException: Local address contains control or whitespace in string ``=?utf-8?B?IuOCpuOCo+OCreODmuODh+OCo+OCouOBq+OCiOOBhuOBk+OBnSIgPA==?= =?utf-8?B?Zm9vQGV4YW1wbGUuY29tPg==?=''

How the heck am I supposed to send this message?
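The Base64 in the exception text decodes to the whole recipient string, address included, which suggests the encoding was applied to more than the display name. A rough Python sketch of the usual RFC 2047 approach, where only the display name is encoded and the address stays plain ASCII. The name and address are the ones that decode out of the error message above; the Python standard library stands in for JavaMail here, so this illustrates the idea rather than the Java API.

```python
from email.header import decode_header
from email.utils import formataddr, parseaddr

name = "ウィキペディアにようこそ"   # display name, non-ASCII
address = "foo@example.com"      # address, must stay ASCII

# formataddr encodes only the display name when it is not ASCII.
to_header = formataddr((name, address))

# Round trip: split the header again and decode the encoded name.
encoded_name, parsed_address = parseaddr(to_header)
raw, charset = decode_header(encoded_name)[0]
decoded_name = raw.decode(charset)
```

In JavaMail the analogous step would presumably be building the recipient from separate address and personal-name parts instead of encoding the combined string.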

Using C# to detect whether a filename character is considered international

- by Morten Mertner

I've written a small console application (source below) to locate and optionally rename files containing international characters, as they are a source of constant pain with most source control systems (some background on this below). The code I'm using has a simple dictionary of characters to look for and replace (and nukes every other character that uses more than one byte of storage), but it feels very hackish. What's the right way to (a) find out whether a character is international, and (b) find the best ASCII substitution for it?

Let me provide some background information on why this is needed. It so happens that the Danish Å character has two different representations in UTF-8, both representing the same symbol. These are known as the NFC and NFD normalization forms. Windows and Linux will create NFC by default but respect whatever form they are given. Mac will convert all names (when saving to an HFS+ partition) to NFD and therefore returns a different byte stream for the name of a file created on Windows. This effectively breaks Subversion, Git and lots of other utilities that don't care to properly handle this scenario. I'm currently evaluating Mercurial, which turns out to be even worse at handling international characters. Being fairly tired of these problems, I decided that either source control or the international characters would have to go, and so here we are.
My current implementation:

    public class Checker
    {
        private Dictionary<char, string> internationals = new Dictionary<char, string>();
        private List<char> keep = new List<char>();
        private List<char> seen = new List<char>();

        public Checker()
        {
            internationals.Add( 'æ', "ae" );
            internationals.Add( 'ø', "oe" );
            internationals.Add( 'å', "aa" );
            internationals.Add( 'Æ', "Ae" );
            internationals.Add( 'Ø', "Oe" );
            internationals.Add( 'Å', "Aa" );
            internationals.Add( 'ö', "o" );
            internationals.Add( 'ü', "u" );
            internationals.Add( 'ä', "a" );
            internationals.Add( 'é', "e" );
            internationals.Add( 'è', "e" );
            internationals.Add( 'ê', "e" );
            internationals.Add( '¦', "" );
            internationals.Add( 'Ã', "" );
            internationals.Add( '©', "" );
            internationals.Add( ' ', "" );
            internationals.Add( '§', "" );
            internationals.Add( '¡', "" );
            internationals.Add( '³', "" );
            internationals.Add( '', "" );
            internationals.Add( 'º', "" );
            internationals.Add( '«', "-" );
            internationals.Add( '»', "-" );
            internationals.Add( '´', "'" );
            internationals.Add( '`', "'" );
            internationals.Add( '"', "'" );
            internationals.Add( Encoding.UTF8.GetString( new byte[] { 226, 128, 147 } )[ 0 ], "-" );
            internationals.Add( Encoding.UTF8.GetString( new byte[] { 226, 128, 148 } )[ 0 ], "-" );
            internationals.Add( Encoding.UTF8.GetString( new byte[] { 226, 128, 153 } )[ 0 ], "'" );
            internationals.Add( Encoding.UTF8.GetString( new byte[] { 226, 128, 166 } )[ 0 ], "." );
            keep.Add( '-' );
            keep.Add( '=' );
            keep.Add( '\'' );
            keep.Add( '.' );
        }

        public bool IsInternationalCharacter( char c )
        {
            var s = c.ToString();
            byte[] bytes = Encoding.UTF8.GetBytes( s );
            if( bytes.Length > 1 && ! internationals.ContainsKey( c ) && ! seen.Contains( c ) )
            {
                Console.WriteLine( "X '{0}' ({1})", c, string.Join( ",", bytes ) );
                seen.Add( c );
                if( ! keep.Contains( c ) )
                {
                    internationals[ c ] = "";
                }
            }
            return internationals.ContainsKey( c );
        }

        public bool HasInternationalCharactersInName( string name, out string safeName )
        {
            StringBuilder sb = new StringBuilder();
            Array.ForEach( name.ToCharArray(), c => sb.Append( IsInternationalCharacter( c ) ? internationals[ c ] : c.ToString() ) );
            int length = sb.Length;
            sb.Replace( "  ", " " );
            while( sb.Length != length )
            {
                sb.Replace( "  ", " " );
            }
            safeName = sb.ToString().Trim();
            string namePart = Path.GetFileNameWithoutExtension( safeName );
            if( namePart.EndsWith( "." ) )
                safeName = namePart.Substring( 0, namePart.Length - 1 ) + Path.GetExtension( safeName );
            return name != safeName;
        }
    }

And this would be invoked like this:

    FileInfo file = new FileInfo( "Århus.txt" );
    string safeName;
    if( checker.HasInternationalCharactersInName( file.Name, out safeName ) )
    {
        // rename file
    }
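The NFC/NFD difference described in the background can be reproduced in a few lines. This is a Python sketch rather than C#, using the file name from the question; the ASCII fallback at the end is only one possible approach and does nothing for letters such as æ or ø, which have no decomposition.

```python
import unicodedata

name = "\u00c5rhus.txt"  # "Århus.txt" with a precomposed Å

nfc = unicodedata.normalize("NFC", name)  # Å as one code point, U+00C5
nfd = unicodedata.normalize("NFD", name)  # "A" followed by U+030A (combining ring)

# Same text on screen, different code points and different UTF-8 bytes.
nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

# A crude ASCII fallback: decompose, then drop the combining marks.
ascii_fallback = "".join(ch for ch in nfd if not unicodedata.combining(ch))
```

Normalizing both names to the same form before comparing them avoids the mismatch without renaming anything.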

Why UTF-32 instead of UTF-16 if we have surrogate pairs?

- by zildjohn01

If I understand correctly, UTF-32 can handle every character in the universe. So can UTF-16, through the use of surrogate pairs. So is there any good reason to use UTF-32 instead of UTF-16?
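One difference that is easy to show is fixed versus variable width. The Python sketch below, with arbitrarily chosen sample characters, compares the encoded sizes; the last character lies outside the Basic Multilingual Plane and therefore needs a surrogate pair in UTF-16.

```python
samples = ["A", "\u00e8", "\U0001D10C"]  # ASCII, Latin-1 range, beyond the BMP

# (UTF-8 bytes, UTF-16 bytes, UTF-32 bytes) for each sample character.
sizes = [
    (len(ch.encode("utf-8")), len(ch.encode("utf-16-le")), len(ch.encode("utf-32-le")))
    for ch in samples
]

# U+1D10C in UTF-16LE: high surrogate D834, low surrogate DD0C.
surrogate_pair = "\U0001D10C".encode("utf-16-le")
```

With UTF-32 the n-th code point always starts at byte offset 4 * n, while UTF-16 needs a scan once surrogate pairs can occur.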

How I can print the wchar_t values to console?

- by zed91

Example:

    #include <iostream>
    using namespace std;

    int main()
    {
        wchar_t en[] = L"Hello";
        wchar_t ru[] = L"??????"; // Russian language
        cout << ru << endl << en;
        return 0;
    }

This code only prints hex values, like an address. How do I print the wchar_t string?

help me with xor encryption in c#

- by x86shadow

I wrote this code in C# to encrypt a text with a key:

    using System;
    using System.Linq;
    using System.Collections.Generic;
    using System.Text;

    namespace ENCRYPT
    {
        class XORENC
        {
            private static int Bin2Dec(string num)
            {
                int _num = 0;
                for (int i = 0; i < num.Length; i++)
                {
                    _num += (int)Math.Pow(2, num.Length - i - 1) * int.Parse(num[i].ToString());
                }
                return _num;
            }

            private static string Dec2Bin(int num)
            {
                if (num < 2) return num.ToString();
                return Dec2Bin(num / 2) + (num % 2).ToString();
            }

            public static string StrXor(string str, string key)
            {
                string _str = "";
                string _key = "";
                string _dec = "";
                string _temp = "";
                for (int i = 0; i < str.Length; i++)
                {
                    _temp = Dec2Bin(str[i]);
                    for (int j = 0; j < 8 - _temp.Length + 1; j++)
                    {
                        _temp = '0' + _temp;
                    }
                    _str += _temp;
                }
                for (int i = 0; i < key.Length; i++)
                {
                    _temp = Dec2Bin(key[i]);
                    for (int j = 0; j < 8 - _temp.Length + 1; j++)
                    {
                        _temp = '0' + _temp;
                    }
                    _key += _temp;
                }
                while (_key.Length < _str.Length)
                {
                    _key += _key;
                }
                if (_key.Length > _str.Length) _key = _key.Substring(0, _str.Length);
                for (int i = 0; i < _str.Length; i++)
                {
                    if (_str[i] == _key[i]) { _dec += '0'; } else { _dec += '1'; }
                }
                _str = "";
                for (int i = 0; i < _dec.Length; i = i + 8)
                {
                    char _chr = (char)0;
                    _chr = (char)Bin2Dec(_dec.Substring(i, 8));
                    _str += _chr;
                }
                return _str;
            }
        }
    }

The problem is that I always get an error when I try to decrypt an encrypted text with this code. See the example below for more info:

    string enc_text = ENCRYPT.XORENC("abc","a"); // enc_text = " ??"
    string dec_text = ENCRYPT.XORENC(enc_text,"a"); // ERROR

Can anyone help?
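For comparison, a minimal Python sketch of repeating-key XOR that works on bytes directly and so avoids the bit-string padding altogether. The function name `xor_bytes` is made up for this example. In the C# above, the bound of the padding loop shrinks as the string grows, so small values such as 3 appear to be padded to fewer than eight bits, which would explain a failure when decrypting.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


plain = "abc".encode("utf-8")
key = "a".encode("utf-8")

encrypted = xor_bytes(plain, key)      # may contain control bytes such as 0x00
decrypted = xor_bytes(encrypted, key)  # XOR with the same key undoes itself
```

The encrypted result is arbitrary binary data, so it is safer to keep it as bytes (or Base64 text) than to squeeze it back into a string.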

python input UnicodeDecodeError:

- by The man on the Clapham omnibus

Python 3.x:

    >>> a = input()
    hope
    >>> a
    'hope'
    >>> b = input()
    håpe
    >>> b
    'håpe'
    >>> c = input()
    start typing hå... delete using backspace... and change to hope
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 1: invalid continuation byte
    >>>

The situation is not terrible, and I am working around it, but I find it strange that the bytes get messed up when deleting. Has anyone else experienced this? The terminal history shows h?ope where I thought I had entered hope. Any ideas? In the script that uses this, I do import readline to get command-line history.

What's the deal with char.GetNumericValue?

- by mgroves

I was working on Project Euler 40, and was a bit bothered that there was no int.Parse(char). Not a big deal, but I did some asking around and someone suggested char.GetNumericValue. GetNumericValue seems like a very odd method to me:

- Takes in a char as a parameter and returns... a double?
- Returns -1.0 if the char is not '0' through '9'

So what's the reasoning behind this method, and what purpose does returning a double serve? I even fired up Reflector and looked at InternalGetNumericValue, but it's just like watching Lost: every answer just leads to another question.
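One plausible reason for a floating-point return type is that Unicode assigns numeric values to more than the digits '0' through '9', and some of those values are fractions. Python's `unicodedata.numeric` reads the same Unicode property, so it can serve as a sketch of the idea; this illustrates the Unicode data and says nothing definite about the .NET internals.

```python
import unicodedata

digit = unicodedata.numeric("7")            # ordinary decimal digit
half = unicodedata.numeric("\u00bd")        # VULGAR FRACTION ONE HALF
twelve = unicodedata.numeric("\u216b")      # ROMAN NUMERAL TWELVE
not_numeric = unicodedata.numeric("x", -1.0)  # explicit default for non-numbers
```

A fraction such as one half cannot be returned as an int, which would explain the choice of double.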

How can I get Perl to detect the bad UTF-8 sequences?

- by gorilla

I'm running Perl 5.10.0 and Postgres 8.4.3, and am inserting strings into a database, which is behind DBIx::Class. These strings should be in UTF-8, and therefore my database is running in UTF-8. Unfortunately some of these strings are bad, containing malformed UTF-8, so when I run it I get an exception:

    DBI Exception: DBD::Pg::st execute failed: ERROR: invalid byte sequence for encoding "UTF8": 0xb5

I thought that I could simply ignore the invalid ones and worry about the malformed UTF-8 later, so this code should flag and ignore the bad titles:

    if(not utf8::valid($title)){
        $title="Invalid UTF-8";
    }
    $data->title($title);
    $data->update();

However, Perl seems to think that the strings are valid, but it still throws the exceptions. How can I get Perl to detect the bad UTF-8?
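The general technique for detecting malformed UTF-8 is to attempt a strict decode of the raw bytes and treat a failure as invalid input. Here is a Python sketch of that idea; the helper name `is_valid_utf8` and the sample titles are invented for illustration. As far as I know, Perl's `utf8::valid` checks the internal consistency of a string rather than the well-formedness of incoming bytes, so a strict decode is probably the closer match there as well.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw is well-formed UTF-8, False otherwise."""
    try:
        raw.decode("utf-8")  # error handling is strict by default
    except UnicodeDecodeError:
        return False
    return True


good_title = "T\u00edtulo".encode("utf-8")
bad_title = b"caf\xb5"  # a lone 0xb5 byte, as in the Postgres error

titles = [good_title, bad_title]
flags = [is_valid_utf8(title) for title in titles]
```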

jQuery :contains(unicode_characters)

- by SeanJA

I have an element like this: <span class="tool_tip" title="The full title">The ful…</span> This seems to work: jQuery('span:contains(…)'); But this does not: jQuery('span:contains(…)'); I am pretty sure that it would be bad to use the first one because if someone else saves the file, or the browser decides to get the file in a different character set for some reason things will not work. There has to be a way to properly select this span, right?

How Do I grep For non-ASCII Characters in UNIX

- by Peter Conrey

I have several very large XML files and I'm trying to find the lines that contain non-ASCII characters. I've tried the following: grep -e "[\x{00FF}-\x{FFFF}]" file.xml But this returns every line in the file, regardless of whether the line contains a character in the range specified. Do I have the syntax wrong or am I doing something else wrong? I've also tried: egrep "[\x{00FF}-\x{FFFF}]" file.xml (with both single and double quotes surrounding the pattern).
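As an alternative to getting the grep character class right, the same search can be sketched in a few lines of Python. The helper name `non_ascii_lines` and the sample lines are made up; for a real file the lines would come from opening it with an explicit encoding. `str.isascii` requires Python 3.7 or later.

```python
def non_ascii_lines(lines):
    """Yield (line_number, line) for every line containing a non-ASCII character."""
    for number, line in enumerate(lines, start=1):
        if not line.isascii():
            yield number, line


sample = ["<a>plain</a>", "<b>caf\u00e9</b>", "<c>ok</c>"]
hits = list(non_ascii_lines(sample))
```

With GNU grep built with PCRE support, a byte-range pattern such as `grep -P '[^\x00-\x7F]' file.xml` is a commonly suggested equivalent.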

PHP detecting filesystem encoding

- by Evert

Hi guys, I need to save files with non-Latin filenames on a filesystem, using PHP. I want to make this work cross-platform. How do I know what encoding I can use to write the file? I understand many modern filesystems are UTF-8 based (is this correct?), but I doubt Windows XP is (for instance). So, is there a robust detection mechanism? Evert

Java, JavaCC: How to check if a char (or char pair) is inside a given UTF32 range?

- by java.is.for.desktop

Hello, everyone! I am referring to the XML 1.1 spec. Look at the definition of NameStartChar:

    NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6]
                    | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF]
                    | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF]
                    | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
                    | [#x10000-#xEFFFF]

If I interpret this correctly, the last range (#x10000-#xEFFFF) goes beyond the UTF16 range of Java's char type. So it must be UTF32, right? So, I need to check pairs of char against this range, instead of single chars, right?

My questions are:

- How do I check for such character ranges using standard Java methods?
- How is it possible to define such ranges in JavaCC? JavaCC complains about \u10000 and \uEFFFF

Thank you!

NOTE: Don't worry, I am not trying to write an own XML-parser. I need those character ranges for other reasons.
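The range check itself is a comparison on code points, not on UTF-16 code units. A Python sketch of that check, with the ranges copied from the NameStartChar production above; the names `NAME_START_RANGES` and `is_name_start_char` are made up for this example. In Java the code point would presumably come from a method such as `String.codePointAt` rather than from a single `char`.

```python
# Inclusive code point ranges from the XML 1.1 NameStartChar production.
NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]


def is_name_start_char(ch: str) -> bool:
    """Check a single character (one code point) against the ranges."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in NAME_START_RANGES)


# A supplementary character is one code point but two UTF-16 code units.
supplementary = "\U00010000"
utf16_units = len(supplementary.encode("utf-16-le")) // 2
```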

problem with uploading arabic files

- by sword101

I am using Spring upload to upload files. When uploading a file with an Arabic name and getting the original file name in the controller, I get something like: المغفلين.png Any ideas why this problem occurs?

UnicodeDecodeError when redirecting to file

- by zedoo

Hi, I run this snippet twice in the Ubuntu terminal (encoding set to UTF-8), once with ./test.py and then with ./test.py >out.txt:

    uni = u"\u001A\u0BC3\u1451\U0001D10C"
    print uni

Without redirection it prints garbage. With redirection I get a UnicodeDecodeError. Can someone explain why I get the error only in the second case, or even better give a detailed explanation of what's going on behind the curtain in both cases?

How do you properly use WideCharToMultiByte

- by Obediah Stane

I've read the documentation here: http://msdn.microsoft.com/en-us/library/ms776420(VS.85).aspx

I'm stuck on this parameter:

    lpMultiByteStr [out] Pointer to a buffer that receives the converted string.

I'm not quite sure how to properly initialize the variable and feed it into the function.

utf8 format in xml

- by hussain

I want to know how to store è (and similar symbols) in an XML file. When I store this symbol in the XML file, the file shows it as ?. I have put <?xml version="1.0" encoding="UTF-8"?> at the top of the XML file, but it still does not display correctly. Thanks in advance.
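The declaration only promises an encoding; the bytes actually written to the file have to match it. A small Python sketch of writing and re-reading è as UTF-8 with the standard library (the element name `word` is arbitrary, and the `xml_declaration` argument assumes Python 3.8 or later):

```python
import xml.etree.ElementTree as ET

root = ET.Element("word")
root.text = "\u00e8"  # è

# Serialize to bytes that really are UTF-8, with a matching declaration.
data = ET.tostring(root, encoding="utf-8", xml_declaration=True)

# Parsing the same bytes gives the character back unchanged.
parsed = ET.fromstring(data)
```

If the file is saved in another encoding (for example Latin-1) while the declaration says UTF-8, viewers typically show ? or a replacement character instead.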

What character encoding should I use for a web page containing mostly Arabic text? Is utf-8 okay?

- by Paul D. Waite

What character encoding should I use for a web page containing mostly Arabic text? Is utf-8 okay?

How to convert std::wstring to a TCHAR*

- by esac

std::wstring.c_str() returns a wchar_t*. How do I get from wchar_t* to TCHAR*, or from std::wstring to TCHAR* Thanks

Displaying images in webpage without src URL

- by Babiker

Recently I learned that I can display images in a web page without referencing an image URL, as follows:

    <img class="disclosure" img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAkAAAAJCAYAAADgkQYQAAAAAXNSR0IArs4c6QAAAAlwSFlzAAALEwAACxMBAJqcGAAAAAd0SU1FB9oIGRQbOY8MjgMAAABVSURBVBjTfc6xDcAwCETRM0rt5nbA+49j70DDAqSLsGXyJQqkVxxwNOeMiEA+waW1VuT/inrvG7wikht8UETy2ygVMjO4O8YYTf6AqrZyUwYlygAAXo+QLmeF4c4uAAAAAElFTkSuQmCC">

I had another small BMP image that I wanted to display, so I opened it in vim and the image source looked like:

When I paste this code where it needs to be pasted, I only get "BM?". How do I convert/paste this code properly to be used as an image source?
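The text after `base64,` in a data URI is the Base64 encoding of the file's raw bytes, not the bytes as they appear in a text editor. A Python sketch of the conversion; the helper name `to_data_uri` is invented, and the two-byte input stands in for the full contents of a BMP file, which would normally be read with `open(path, "rb")`.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Build a data: URI from raw file bytes."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


bmp_magic = b"BM"  # every BMP file starts with these two bytes
uri = to_data_uri(bmp_magic, "image/bmp")
```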

strange characters at beginning of file

- by luca

There are strange characters at the beginning of a file I'm editing (using TextMate). I don't know when they appeared; they're invisible in TextMate, but my script that reads the file goes crazy. These are the first few characters in the file (as seen with the od command):

    0000000 177377 000120 000105 000117 000120 000114 000105 000072

The first two shouldn't be there, I think. Maybe they were caused by some strange Dropbox sync? Or something else, but they tend to reappear (I don't yet know when). My question: what is that 177377, and what is a simple way to remove it in my Ruby script? Thanks.
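The od output can be decoded by hand: octal 177377 is 0xFEFF, the Unicode byte order mark, and the remaining words are small values that look like UTF-16 text. A Python sketch that rebuilds the bytes from the dump and decodes them, assuming od ran on a little-endian machine so that each 16-bit word is stored low byte first:

```python
import struct

dump = "177377 000120 000105 000117 000120 000114 000105 000072"
words = [int(word, 8) for word in dump.split()]  # octal to integers

# Assumption: little-endian words, so 0xFEFF is stored as bytes FF FE.
raw = struct.pack("<8H", *words)

# The generic "utf-16" codec uses the BOM to pick the byte order and drops it.
text = raw.decode("utf-16")
```

So the file appears to be UTF-16 with a BOM; reading it with a UTF-16-aware decoder removes the mark instead of passing it on to the script.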

EBCDIC to ASCII conversion. Out of bound error. In C#.

- by mekrizzy

I tried creating an EBCDIC to ASCII converter in C# using this general conversion order (given below). Basically the program converted from ASCII to the equivalent integer and from there into EBCDIC using the order below. Now when I compile this in C# and give it an EBCDIC string (which I got from a file on another computer), it throws an 'Out of Bound' exception for some of the EBCDIC characters. Why is that? Is it about formatting, C#, or Windows?

Extra: I tried just printing out all the ASCII and EBCDIC characters using a loop over the numbers 0..255, but it still doesn't show many of the EBCDIC characters. Am I missing any standards?

    int[] eb2as = new int[256]{
          0,  1,  2,  3,156,  9,134,127,151,141,142, 11, 12, 13, 14, 15,
         16, 17, 18, 19,157,133,  8,135, 24, 25,146,143, 28, 29, 30, 31,
        128,129,130,131,132, 10, 23, 27,136,137,138,139,140,  5,  6,  7,
        144,145, 22,147,148,149,150,  4,152,153,154,155, 20, 21,158, 26,
         32,160,161,162,163,164,165,166,167,168, 91, 46, 60, 40, 43, 33,
         38,169,170,171,172,173,174,175,176,177, 93, 36, 42, 41, 59, 94,
         45, 47,178,179,180,181,182,183,184,185,124, 44, 37, 95, 62, 63,
        186,187,188,189,190,191,192,193,194, 96, 58, 35, 64, 39, 61, 34,
        195, 97, 98, 99,100,101,102,103,104,105,196,197,198,199,200,201,
        202,106,107,108,109,110,111,112,113,114,203,204,205,206,207,208,
        209,126,115,116,117,118,119,120,121,122,210,211,212,213,214,215,
        216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,
        123, 65, 66, 67, 68, 69, 70, 71, 72, 73,232,233,234,235,236,237,
        125, 74, 75, 76, 77, 78, 79, 80, 81, 82,238,239,240,241,242,243,
         92,159, 83, 84, 85, 86, 87, 88, 89, 90,244,245,246,247,248,249,
         48, 49, 50, 51, 52, 53, 54, 55, 56, 57,250,251,252,253,254,255
    };

The whole code is as follows:

    public string convertFromEBCDICtoASCII(string inputEBCDICString, int initialPos, int endPos)
    {
        string inputSubString = inputEBCDICString.Substring(initialPos, endPos);
        int[] e2a = new int[256]{
              0,  1,  2,  3,156,  9,134,127,151,141,142, 11, 12, 13, 14, 15,
             16, 17, 18, 19,157,133,  8,135, 24, 25,146,143, 28, 29, 30, 31,
            128,129,130,131,132, 10, 23, 27,136,137,138,139,140,  5,  6,  7,
            144,145, 22,147,148,149,150,  4,152,153,154,155, 20, 21,158, 26,
             32,160,161,162,163,164,165,166,167,168, 91, 46, 60, 40, 43, 33,
             38,169,170,171,172,173,174,175,176,177, 93, 36, 42, 41, 59, 94,
             45, 47,178,179,180,181,182,183,184,185,124, 44, 37, 95, 62, 63,
            186,187,188,189,190,191,192,193,194, 96, 58, 35, 64, 39, 61, 34,
            195, 97, 98, 99,100,101,102,103,104,105,196,197,198,199,200,201,
            202,106,107,108,109,110,111,112,113,114,203,204,205,206,207,208,
            209,126,115,116,117,118,119,120,121,122,210,211,212,213,214,215,
            216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,
            123, 65, 66, 67, 68, 69, 70, 71, 72, 73,232,233,234,235,236,237,
            125, 74, 75, 76, 77, 78, 79, 80, 81, 82,238,239,240,241,242,243,
             92,159, 83, 84, 85, 86, 87, 88, 89, 90,244,245,246,247,248,249,
             48, 49, 50, 51, 52, 53, 54, 55, 56, 57,250,251,252,253,254,255
        };
        char chrItem = Convert.ToChar("0");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < inputSubString.Length; i++)
        {
            try
            {
                chrItem = Convert.ToChar(inputSubString.Substring(i, 1));
                sb.Append(Convert.ToChar(e2a[(int)chrItem]));
                sb.Append((int)chrItem);
                sb.Append((int)00);
            }
            catch (Exception ex)
            {
                Console.WriteLine("//" + ex.Message);
                return string.Empty;
            }
        }
        string result = sb.ToString();
        sb = null;
        return result;
    }
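A lookup table with 256 entries can only be indexed safely by byte values. Once EBCDIC data has been read into a string through some other code page, individual characters can have code points far above 255, which is one plausible source of the out-of-bounds error. A Python sketch of both effects, assuming the data is in the cp037 variant of EBCDIC (the question does not say which variant it is) and using cp1252 as an example of a wrong intermediate decoding:

```python
# "Hello" in EBCDIC (cp037): H=0xC8, e=0x85, l=0x93, o=0x96.
ebcdic_bytes = bytes.fromhex("c885939396")

# Decoding the raw bytes with an EBCDIC codec gives the expected text.
text = ebcdic_bytes.decode("cp037")

# Decoding the same bytes as Windows-1252 first yields characters whose
# code points no longer fit a 256-entry table.
misread = ebcdic_bytes.decode("cp1252")
largest_code_point = max(ord(ch) for ch in misread)
```

Keeping the input as a byte array until the conversion step avoids the problem whichever table is used.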

Characters with jquery json

- by Mikk

Hi everyone, I'm using jQuery $.getJSON to retrieve a list of cities. Everything works fine, but I'm from Estonia (probably most of you don't know much about this country =D) and we use some characters like õ, ü, ä, ö. When I pass letters like these to the callback function, I keep getting empty strings. I've tried to Base64-encode the strings on the server side and decode them with a jQuery Base64 plugin (I thought it was a good idea as long as I can compress pages with PHP, so I don't have to worry about bandwidth), but this way I end up with some random Chinese symbols. What would be the best workaround for this problem? Thank you.
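JSON itself has no trouble with these letters as long as the encoder is given properly decoded text. A Python sketch with a few Estonian town names as sample data: by default `json.dumps` emits ASCII-only output with \u escapes, which survives any transport charset and decodes back to the original characters. As far as I know, PHP's json_encode expects UTF-8 input, so strings in another encoding would need converting first.

```python
import json

cities = ["T\u00f5rva", "V\u00f5ru", "P\u00e4rnu", "T\u00fcri"]  # Tõrva, Võru, Pärnu, Türi

# Default settings escape every non-ASCII character as \uXXXX.
payload = json.dumps(cities)

# Any JSON parser, including the browser's, turns the escapes back into text.
round_trip = json.loads(payload)
```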

\w in PHP preg_replace covers only second byte of UTF-8 chars

- by Andrey

We have this code:

    $value = preg_replace("/[^\w]/", '', $value);

where $value is in UTF-8. After this transformation, the first byte of multibyte characters is stripped. How can we make \w cover UTF-8 characters completely? Sorry, I am not very good at PHP.
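The underlying issue is whether the regex engine sees bytes or characters. Python makes the two modes explicit, so it can serve as a sketch of the difference (the sample string is made up): a bytes pattern treats each UTF-8 byte separately, while a text pattern treats õ as one word character. In PHP the corresponding switch is, to my knowledge, the `u` pattern modifier.

```python
import re

value = "t\u00f5de 42!"  # "tõde 42!"

# Text mode: \w is Unicode-aware, so the letter õ is kept intact.
text_result = re.sub(r"[^\w]", "", value)

# Bytes mode: \w only matches ASCII word characters, so the UTF-8 bytes
# of õ are removed one by one.
bytes_result = re.sub(rb"[^\w]", b"", value.encode("utf-8"))
```

Python's byte mode drops both bytes of õ, whereas the PHP behaviour in the question kept one of them; either way the multibyte character is damaged.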

