unicode - Page 25 - Developer IT

How can I get Perl to detect the bad UTF-8 sequences?

- by gorilla

I'm running Perl 5.10.0 and Postgres 8.4.3, and inserting strings into a database that sits behind DBIx::Class. These strings should be in UTF-8, and therefore my database is running in UTF-8. Unfortunately some of these strings are bad, containing malformed UTF-8, so when I run it I get an exception:

    DBI Exception: DBD::Pg::st execute failed: ERROR: invalid byte sequence for encoding "UTF8": 0xb5

I thought that I could simply ignore the invalid ones and worry about the malformed UTF-8 later, so this code should flag and ignore the bad titles:

    if(not utf8::valid($title)){
        $title="Invalid UTF-8";
    }
    $data->title($title);
    $data->update();

However, Perl seems to think that the strings are valid, yet the exception is still thrown. How can I get Perl to detect the bad UTF-8?
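For comparison, the check the question is after (is this byte string well-formed UTF-8?) can be sketched with a strict decode. This is a minimal illustration in Python rather than Perl; the function name `is_valid_utf8` and the sample bytes are hypothetical, and it is not the Perl fix the question asks for.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8, False otherwise."""
    try:
        raw.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# A lone 0xb5 byte (the byte the database complained about) is not valid UTF-8.
title = b"caf\xb5"
if not is_valid_utf8(title):
    title = b"Invalid UTF-8"
```

The point of the sketch is that validity is decided by actually decoding the raw bytes, not by inspecting a flag on the string.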

What character encoding should I use for a web page containing mostly Arabic text? Is utf-8 okay?

- by Paul D. Waite

What character encoding should I use for a web page containing mostly Arabic text? Is utf-8 okay?

Problem with uploading Arabic files

- by sword101

I am using Spring upload to upload files. When uploading an Arabic file and getting the original file name in the controller, I get something like: المغفلين.png Any ideas why this problem occur?

utf8 format in xml

- by hussain

I want to know how to store è (symbols of this type) in an XML file. If I store this symbol in the XML file, the file shows it as ?. I inserted <?xml version="1.0" encoding="UTF-8"?> at the start of the XML file, but it still does not display correctly. Thanks in advance.

UnicodeDecodeError when redirecting to file

- by zedoo

Hi, I run this snippet twice in the Ubuntu terminal (encoding set to utf-8), once with ./test.py and then with ./test.py >out.txt:

    uni = u"\u001A\u0BC3\u1451\U0001D10C"
    print uni

Without redirection it prints garbage. With redirection I get a UnicodeDecodeError. Can someone explain why I get the error only in the second case, or even better give a detailed explanation of what's going on behind the curtain in both cases?

PHP detecting filesystem encoding

- by Evert

Hi guys, I need to save files with non-Latin filenames on a filesystem, using PHP. I want to make this work cross-platform. How do I know what encoding I can use to write the file? I understand many modern filesystems are UTF-8 based (is this correct?), but I doubt Windows XP is (for instance). So, is there a robust detection mechanism? Evert

python input UnicodeDecodeError:

- by The man on the Clapham omnibus

python 3.x

    >>> a = input()
    hope
    >>> a
    'hope'
    >>> b = input()
    håpe
    >>> b
    'håpe'
    >>> c = input()
    start typing hå... delete using backspace... and change to hope
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 1: invalid continuation byte
    >>>

The situation is not terrible, and I am working around it, but I find it strange that the bytes get messed up when deleting. Has anyone else experienced this? The terminal history shows that it thought I had entered h?ope. Any ideas? In the script that uses this, I do import readline to get command line history.

Displaying images in webpage without src URL

- by Babiker

Recently I learned that I can display images in a web page without referencing an image URL, as follows:

    <img class="disclosure" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAkAAAAJCAYAAADgkQYQAAAAAXNSR0IArs4c6QAAAAlwSFlzAAALEwAACxMBAJqcGAAAAAd0SU1FB9oIGRQbOY8MjgMAAABVSURBVBjTfc6xDcAwCETRM0rt5nbA+49j70DDAqSLsGXyJQqkVxxwNOeMiEA+waW1VuT/inrvG7wikht8UETy2ygVMjO4O8YYTf6AqrZyUwYlygAAXo+QLmeF4c4uAAAAAElFTkSuQmCC">

I had another small BMP image that I wanted to display, so I opened it in vim and the img source looked like: When I paste this code where it needs to be pasted, I only get "BM?". How do I convert/paste this code properly so it can be used as an image source?

Characters with jquery json

- by Mikk

Hi everyone, I'm using jQuery $.getJSON to retrieve a list of cities. Everything works fine, but I'm from Estonia (probably most of you don't know much about this country =D) and we use some characters like õ, ü, ä, ö. When I pass letters like these to the callback function, I keep getting empty strings. I've tried to base64 encode (server-side) and decode (jQuery base64 plugin) the strings (I thought it was a good idea as long as I can compress pages with PHP, so I don't have to worry about bandwidth), but this way I end up with some random Chinese symbols. What would be the best workaround for this problem? Thank you.

\w in PHP preg_replace covers only second byte of UTF-8 chars

- by Andrey

We have this code: $value = preg_replace("/[^\w]/", '', $value); where $value is in UTF-8. After this transformation the first byte of each multibyte character is stripped. How can I make \w cover UTF-8 characters completely? Sorry, I am not very good at PHP.

Four byte encoding of U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS)?

- by knorv

Which character encoding represents the character ö (U+00F6, LATIN SMALL LETTER O WITH DIAERESIS, or simply put chr(246) in ISO-8859-1) as the four-octet combination chr(195) . chr(63) . chr(194) . chr(164)?

wchar to char in c++

- by Chris

I have a Windows CE console application whose entry point looks like this:

    int _tmain(int argc, _TCHAR* argv[])

I want to check the contents of argv[1] for "-s" and convert argv[2] into an integer. I am having trouble narrowing the arguments or accessing them to test. I initially tried the following with little success:

    if (argv[1] == L"-s")

I also tried using the narrow function of wostringstream on each character, but this crashed the application. Can anyone shed some light? Thanks

Python's string.translate() doesn't fully work?

- by Rhubarb

Given this example, I get the error that follows:

    print u'\2033'.translate({2033:u'd'})

    C:\Python26\lib\encodings\cp437.pyc in encode(self, input, errors)
         10
         11     def encode(self,input,errors='strict'):
    ---> 12         return codecs.charmap_encode(input,errors,encoding_map)
         13
         14     def decode(self,input,errors='strict'):

    UnicodeEncodeError: 'charmap' codec can't encode character u'\x83' in position 0
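One possible reading of the snippet, sketched in Python 3 syntax: '\2033' is the octal escape '\203' (U+0083) followed by the digit 3, and the dictionary key 2033 is decimal, so neither side refers to U+2033. That would also match the u'\x83' in the error message. The intent assumed here (mapping U+2033 to 'd') is a guess about what the asker meant.

```python
# '\2033' parses as the octal escape '\203' (U+0083) followed by the digit '3'.
octal_form = '\2033'
assert octal_form == '\x83' + '3'

# Assumed intent: map DOUBLE PRIME (U+2033) to 'd'.
# translate() keys are code points, so the key is 0x2033 (8243), not decimal 2033.
result = '\u2033'.translate({0x2033: 'd'})
print(result)
```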

Why does Perl lose foreign characters on Windows input - can this be fixed (if so, how) or is Perl an outdated dinosaur that just can't handle this?

- by Alex R

Note below how ã changes to a. This is causing me a huge problem, as foreign characters show up in URLs, e.g. http://pt.wikipedia.org/wiki/Cão

The OS is Windows 7, 64-bit. The Perl is:

    This is perl 5, version 12, subversion 2 (v5.12.2) built for MSWin32-x64-multi-thread
    (with 8 registered patches, see perl -V for more detail)
    Copyright 1987-2010, Larry Wall
    Binary build 1202 [293621] provided by ActiveState http://www.ActiveState.com
    Built Sep 6 2010 22:53:42

Additional update: To get around my particular problem, I tried using File::Find instead of piped input. The issue actually gets worse:

EBCDIC to ASCII conversion. Out of bound error. In C#.

- by mekrizzy

I tried creating an EBCDIC to ASCII converter in C# using this general conversion order (given below). Basically the program converted from ASCII to the equivalent integer and from there into EBCDIC using the order below. Now when I compile this in C# and give it an EBCDIC string (taken from another file on another computer), it throws an 'Out of Bound' exception for some of the EBCDIC characters. Why is this? Is it about formatting? Or C#? Or Windows?

Extra: I tried just printing out all the ASCII and EBCDIC characters using a loop over the numbers 0..255, but it still does not show many of the EBCDIC characters. Am I missing any standards?

    int[] eb2as = new int[256]{
          0,  1,  2,  3,156,  9,134,127,151,141,142, 11, 12, 13, 14, 15,
         16, 17, 18, 19,157,133,  8,135, 24, 25,146,143, 28, 29, 30, 31,
        128,129,130,131,132, 10, 23, 27,136,137,138,139,140,  5,  6,  7,
        144,145, 22,147,148,149,150,  4,152,153,154,155, 20, 21,158, 26,
         32,160,161,162,163,164,165,166,167,168, 91, 46, 60, 40, 43, 33,
         38,169,170,171,172,173,174,175,176,177, 93, 36, 42, 41, 59, 94,
         45, 47,178,179,180,181,182,183,184,185,124, 44, 37, 95, 62, 63,
        186,187,188,189,190,191,192,193,194, 96, 58, 35, 64, 39, 61, 34,
        195, 97, 98, 99,100,101,102,103,104,105,196,197,198,199,200,201,
        202,106,107,108,109,110,111,112,113,114,203,204,205,206,207,208,
        209,126,115,116,117,118,119,120,121,122,210,211,212,213,214,215,
        216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,
        123, 65, 66, 67, 68, 69, 70, 71, 72, 73,232,233,234,235,236,237,
        125, 74, 75, 76, 77, 78, 79, 80, 81, 82,238,239,240,241,242,243,
         92,159, 83, 84, 85, 86, 87, 88, 89, 90,244,245,246,247,248,249,
         48, 49, 50, 51, 52, 53, 54, 55, 56, 57,250,251,252,253,254,255
    };

The whole code is as follows:

    public string convertFromEBCDICtoASCII(string inputEBCDICString, int initialPos, int endPos)
    {
        string inputSubString = inputEBCDICString.Substring(initialPos, endPos);
        int[] e2a = new int[256]{
              0,  1,  2,  3,156,  9,134,127,151,141,142, 11, 12, 13, 14, 15,
             16, 17, 18, 19,157,133,  8,135, 24, 25,146,143, 28, 29, 30, 31,
            128,129,130,131,132, 10, 23, 27,136,137,138,139,140,  5,  6,  7,
            144,145, 22,147,148,149,150,  4,152,153,154,155, 20, 21,158, 26,
             32,160,161,162,163,164,165,166,167,168, 91, 46, 60, 40, 43, 33,
             38,169,170,171,172,173,174,175,176,177, 93, 36, 42, 41, 59, 94,
             45, 47,178,179,180,181,182,183,184,185,124, 44, 37, 95, 62, 63,
            186,187,188,189,190,191,192,193,194, 96, 58, 35, 64, 39, 61, 34,
            195, 97, 98, 99,100,101,102,103,104,105,196,197,198,199,200,201,
            202,106,107,108,109,110,111,112,113,114,203,204,205,206,207,208,
            209,126,115,116,117,118,119,120,121,122,210,211,212,213,214,215,
            216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,
            123, 65, 66, 67, 68, 69, 70, 71, 72, 73,232,233,234,235,236,237,
            125, 74, 75, 76, 77, 78, 79, 80, 81, 82,238,239,240,241,242,243,
             92,159, 83, 84, 85, 86, 87, 88, 89, 90,244,245,246,247,248,249,
             48, 49, 50, 51, 52, 53, 54, 55, 56, 57,250,251,252,253,254,255
        };
        char chrItem = Convert.ToChar("0");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < inputSubString.Length; i++)
        {
            try
            {
                chrItem = Convert.ToChar(inputSubString.Substring(i, 1));
                sb.Append(Convert.ToChar(e2a[(int)chrItem]));
                sb.Append((int)chrItem);
                sb.Append((int)00);
            }
            catch (Exception ex)
            {
                Console.WriteLine("//" + ex.Message);
                return string.Empty;
            }
        }
        string result = sb.ToString();
        sb = null;
        return result;
    }

strange characters at beginning of file

- by luca

There are strange characters at the beginning of a file I'm editing (using TextMate). I don't know when they appeared; they're invisible in TextMate, but my script that reads the file goes crazy. These are the first few chars in the file (as seen with the od command):

    0000000 177377 000120 000105 000117 000120 000114 000105 000072

The first 2 shouldn't be there, I think. Maybe they were caused by some strange Dropbox sync? Or something else? But they tend to reappear (I don't yet know when). My question: what is that 177377, and what is a simple way to remove it in my Ruby script? Thanks
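The octal word 177377 equals 0xFEFF, the Unicode byte order mark, and the remaining 16-bit words each hold one ASCII letter, which suggests the file was saved as UTF-16. A minimal sketch of the idea in Python (the question asks about Ruby, so this only illustrates the technique; the sample bytes are reconstructed from the od output under the assumption of a little-endian machine):

```python
# 0o177377 is the first 16-bit word od printed; it is the byte order mark U+FEFF.
assert 0o177377 == 0xFEFF

# Bytes reconstructed from the od output, assuming a little-endian host:
# BOM (FF FE) followed by "PEOPLE:" stored as 16-bit units.
raw = b"\xff\xfe" + "PEOPLE:".encode("utf-16-le")

# The generic "utf-16" codec reads the BOM to pick the byte order and drops it.
text = raw.decode("utf-16")
print(text)
```

Decoding with a BOM-aware codec removes the mark as a side effect, so no manual stripping of the first two bytes is needed.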

What's the deal with char.GetNumericValue?

- by mgroves

I was working on Project Euler 40, and was a bit bothered that there was no int.Parse(char). Not a big deal, but I did some asking around and someone suggested char.GetNumericValue. GetNumericValue seems like a very odd method to me:

- Takes in a char as a parameter and returns...a double?
- Returns -1.0 if the char is not '0' through '9'

So what's the reasoning behind this method, and what purpose does returning a double serve? I even fired up Reflector and looked at InternalGetNumericValue, but it's just like watching Lost: every answer just leads to another question.

Using Python, How to copy files in 'temporary internet files' folder in Windows

- by pythBegin

I am using this code to find files recursively in a folder, with size greater than 50000 bytes.

    def listall(parent):
        lis=[]
        for root, dirs, files in os.walk(parent):
            for name in files:
                if os.path.getsize(os.path.join(root,name))>500000:
                    lis.append(os.path.join(root,name))
        return lis

This is working fine. But when I used this on the 'temporary internet files' folder in Windows, I got this error:

    Traceback (most recent call last):
      File "<pyshell#4>", line 1, in <module>
        listall(a)
      File "<pyshell#2>", line 5, in listall
        if os.path.getsize(os.path.join(root,name))>500000:
      File "C:\Python26\lib\genericpath.py", line 49, in getsize
        return os.stat(filename).st_size
    WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'C:\\Documents and Settings\\khedarnatha\\Local Settings\\Temporary Internet Files\\Content.IE5\\EDS8C2V7\\??????+1[1].jpg'

I think this is because Windows gives names with special characters in this specific folder. Please help me sort out this issue.
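The '??????' in the failing path suggests the file name was converted lossily to a byte string before os.stat saw it. In Python 2, passing a unicode path (for example u'C:\\...') to os.walk makes it return unicode names; in Python 3 paths are Unicode by default. A minimal Python 3 sketch of the same walk that also skips entries it cannot stat; the name `list_large_files` and the `min_size` parameter are hypothetical:

```python
import os


def list_large_files(parent, min_size=500000):
    """Return paths under parent larger than min_size bytes, skipping unreadable entries."""
    large = []
    for root, dirs, files in os.walk(parent):
        for name in files:
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # The name could not be resolved (e.g. lossy '?' characters); skip it.
                continue
            if size > min_size:
                large.append(path)
    return large
```

Catching OSError also covers WindowsError, which is a subclass of it.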

Obtain File size with os.path.getsize() in Python 2.7.5

- by Ruxuan Ouyang

I am new to Python. I am trying to use os.path.getsize() to obtain the file size. However, if the file name is not in English, but in Chinese, German, French, etc., Python cannot recognize it and does not return the size of the file. Could you please help me with it? How can I get Python to recognize the file's name and return the size of this kind of file? For example, the file's name is: ?????????? ????????????? ? ????????????? ???????? ?? 2030?.doc

    path="C:\xxxx\xxx\xxxx\?????????? ????????????? ? ????????????? ???????? ?? 2030?.doc"

I'd like to use os.path.getsize(path), but it does not recognize the file name. Could you please tell me what I should do? Thank you very much!

Why does Perl lose foreign characters on Windows; can this be fixed (if so, how)?

- by Alex R

Note below how ã changes to a.

NOTE2: Before you blame this on CMD.EXE and Windows pipe weirdness, see Experiment 2 below, which gets a similar problem using File::Find.

The particular problem I'm trying to fix involves working with image files stored on a local drive, and manipulating the file names, which may contain foreign characters. The two experiments shown below are intermediate debugging steps. The ã character is common in Latin languages, e.g. http://pt.wikipedia.org/wiki/Cão

Experiment 1

Experiment 2

To get around my particular problem, I tried using File::Find instead of piped input. The issue actually gets worse:

Debugging update: I tried some of the tricks listed at http://perldoc.perl.org/perlunicode.html, e.g. use utf8, use feature 'unicode_strings', etc., to no avail.

Environment and Version Info

The OS is Windows 7, 64-bit. The Perl is:

    This is perl 5, version 12, subversion 2 (v5.12.2) built for MSWin32-x64-multi-thread
    (with 8 registered patches, see perl -V for more detail)
    Copyright 1987-2010, Larry Wall
    Binary build 1202 [293621] provided by ActiveState http://www.ActiveState.com
    Built Sep 6 2010 22:53:42

Differentiate between TCHAR and _TCHAR

- by Vulcan Eager

What are the various differences between the two symbols TCHAR and _TCHAR type defined in the Windows header tchar.h? Explain with examples. Briefly describe scenarios where you would use TCHAR as opposed to _TCHAR in your code. (10 marks)

If a command line program is unsure of stdout's encoding, what encoding should it output?

- by mackstann

I have a command line program written in Python, and when I pipe it through another program on the command line, sys.stdout.encoding is None. This makes sense, I suppose -- the output could be another program, or a file you're redirecting it into, or whatever, and it doesn't know what encoding is desired. But neither do I! This program will be used by many different people (humor me) in different ways. Should I play it safe and output only ascii (replacing non-ascii chars with question marks)? Or should I output UTF-8, since it's so widespread these days?
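The two options in the question can be combined: use the stream's declared encoding when there is one, fall back to UTF-8 when it is None, and replace unencodable characters rather than fail. A minimal sketch under those assumptions; the names `choose_encoding` and `encode_for_output` are hypothetical, and the UTF-8 fallback is a design choice rather than a rule:

```python
def choose_encoding(stream_encoding, fallback="utf-8"):
    """Use the stream's declared encoding, or the fallback when it is None or empty."""
    return stream_encoding or fallback


def encode_for_output(text, stream_encoding, fallback="utf-8"):
    """Encode text for a stream; unencodable characters become '?' instead of raising."""
    encoding = choose_encoding(stream_encoding, fallback)
    return text.encode(encoding, errors="replace")
```

A caller would pass sys.stdout.encoding as stream_encoding and write the resulting bytes to the underlying byte stream.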

Forcing a mixed ISO-8859-1 and UTF-8 multi-line string into UTF-8 in Perl

- by knorv

Consider the following problem: A multi-line string $junk contains some lines which are encoded in UTF-8 and some in ISO-8859-1. I don't know a priori which lines are in which encoding, so heuristics will be needed. I want to turn $junk into pure UTF-8 with proper re-encoding of the ISO-8859-1 lines. Also, in the event of errors in the processing I want to provide a "best effort result" rather than throwing an error. My current attempt looks like this:

    $junk = &force_utf8($junk);

    sub force_utf8 {
      my $input = shift;
      my $output = '';
      foreach my $line (split(/\n/, $input)) {
        if (utf8::valid($line)) {
          utf8::decode($line);
        }
        $output .= "$line\n";
      }
      return $output;
    }

While this appears to work, I'm certain this is not the optimal solution. How would you improve the force_utf8(...) sub?
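The per-line heuristic described above (try UTF-8 first, fall back to ISO-8859-1) can be sketched as follows. This is an illustration in Python rather than Perl, it reuses the name force_utf8 only for readability, and it shares the usual limitation of the heuristic: an ISO-8859-1 line whose bytes happen to form valid UTF-8 will be read as UTF-8.

```python
def force_utf8(junk: bytes) -> str:
    """Decode each line as UTF-8 if possible, otherwise as ISO-8859-1."""
    lines = []
    for line in junk.split(b"\n"):
        try:
            lines.append(line.decode("utf-8"))
        except UnicodeDecodeError:
            # ISO-8859-1 maps every byte to a code point, so this fallback never fails.
            lines.append(line.decode("iso-8859-1"))
    return "\n".join(lines)
```

Because the fallback cannot raise, the function always returns a best-effort result, which matches the requirement of not throwing on bad input.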

Python: Removing particular character (u"\u2610") from string

- by duhaime

I have been wrestling with decoding and encoding in Python, and I can't quite figure out how to resolve my problem. I am looping over xml text files (sample) that are apparently coded in utf-8, using Beautiful Soup to parse each file, then looking to see if any sentence in the file contains one or more words from two different lists of words. Because the xml files are from the eighteenth century, I need to retain the em dashes that are in the xml. The code below does this just fine, but it also retains a pesky box character that I wish to remove. I believe the box character is this character. (You can find an example of the character I wish to remove in line 3682 of the sample file above. On this webpage, the character looks like an 'or' pipe, but when I read the xml file in Komodo, it looks like a box. When I try to copy and paste the box into a search engine, it looks like an 'or' pipe. When I print to console, though, the character looks like an empty box.) To sum up, the code below runs without errors, but it prints the empty box character that I would like to remove.

    for work in glob.glob(pathtofiles):
        openfile = open(work)
        readfile = openfile.read()
        stringfile = str(readfile)
        decodefile = stringfile.decode('utf-8', 'strict') #is this the dodgy line?
        soup = BeautifulSoup(decodefile)
        textwithtags = soup.findAll('text')
        textwithtagsasstring = str(textwithtags)
        #this method strips everything between anglebrackets as it should
        textwithouttags = stripTags(textwithtagsasstring)
        #clean text
        nonewlines = textwithouttags.replace("\n", " ")
        noextrawhitespace = re.sub(' +',' ', nonewlines)
        print noextrawhitespace #the boxes appear

I tried to remove the boxes by using

    noboxes = noextrawhitespace.replace(u"\u2610", "")

But Python threw an error:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 280: ordinal not in range(128)

Does anyone know how I can remove the boxes from the xml files? I would be grateful for any help others can offer.
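The error message is consistent with replace() being called on a byte string with a Unicode argument, which in Python 2 triggers an implicit ASCII decode of the bytes (str(textwithtags) likely produced such a byte string). A minimal sketch of the alternative, written in Python 3 syntax: decode once, then do the replacement on text. The function name `remove_ballot_boxes` is hypothetical, and it assumes the box really is U+2610.

```python
def remove_ballot_boxes(raw: bytes) -> str:
    """Decode UTF-8 bytes to text, then drop every U+2610 BALLOT BOX character."""
    text = raw.decode("utf-8")
    return text.replace("\u2610", "")
```

Other non-ASCII characters, such as em dashes, pass through unchanged because only U+2610 is targeted.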
