Search Results

Search found 1649 results on 66 pages for 'unicode normalization'.

Page 29/66 | < Previous Page | 25 26 27 28 29 30 31 32 33 34 35 36  | Next Page >

  • Python's string.translate() doesn't fully work?

    - by Rhubarb
    Given this example, I get the error that follows: print u'\2033'.translate({2033:u'd'}) C:\Python26\lib\encodings\cp437.pyc in encode(self, input, errors) 10 11 def encode(self,input,errors='strict'): ---> 12 return codecs.charmap_encode(input,errors,encoding_map) 13 14 def decode(self,input,errors='strict'): UnicodeEncodeError: 'charmap' codec can't encode character u'\x83' in position 0

    Read the article

  • Why does Perl lose foreign characters on Windows input - can this be fixed (if so, how) or is Perl an outdated dinosaur that just can't handle this?

    - by Alex R
    Note below how ã changes to a This is causing me a huge problem as foreign characters show up in URLs, e.g. http://pt.wikipedia.org/wiki/Cão The OS is Windows 7, 64-bit. The Perl is: This is perl 5, version 12, subversion 2 (v5.12.2) built for MSWin32-x64-multi-thread (with 8 registered patches, see perl -V for more detail) Copyright 1987-2010, Larry Wall Binary build 1202 [293621] provided by ActiveState http://www.ActiveState.com Built Sep 6 2010 22:53:42 Additional update: To get around my particular problem, I tried using File::Find instead of piped input. The issue actually gets worse:

    Read the article

  • strange characters at beginning of file

    - by luca
    there are strange characters at the beginning of a file I'm editing (using textmate..) I don't know when they appeared, they're invisible in textmate but my script that reads the file goes crazy.. this is the first few chars in the file (as seen with od command): 0000000 177377 000120 000105 000117 000120 000114 000105 000072 the first 2 shouldn't be there I think.. maybe they were caused by some strange dropbox sync? Or something else.. but they tend to reappear (I don't yet know when..) My question: what is that 177377 and a simple way to remove it in my ruby script? thanks

    Read the article

  • EBCDIC to ASCII conversion. Out of bound error. In C#.

    - by mekrizzy
    I tried creating a EBCDIC to ASCII convector in C# using this general conversion order(given below). Basically the program converted from ASCII to the equivalent integer and from there into EDCDIC using the order below. Now when I try compiling this in C# and try giving a EBCDIC string(got this from another file from another computer) it is showing 'Out of Bound' exception for some of the EBCDIC character. Why is this like this?? Is it about formating?? or C# ?? or windows? Extra: I tried just printing out all the ASCII and EBCDIC characters using a loop from 0..255 numbers but still its not showing many of the EBCDIC characters. Am I missing any standards? int[] eb2as = new int[256]{ 0, 1, 2, 3,156, 9,134,127,151,141,142, 11, 12, 13, 14, 15, 16, 17, 18, 19,157,133, 8,135, 24, 25,146,143, 28, 29, 30, 31, 128,129,130,131,132, 10, 23, 27,136,137,138,139,140, 5, 6, 7, 144,145, 22,147,148,149,150, 4,152,153,154,155, 20, 21,158, 26, 32,160,161,162,163,164,165,166,167,168, 91, 46, 60, 40, 43, 33, 38,169,170,171,172,173,174,175,176,177, 93, 36, 42, 41, 59, 94, 45, 47,178,179,180,181,182,183,184,185,124, 44, 37, 95, 62, 63, 186,187,188,189,190,191,192,193,194, 96, 58, 35, 64, 39, 61, 34, 195, 97, 98, 99,100,101,102,103,104,105,196,197,198,199,200,201, 202,106,107,108,109,110,111,112,113,114,203,204,205,206,207,208, 209,126,115,116,117,118,119,120,121,122,210,211,212,213,214,215, 216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231, 123, 65, 66, 67, 68, 69, 70, 71, 72, 73,232,233,234,235,236,237, 125, 74, 75, 76, 77, 78, 79, 80, 81, 82,238,239,240,241,242,243, 92,159, 83, 84, 85, 86, 87, 88, 89, 90,244,245,246,247,248,249, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57,250,251,252,253,254,255 }; The whole code is as follows: public string convertFromEBCDICtoASCII(string inputEBCDICString, int initialPos, int endPos) { string inputSubString = inputEBCDICString.Substring(initialPos, endPos); int[] e2a = new int[256]{ 0, 1, 2, 3,156, 9,134,127,151,141,142, 11, 12, 13, 14, 15, 16, 17, 18, 19,157,133, 8,135, 24, 25,146,143, 28, 29, 30, 31, 128,129,130,131,132, 10, 23, 27,136,137,138,139,140, 5, 6, 7, 144,145, 22,147,148,149,150, 4,152,153,154,155, 20, 21,158, 26, 32,160,161,162,163,164,165,166,167,168, 91, 46, 60, 40, 43, 33, 38,169,170,171,172,173,174,175,176,177, 93, 36, 42, 41, 59, 94, 45, 47,178,179,180,181,182,183,184,185,124, 44, 37, 95, 62, 63, 186,187,188,189,190,191,192,193,194, 96, 58, 35, 64, 39, 61, 34, 195, 97, 98, 99,100,101,102,103,104,105,196,197,198,199,200,201, 202,106,107,108,109,110,111,112,113,114,203,204,205,206,207,208, 209,126,115,116,117,118,119,120,121,122,210,211,212,213,214,215, 216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231, 123, 65, 66, 67, 68, 69, 70, 71, 72, 73,232,233,234,235,236,237, 125, 74, 75, 76, 77, 78, 79, 80, 81, 82,238,239,240,241,242,243, 92,159, 83, 84, 85, 86, 87, 88, 89, 90,244,245,246,247,248,249, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57,250,251,252,253,254,255 }; char chrItem = Convert.ToChar("0"); StringBuilder sb = new StringBuilder(); for (int i = 0; i < inputSubString.Length; i++) { try { chrItem = Convert.ToChar(inputSubString.Substring(i, 1)); sb.Append(Convert.ToChar(e2a[(int)chrItem])); sb.Append((int)chrItem); sb.Append((int)00); } catch (Exception ex) { Console.WriteLine("//" + ex.Message); return string.Empty; } } string result = sb.ToString(); sb = null; return result; }

    Read the article

  • What's the deal with char.GetNumericValue?

    - by mgroves
    I was working on Project Euler 40, and was a bit bothered that there was no int.Parse(char). Not a big deal, but I did some asking around and someone suggested char.GetNumericValue. GetNumericValue seems like a very odd method to me: Takes in a char as a parameter and returns...a double? Returns -1.0 if the char is not '0' through '9' So what's the reasoning behind this method, and what purpose does returning a double serve? I even fired up Reflector and looked at InternalGetNumericValue, but it's just like watching Lost: every answer just leads to another question.

    Read the article

  • Using Python, How to copy files in 'temporary internet files' folder in Windows

    - by pythBegin
    I am using this code to find files recursively in a folder , with size greater than 50000 bytes. def listall(parent): lis=[] for root, dirs, files in os.walk(parent): for name in files: if os.path.getsize(os.path.join(root,name))>500000: lis.append(os.path.join(root,name)) return lis This is working fine. But when I used this on 'temporary internet files' folder in windows, am getting this error. Traceback (most recent call last): File "<pyshell#4>", line 1, in <module> listall(a) File "<pyshell#2>", line 5, in listall if os.path.getsize(os.path.join(root,name))>500000: File "C:\Python26\lib\genericpath.py", line 49, in getsize return os.stat(filename).st_size WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'C:\\Documents and Settings\\khedarnatha\\Local Settings\\Temporary Internet Files\\Content.IE5\\EDS8C2V7\\??????+1[1].jpg' I think this is because windows gives names with special characters in this specific folder... Please help to sort out this issue.

    Read the article

  • Obtain File size with os.path.getsize() in Python 2.7.5

    - by Ruxuan Ouyang
    I am new to python. I am trying to use os.path.getsize() to obtain the file size. However, if the file name is not in Englist, but in Chinese, Gemany, or French, etc, Python cannot recognize it and do not return the size of the file. Could you please help me with it? How can I let Python recognize the file's name and return the size of these kind of files? For example: The file's name is:?????????? ????????????? ? ????????????? ???????? ?? 2030?.doc path="C:\xxxx\xxx\xxxx\?????????? ????????????? ? ????????????? ???????? ?? 2030?.doc" I'd like to use" os.path.getsize(path) But it does not recognize the file name. Could you please kindly tell me what should I do? Thank you very much!

    Read the article

  • Why does Perl lose foreign characters on Windows; can this be fixed (if so, how)?

    - by Alex R
    Note below how ã changes to a. NOTE2: Before you blame this on CMD.EXE and Windows pipe weirdness, see Experiment 2 below which gets a similar problem using File::Find. The particular problem I'm trying to fix involves working with image files stored on a local drive, and manipulating the file names which may contain foreign characters. The two experiments shown below are intermediate debugging steps. The ã character is common in latin languages. e.g. http://pt.wikipedia.org/wiki/Cão Experiment 1 Experiment 2 To get around my particular problem, I tried using File::Find instead of piped input. The issue actually gets worse: Debugging update: I tried some of the tricks listed at http://perldoc.perl.org/perlunicode.html, e.g. use utf8, use feature 'unicode_strings', etc, to no avail. Environment and Version Info The OS is Windows 7, 64-bit. The Perl is: This is perl 5, version 12, subversion 2 (v5.12.2) built for MSWin32-x64-multi-thread (with 8 registered patches, see perl -V for more detail) Copyright 1987-2010, Larry Wall Binary build 1202 [293621] provided by ActiveState http://www.ActiveState.com Built Sep 6 2010 22:53:42

    Read the article

  • Python: Removing particular character (u"\u2610") from string

    - by duhaime
    I have been wrestling with decoding and encoding in Python, and I can't quite figure out how to resolve my problem. I am looping over xml text files (sample) that are apparently coded in utf-8, using Beautiful Soup to parse each file, then looking to see if any sentence in the file contains one or more words from two different list of words. Because the xml files are from the eighteenth century, I need to retain the em dashes that are in the xml. The code below does this just fine, but it also retains a pesky box character that I wish to remove. I believe the box character is this character. (You can find an example of the character I wish to remove in line 3682 of the sample file above. On this webpage, the character looks like an 'or' pipe, but when I read the xml file in Komodo, it looks like a box. When I try to copy and paste the box into a search engine, it looks like an 'or' pipe. When I print to console, though, the character looks like an empty box.) To sum up, the code below runs without errors, but it prints the empty box character that I would like to remove. for work in glob.glob(pathtofiles): openfile = open(work) readfile = openfile.read() stringfile = str(readfile) decodefile = stringfile.decode('utf-8', 'strict') #is this the dodgy line? soup = BeautifulSoup(decodefile) textwithtags = soup.findAll('text') textwithtagsasstring = str(textwithtags) #this method strips everything between anglebrackets as it should textwithouttags = stripTags(textwithtagsasstring) #clean text nonewlines = textwithouttags.replace("\n", " ") noextrawhitespace = re.sub(' +',' ', nonewlines) print noextrawhitespace #the boxes appear I tried to remove the boxes by using noboxes = noextrawhitespace.replace(u"\u2610", "") But Python threw an error flag: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 280: ordinal not in range(128) Does anyone know how I can remove the boxes from the xml files? I would be grateful for any help others can offer.

    Read the article

  • Forcing a mixed ISO-8859-1 and UTF-8 multi-line string into UTF-8 in Perl

    - by knorv
    Consider the following problem: A multi-line string $junk contains some lines which are encoded in UTF-8 and some in ISO-8859-1. I don't know a priori which lines are in which encoding, so heuristics will be needed. I want to turn $junk into pure UTF-8 with proper re-encoding of the ISO-8859-1 lines. Also, in the event of errors in the processing I want to provide a "best effort result" rather than throwing an error. My current attempt looks like this: $junk = &force_utf8($junk); sub force_utf8 { my $input = shift; my $output = ''; foreach my $line (split(/\n/, $input)) { if (utf8::valid($line)) { utf8::decode($line); } $output .= "$line\n"; } return $output; } While this appears to work I'm certain this is not the optimal solution. How would you improve the force_utf8(...) sub?

    Read the article

  • If a command line program is unsure of stdout's encoding, what encoding should it output?

    - by mackstann
    I have a command line program written in Python, and when I pipe it through another program on the command line, sys.stdout.encoding is None. This makes sense, I suppose -- the output could be another program, or a file you're redirecting it into, or whatever, and it doesn't know what encoding is desired. But neither do I! This program will be used by many different people (humor me) in different ways. Should I play it safe and output only ascii (replacing non-ascii chars with question marks)? Or should I output UTF-8, since it's so widespread these days?

    Read the article

  • Working with Japanese filenames in PHP 5.3 and Windows Vista?

    - by Jon
    I'm currently trying to write a simple script that looks in a folder, and returns a list of all the file names in an RSS feed. However I've hit a major wall... Whenever I try to read filenames with Japanese characters in them, it shows them as ?'s. I've tried the solutions mentioned here: http://stackoverflow.com/questions/482342/php-readdir-problem-with-japanese-language-file-name - however they do not work for some reason, even with: header('Content-Type: text/html; charset=UTF-8'); setlocale(LC_ALL, 'en_US.UTF8'); mb_internal_encoding("UTF-8"); At the top (Exporting as plain text until I can sort this out). What can I do? I need this to work and I don't have much time.

    Read the article

  • How can I test if an input field contains foreign characters?

    - by zeckdude
    I have an input field in a form. Upon pushing submit, I want to validate to make sure the user entered non-latin characters only, so any foreign language characters, like Chinese among many others. Or at the very least test to make sure it does not contain any latin characters. Could I use a regular expression for this? What would be the best approach for this? I am validating in both javaScript and in PHP. What solutions can I use to check for foreign characters in the input field in both programming languages?

    Read the article

  • Parsing through Arabic / RTL text from left to right

    - by Dan W
    Let's say I have a string in an RTL language such as Arabic with some English chucked in: string s = "Test:?????;?????;?????;a;b" Notice there are semicolons in the string. When I use the Split command like string[] spl = s.Split(';');, then some of the strings are saved in reverse order. This is what happens: ??Test:????? ????? ????? a b The above is out of order compared to the original. Instead, I expect to get this: ?Test: ????? ????? ????? a b I'm prepared to write my own split function. However, the chars in the string also parse in reverse order, so I'm back to square one. I just want to go through each character as it's shown on the screen.

    Read the article

  • What is the difference between _tmain() and main() in C++?

    - by joshcomley
    If I run my C++ application with the following main() method everything is OK: int main(int argc, char *argv[]) { cout << "There are " << argc << " arguments:" << endl; // Loop through each argument and print its number and value for (int i=0; i<argc; i++) cout << i << " " << argv[i] << endl; return 0; } I get what I expect and my arguments are printed out. However, if I use _tmain: int _tmain(int argc, char *argv[]) { cout << "There are " << argc << " arguments:" << endl; // Loop through each argument and print its number and value for (int i=0; i<argc; i++) cout << i << " " << argv[i] << endl; return 0; } It just displays the first character of each argument. What is the difference causing this?

    Read the article

  • PHP: Convert web-page to utf8

    - by Paul Tarjan
    I would like to only work with UTF8. The problem is I don't know the charset of every webpage. How can I detect it and convert to UTF8? <?php $url = "http://vkontakte.ru"; $ch = curl_init($url); $options = array( CURLOPT_RETURNTRANSFER => true, ); curl_setopt_array($ch, $options); $data = curl_exec($ch); // $data = magic($data); print $data; See this at: http://paulisageek.com/tmp/curl-utf8 What is magic()?

    Read the article

  • convert &uuml; to u

    - by Remus Rigo
    hi all I'm using a database that contains contacts (fields like name, address, ...). If i'm using in my database a city that contains special chars (like ü) or html codes (like &uuml;), then how can i convert them to u, so when i search for a city that contains that a special char should be shown in the result...

    Read the article

  • F5 Networks iRule/Tcl - Escaping UNICODE 6-character escape sequences so they are processed as and r

    - by openid.malcolmgin.com
    We are trying to get an F5 BIG-IP LTM iRule working properly with SharePoint 2007 in an SSL termination role. This architecture offloads all of the SSL processing to the F5 and the F5 forwards interactive requests/responses to the SharePoint front end servers via HTTP only (over a secure network). For the purposes of this discussion, iRules are parsed by a Tcl interpretation engine on the F5 Networks BIG-IP device. As such, the F5 does two things to traffic passing through it: Redirects any request to port 80 (HTTP) to port 443 (HTTPS) through HTTP 302 redirects and URL rewriting. Rewrites any response to the browser to selectively rewrite URLs embedded within the HTML so that they go to port 443 (HTTPS). This prevents the 302 redirects from breaking DHTML generated by SharePoint. We've got part 1 working fine. The main problem with part 2 is that in the response rewrite because of XML namespaces and other similar issues, not ALL matches for "http:" can be changed to "https:". Some have to remain "http:". Additionally, some of the "http:" URLs are difficult in that they live in SharePoint-generated JavaScript and their slashes (i.e. "/") are actually represented in the HTML by the UNICODE 6-character string, "\u002f". For example, in the case of these tricky ones, the literal string in the outgoing HTML is: http:\u002f\u002fservername.company.com\u002f And should be changed to: https:\u002f\u002fservername.company.com\u002f Currently we can't even figure out how to get a match in a search/replace expression on these UNICODE sequence string literals. It seems that no matter how we slice it, the Tcl interpreter is interpreting the "\u002f" string into the "/" translation before it does anything else. We've tried various combinations of Tcl escaping methods we know about (mainly double-quotes and using an extra "\" to escape the "\" in the UNICODE string) but are looking for more methods, preferably ones that work. Does anyone have any ideas or any pointers to where we can effectively self-educate about this? Thanks very much in advance.

    Read the article

  • What are ways to prevent files with the Right-to-Left Override Unicode character in their name (a malware spoofing method) from being written or read?

    - by galacticninja
    What are ways to avoid or prevent files with the RLO (Right-to-Left Override) Unicode character in their name (a malware method to spoof filenames) from being written or read in a Windows PC? More info on the RLO unicode character here: http://www.fileformat.info/info/unicode/char/202e/index.htm http://en.wikipedia.org/wiki/Bi-directional_text Info on the RLO unicode character when used by malware: http://www.ipa.jp/security/english/virus/press/201110/E_PR201110.html Mirror link: http://webcache.googleusercontent.com/search?q=cache:KasmfOvbVJ8J:www.ipa.jp/security/english/virus/press/201110/E_PR201110.html+&cd=1&hl=en&ct=clnk You can try this RLO character test webpage: http://www.fileformat.info/info/unicode/char/202e/browsertest.htm The RLO character is also already pasted in the 'Input Test' field in that webpage. Try typing there and notice that the characters you're typing are coming out in their reverse orders (right-to-left, instead of left-to-right). In filenames, the RLO character can be specifically positioned in the filename to spoof or masquerade as having a filename or file extension that is different than what it actually has. (Will still be hidden even if 'Hide extensions for known filetypes' is unchecked.) The only info I can find that has info on how to prevent files with the RLO character from being run is from the Information Technology Promotion Agency, Japan website: http://www.ipa.jp/security/english/virus/press/201110/E_PR201110.html (Mirror link). They adviced to use the Local Security Policy settings manager to block files with the RLO character in its name from being run. Can anyone recommend any other good solutions to prevent files with the RLO character in their names from being written or being read in the computer, or a way to alert the user if a file with the RLO character is detected? My OS is Windows 7, but I'll be looking for solutions for Windows XP, Vista and 7, or a solution that will work for all those OSes, to help people using those OSes too.

    Read the article

  • Can I convert an ASCII MD5 hashed password into a Unicode MD5 hashed password?

    - by Jimmy Moo Moo
    Hello, I'm looking for help to convert an ASCII MD5 hashed password into a Unicode MD5 hashed password? For example, I'll use the string "password" . When it's converted to an ascii byte array, I get a base64 encoded hash of X03MO1qnZdYdgyfeuILPmQ== When it's converted into a unicode byte array, I get a base64 encoded hash of sIHb6F4ew//D1OfQInQAzQ== All my passwords are stored in an md5 hash that was applied to an ascii byte array, but I'm trying to migrate my application's user data to a system that stores password in an md5 hash that is applied a unicode byte array. In case it's not clear, with the following C#code: var passwordBytes = Encoding.ASCII.GetBytes("password"); var hashAlgorithm = HashAlgorithm.Create("MD5"); var hashBytes = hashAlgorithm.ComputeHash(passwordBytes); My current system uses this, but the system I'm moving to has a diff first time. It usese Encoding.Unicode.GetBytes. Does anybody know how I can convert my passwords? From X03MO1qnZdYdgyfeuILPmQ== into sIHb6F4ew//D1OfQInQAzQ== I'm guessing the answer is that I can't.. the encoding is being done before the hashing, but I thought I'd inquire the bright minds of stackoverflow and see if anybody has a way.

    Read the article

  • Stream/string/bytearray transformations in Python 3

    - by Craig McQueen
    Python 3 cleans up Python's handling of Unicode strings. I assume as part of this effort, the codecs in Python 3 have become more restrictive, according to the Python 3 documentation compared to the Python 2 documentation. For example, codecs that conceptually convert a bytestream to a different form of bytestream have been removed: base64_codec bz2_codec hex_codec And codecs that conceptually convert Unicode to a different form of Unicode have also been removed (in Python 2 it actually went between Unicode and bytestream, but conceptually it's really Unicode to Unicode I reckon): rot_13 My main question is, what is the "right way" in Python 3 to do what these removed codecs used to do? They're not codecs in the strict sense, but "transformations". But the interface and implementation would be very similar to codecs. I don't care about rot_13, but I'm interested to know what would be the "best way" to implement a transformation of line ending styles (Unix line endings vs Windows line endings) which should really be a Unicode-to-Unicode transformation done before encoding to byte stream, especially when UTF-16 is being used, as discussed this other SO question.

    Read the article

  • Social media and special characters

    - by John Paul Cook
    I’ve previously blogged about using Unicode with T-SQL to put superscripts, subscripts, and special characters into text strings. Unicode is also useful in formatting social media such as Facebook, Twitter, and that dinosaur otherwise known as email. When you can’t set properties of text such as italicizing the subject line of an email message or adding subscripts to a Facebook post, Unicode can make it possible. There are Unicode characters that are intrinsically italicized. Others are intrinsically...(read more)

    Read the article

< Previous Page | 25 26 27 28 29 30 31 32 33 34 35 36  | Next Page >