Search Results

Search found 13217 results on 529 pages for 'non unicode'.

Page 43/529 | < Previous Page | 39 40 41 42 43 44 45 46 47 48 49 50  | Next Page >

  • java: can I convert strings with String.getBytes() without the BOM?

    - by Cheeso
    Suppose I have this code: String encoding = "UTF-16"; String text = "[Hello StackOverflow]"; byte[] message= text.getBytes(encoding); If I display the byte array in message, the result is: 0000 FE FF 00 5B 00 48 00 65 00 6C 00 6C 00 6F 00 20 ...[.H.e.l.l.o. 0010 00 53 00 74 00 61 00 63 00 6B 00 4F 00 76 00 65 .S.t.a.c.k.O.v.e 0020 00 72 00 66 00 6C 00 6F 00 77 00 5D .r.f.l.o.w.] As you can see, there's a BOM in the beginning. How can I: generate a UTF-16 byte array that lacks a BOM ? convert from a byte array that contains UTF=16 chars but lacks a BOM, back to a string?

    Read the article

  • feedparser fails during script run, but can't reproduce in interactive python console

    - by Rhubarb
    It's failing with this when I run eclipse or when I run my script in iPython: 'ascii' codec can't decode byte 0xe2 in position 32: ordinal not in range(128) I don't know why, but when I simply execute the feedparse.parse(url) statement using the same url, there is no error thrown. This is stumping me big time. The code is as simple as: try: d = feedparser.parse(url) except Exception, e: logging.error('Error while retrieving feed.') logging.error(e) logging.error(formatExceptionInfo(None)) logging.error(formatExceptionInfo1()) Here is the stack trace: d = feedparser.parse(url) File "C:\Python26\lib\site-packages\feedparser.py", line 2623, in parse feedparser.feed(data) File "C:\Python26\lib\site-packages\feedparser.py", line 1441, in feed sgmllib.SGMLParser.feed(self, data) File "C:\Python26\lib\sgmllib.py", line 104, in feed self.goahead(0) File "C:\Python26\lib\sgmllib.py", line 143, in goahead k = self.parse_endtag(i) File "C:\Python26\lib\sgmllib.py", line 320, in parse_endtag self.finish_endtag(tag) File "C:\Python26\lib\sgmllib.py", line 360, in finish_endtag self.unknown_endtag(tag) File "C:\Python26\lib\site-packages\feedparser.py", line 476, in unknown_endtag method() File "C:\Python26\lib\site-packages\feedparser.py", line 1318, in _end_content value = self.popContent('content') File "C:\Python26\lib\site-packages\feedparser.py", line 700, in popContent value = self.pop(tag) File "C:\Python26\lib\site-packages\feedparser.py", line 641, in pop output = _resolveRelativeURIs(output, self.baseuri, self.encoding) File "C:\Python26\lib\site-packages\feedparser.py", line 1594, in _resolveRelativeURIs p.feed(htmlSource) File "C:\Python26\lib\site-packages\feedparser.py", line 1441, in feed sgmllib.SGMLParser.feed(self, data) File "C:\Python26\lib\sgmllib.py", line 104, in feed self.goahead(0) File "C:\Python26\lib\sgmllib.py", line 138, in goahead k = self.parse_starttag(i) File "C:\Python26\lib\sgmllib.py", line 296, in parse_starttag self.finish_starttag(tag, attrs) File "C:\Python26\lib\sgmllib.py", line 338, in finish_starttag self.unknown_starttag(tag, attrs) File "C:\Python26\lib\site-packages\feedparser.py", line 1588, in unknown_starttag attrs = [(key, ((tag, key) in self.relative_uris) and self.resolveURI(value) or value) for key, value in attrs] File "C:\Python26\lib\site-packages\feedparser.py", line 1584, in resolveURI return _urljoin(self.baseuri, uri) File "C:\Python26\lib\site-packages\feedparser.py", line 286, in _urljoin return urlparse.urljoin(base, uri) File "C:\Python26\lib\urlparse.py", line 215, in urljoin params, query, fragment)) File "C:\Python26\lib\urlparse.py", line 184, in urlunparse return urlunsplit((scheme, netloc, url, query, fragment)) File "C:\Python26\lib\urlparse.py", line 192, in urlunsplit url = scheme + ':' + url File "C:\Python26\lib\encodings\cp1252.py", line 15, in decode return codecs.charmap_decode(input,errors,decoding_table)

    Read the article

  • Windows C API for UTF8 to 1252

    - by Paul
    I'm familiar with WideCharToMultiByte and MultiByteToWideChar conversions and could use these to do something like: UTF8 - UTF16 - 1252 I know that iconv will do what I need, but does anybody know of any MS libs that will allow this in a single call? I should probably just pull in the iconv library, but am feeling lazy. Thanks

    Read the article

  • Writing UTF8 text to file

    - by sonofdelphi
    I am using the following function to save text to a file (on IE-8 w/ActiveX). function saveFile(strFullPath, strContent) { var fso = new ActiveXObject( "Scripting.FileSystemObject" ); var flOutput = fso.CreateTextFile( strFullPath, true ); //true for overwrite flOutput.Write( strContent ); flOutput.Close(); } The code works fine if the text is fully Latin-9 but when the text contains even a single UTF-8 encoded character, the write fails. The ActiveX FileSystemObject does not support UTF-8, it seems. I tried UTF-16 encoding the text first but the result was garbled. What is a workaround?

    Read the article

  • Use iconv API in C

    - by Constantin
    I try to convert a sjis string to utf-8 using the iconv API. I compiled it already succesfully, but the output isn't what I expected. My code: void convertUtf8ToSjis(char* utf8, char* sjis){ iconv_t icd; int index = 0; char *p_src, *p_dst; size_t n_src, n_dst; icd = iconv_open("Shift_JIS", "UTF-8"); int c; p_src = utf8; p_dst = sjis; n_src = strlen(utf8); n_dst = 32; // my sjis string size iconv(icd, &p_src, &n_src, &p_dst, &n_dst); iconv_close(icd); } I got only random numbers. Any ideas? Edit: My input is char utf8[] = "\xe4\xba\x9c"; //? And output should be: 0x88 0x9F But is in fact: 0x30 0x00 0x00 0x31 0x00 ...

    Read the article

  • How do I eliminate TT's "Wide character in print" warning ?

    - by planetp
    I have this warning every time I run my CGI-script (output is rendered by Template::Toolkit): Wide character in print at /usr/local/lib/perl5/site_perl/5.8.9/mach/Template.pm line 163. What's the right way to eliminate it? I create the tt object using this config: my %config = ( ENCODING => 'utf8', INCLUDE_PATH => $ENV{TEMPLATES_DIR}, EVAL_PERL => 1, } my $tt = Template->new(\%config);

    Read the article

  • Can you get access to the NumberFormatter used by ICU MessageFormat

    - by Travis
    This may be a niche question but I'm working with ICU to format currency strings. I've bumped into a situation that I don't quite understand. When using the MesssageFormat class, is it possible to get access to the NumberFormat object it uses to format currency strings. When you create a NumberFormat instance yourself, you can specify attributes like precision and rounding used when creating currency strings. I have an issue where for the South Korean locale ("ko_KR"), the MessageFormat class seems to create currency strings w/ rounding (100.50 - ?100). In areas where I use NumberFormat directly, I set setMaximumFractionDigits and setMinimumFractionDigits to 2 but I can't seem to set this in the MessageFormat. Any ideas?

    Read the article

  • UnicodeDecodeError on attempt to save file through django default filebased backend

    - by Ivan Kuznetsov
    When i attempt to add a file with russian symbols in name to the model instance through default instance.file_field.save method, i get an UnicodeDecodeError (ascii decoding error, not in range (128) from the storage backend (stacktrace ended on os.exist). If i write this file through default python file open/write all goes right. All filenames in utf-8. I get this error only on testing Gentoo, on my Ubuntu workstation all works fine. class Article(models.Model): file = models.FileField(null=True, blank=True, max_length = 300, upload_to='articles_files/%Y/%m/%d/') Traceback: File "/usr/lib/python2.6/site-packages/django/core/handlers/base.py" in get_response 100. response = callback(request, *callback_args, **callback_kwargs) File "/usr/lib/python2.6/site-packages/django/contrib/auth/decorators.py" in _wrapped_view 24. return view_func(request, *args, **kwargs) File "/var/www/localhost/help/wiki/views.py" in edit_article 338. new_article.file.save(fp, fi, save=True) File "/usr/lib/python2.6/site-packages/django/db/models/fields/files.py" in save 92. self.name = self.storage.save(name, content) File "/usr/lib/python2.6/site-packages/django/core/files/storage.py" in save 47. name = self.get_available_name(name) File "/usr/lib/python2.6/site-packages/django/core/files/storage.py" in get_available_name 73. while self.exists(name): File "/usr/lib/python2.6/site-packages/django/core/files/storage.py" in exists 196. return os.path.exists(self.path(name)) File "/usr/lib/python2.6/genericpath.py" in exists 18. st = os.stat(path) Exception Type: UnicodeEncodeError at /edit/ Exception Value: ('ascii', u'/var/www/localhost/help/i/articles_files/2010/03/17/\u041f\u0440\u0438\u0432\u0435\u0442', 52, 58, 'ordinal not in range(128)')

    Read the article

  • Validation of user input or ?????????

    - by zaf
    We're letting users search a database from a single text input and I'm having difficulties in filtering some user supplied strings. For example, if the user submits: ????????? lcd SONY (Note the ?'s) I need to cancel the search. I include the base64 encoded version of the above string wrapped up so that its easy run: print(base64_decode("1MfLxc/RwdPHIGxjZCBTT05Z")); I've ignored such inputs before but now (am not sure why) just realised the mysql database query is taking nearly forever to execute so this is now on high priority. Another example to highlight that we are using utf-8 and mb_detect_encoding is not helping much: print(base64_decode("zqDOm8+Fzr3PhM63z4HOuc6/IM+Bzr/Phc+HzyU=")); ????t???? ?????% So: how can I detect/filter these inputs? how is this input being generated?

    Read the article

  • Converting UnicodeString to PAnsiChar in Delphi XE

    - by moodforaday
    In Delphi XE I am using the BASS audio library, which contains this function: function BASS_StreamCreateURL(url: PAnsiChar; offset: DWORD; flags: DWORD; proc: DOWNLOADPROC; user: Pointer):HSTREAM; stdcall; external bassdll; The 'url' parameter is of type PAnsiChar, so in my code I do a cast: FStreamHandle := BASS_StreamCreateURL(PAnsiChar( url ) [...] The compiler emits a warning on this line: "suspicious typecast of string to PAnsiChar". In trying to eliminate the warning, I found that the recommended way is to use a double cast: FStreamHandle := BASS_StreamCreateURL(PAnsiChar( AnsiString( url )) [...] This does eliminate the warning, but the BASS function now returns error code 2 ("cannot open file"), which tells me the URL string it receives is somehow broken. I cannot see what the bass DLL actually receives, but using a breakpoint in the debugger the string looks good: var s : PAnsiChar; begin s := PAnsiChar( AnsiString( url )); At this point string s appears fine, but the BASS function fails when I pass it. My initial code: PAnsiChar( url ) works well with BASS, but emits a warning. So what's the correct way of getting from UnicodeString to PAnsiChar without a warning?

    Read the article

  • Is there any reason to prefer UTF-16 over UTF-8?

    - by Oak
    Examining the attributes of UTF-16 and UTF-8, I can't find any reason to prefer UTF-16. However, checking out Java and C#, it looks like strings and chars there default to UTF-16. I was thinking that it might be for historic reasons, or perhaps for performance reasons, but couldn't find any information. Anyone knows why these languages chose UTF-16? And is there any valid reason for me to do that as well?

    Read the article

  • String searching algorithm for Chinese characters.

    - by Jack Low
    There are Python code available for existing algorithms for normal string searching e.g. Boyer-Moore Algorithm. I am looking to use this on Chinese characters and it doesn't seem like the same implementation would work. What would I go about doing in order to make the algorithm work on Chinese characters? I am referring to this: http://en.literateprograms.org/Boyer-Moore_string_search_algorithm_(Python)#References

    Read the article

  • Python.expat can't parse XML file with bad symbols. How to go around?

    - by culebrón
    I'm trying to parse an XML file with expat, and here's the line where I get bad token exception: <tag k="name" v="???????????????????????????????????????????????????????????????????" /> xml.parsers.expat.ExpatError: not well-formed (invalid token): line 610127, column 37 The symbols in hex look like: \xd1? Seems like someone wrote this string (Russian alfabet) hitting backspace a few times. I set parser.returns_unicode = True, but this didn't help. The 1st line is <?xml version="1.0" encoding="UTF-8"?>. I work with a bz2 file. (bz2.BZ2File) How can I parse the file?

    Read the article

  • PHP: Cyrillic characters not displayed correctly

    - by user295502
    Recently I switched hosting from one provider to the other and I have problems displaying Cyrillic characters. The characters which are read from the database are displayed correctly, but characters which are hardcoded in the php file aren't (they are displayed as question marks). The files which contain the php source code are saved in utf-8 form. Help anybody?

    Read the article

  • python read utf8 text file problem

    - by cpps
    I have a problem with python about reading and print utf8 text file. I have a test.txt in utf8 encoding without BOM. This file has two characters in it: ?? The first character "?" is Chinese and the second "?" is Japanese. Now, When I use Ulipad (a python editor) to run the following code to read the txt file, and print these two characters. import codecs infile = "C:\\test.txt" f = codecs.open(infile, "r", "utf-8") s = f.read() print(s) I got this error, "UnicodeEncodeError: 'cp950' codec can't encode character '\u58f0' in position 1: illegal multibyte sequence" I found it caused from the second character "?" . But when I use the same code to test in python default GUI IDLE, it works to print the two characters with no error. So, how can I fix the problem. My running environment is python 3.1 , windows xp traditional Chinese.

    Read the article

  • How can I use ToUnicode without breaking dead key support?

    - by Cypherjb
    A similar question has already been asked, so I'm not going to waste time re-explaining it, an existing discussion can be found here: http://stackoverflow.com/questions/1964614/toascii-tounicode-in-a-keyboard-hook-destroys-dead-keys The reason I'm posting a new question however is that I seem to have come across a 'solution', but I'm not quite sure how to implement it. This blog post seems to propose a solution to the problem of ToUnicode killing dead-key support: http://blogs.msdn.com/michkap/archive/2005/01/19/355870.aspx However I'm not sure how to implement the suggested solution. A push in the right direction would be greatly appreciated. To be clear, the part I'm referring to is this: "There are two ways to work around this: 1) You can keep calling ToUnicode with the same info until it is cleared out and then call it one more time to put the state back where it was if you had never typed anything, or 2) You can load all of the keyboard info ahead of time and then when they type information you can look up in your own info cache what the keystrokes mean, without having to call APIs later." I'm not quite sure how to do either of those things (keyboards and internationalization are far from my strong point), so any help would be greatly appreciated. Thanks

    Read the article

  • Amazon SQS invalid binary character in message body

    - by letronje
    I have a web app that sends messages to an Amazon SQS Queue. Amazon sqs lib throws a 'AmazonSQSException' since the message contained invalid binary character. The message is the referrer obtained from an incoming http request. This is what it looks like: http://ads.vrx.adbrite.com/adserver/display_iab_ads.php?sid=1220459&title_color=0000FF&text_color=000000&background_color=FFFFFF&border_color=CCCCCC&url_color=008000&newwin=0&zs=3330305f323530&width=300&height=250&url=http%3A%2F%2Funblockorkutproxy.com%2Fsearch.php%2FOi8vZG93%2FbmxvYWRz%2FLnppZGR1%2FLmNvbS9k%2Fb3dubG9h%2FZGZpbGUv%2FNTY5MTQ3%2FNi9NeUN1%2FdGVHaXJs%2FZnJpZW5k%2FWmFoaXJh%2FLndtdi5o%2FdG1s%2Fb0%2F^Fô}úÃ<99ë)j Looks like the characters in bold are the invalid characters. Is there an easy way to filter out characters characters that are not accepted by amazon ? Here are the characters allowed by amazon in message body. I am not sure what regex i should use to replace invalid characters by ''

    Read the article

  • How do I create JavaScript escape sequences in PHP?

    - by ordinarytoucan
    I'm looking for a way to create valid UTF-16 JavaScript escape sequence characters (including surrogate pairs) from within PHP. I'm using the code below to get the UTF-32 code points (from a UTF-8 encoded character). This works as JavaScript escape characters (eg. '\u00E1' for 'á') - until you get into the upper ranges where you get surrogate pairs (eg '??' comes out as '\u1D715' but should be '\uD835\uDF15')... function toOrdinal($chr) { if (ord($chr{0}) >= 0 && ord($chr{0}) <= 127) { return ord($chr{0}); } elseif (ord($chr{0}) >= 192 && ord($chr{0}) <= 223) { return (ord($chr{0}) - 192) * 64 + (ord($chr{1}) - 128); } elseif (ord($chr{0}) >= 224 && ord($chr{0}) <= 239) { return (ord($chr{0}) - 224) * 4096 + (ord($chr{1}) - 128) * 64 + (ord($chr{2}) - 128); } elseif (ord($chr{0}) >= 240 && ord($chr{0}) <= 247) { return (ord($chr{0}) - 240) * 262144 + (ord($chr{1}) - 128) * 4096 + (ord($chr{2}) - 128) * 64 + (ord($chr{3}) - 128); } elseif (ord($chr{0}) >= 248 && ord($chr{0}) <= 251) { return (ord($chr{0}) - 248) * 16777216 + (ord($chr{1}) - 128) * 262144 + (ord($chr{2}) - 128) * 4096 + (ord($chr{3}) - 128) * 64 + (ord($chr{4}) - 128); } elseif (ord($chr{0}) >= 252 && ord($chr{0}) <= 253) { return (ord($chr{0}) - 252) * 1073741824 + (ord($chr{1}) - 128) * 16777216 + (ord($chr{2}) - 128) * 262144 + (ord($chr{3}) - 128) * 4096 + (ord($chr{4}) - 128) * 64 + (ord($chr{5}) - 128); } } How do I adapt this code to give me proper UTF-16 code points? Thanks!

    Read the article

< Previous Page | 39 40 41 42 43 44 45 46 47 48 49 50  | Next Page >