Search Results

Search found 1474 results on 59 pages for 'unicode'.

Page 21/59 | < Previous Page | 17 18 19 20 21 22 23 24 25 26 27 28 | Next Page >

Why is python decode replacing more than the invalid bytes from an encoded string?

- by dangra

Trying to decode an invalid encoded utf-8 html page gives different results in python, firefox and chrome. The invalid encoded fragment from test page looks like 'PREFIX\xe3\xabSUFFIX' >>> fragment = 'PREFIX\xe3\xabSUFFIX' >>> fragment.decode('utf-8', 'strict') ... UnicodeDecodeError: 'utf8' codec can't decode bytes in position 6-8: invalid data What follows is the summary of replacement policies used to handle decoding errors by python, firefox and chrome. Note how the three differs, and specially how python builtin removes the valid S (plus the invalid sequence of bytes). by Python The builtin replace error handler replaces the invalid \xe3\xab plus the S from SUFFIX by U+FFFD >>> fragment.decode('utf-8', 'replace') u'PREFIX\ufffdUFFIX' >>> print _ PREFIX?UFFIX The python implementation builtin replace error handler looks like: >>> python_replace = lambda exc: (u'\ufffd', exc.end) As expected, trying this gives same result than builtin: >>> codecs.register_error('python_replace', python_replace) >>> fragment.decode('utf-8', 'python_replace') u'PREFIX\ufffdUFFIX' >>> print _ PREFIX?UFFIX by Firefox Firefox replaces each invalid byte by U+FFFD >>> firefox_replace = lambda exc: (u'\ufffd', exc.start+1) >>> codecs.register_error('firefox_replace', firefox_replace) >>> test_string.decode('utf-8', 'firefox_replace') u'PREFIX\ufffd\ufffdSUFFIX' >>> print _ PREFIX??SUFFIX by Chrome Chrome replaces each invalid sequence of bytes by U+FFFD >>> chrome_replace = lambda exc: (u'\ufffd', exc.end-1) >>> codecs.register_error('chrome_replace', chrome_replace) >>> fragment.decode('utf-8', 'chrome_replace') u'PREFIX\ufffdSUFFIX' >>> print _ PREFIX?SUFFIX The main question is why builtin replace error handler for str.decode is removing the S from SUFFIX. Also, is there any unicode's official recommended way for handling decoding replacements?

Read the article
Convert UCS-2 characters to UTF-8 Using C#

- by quanticle

I'm pulling some internationalized text from a MS SQL Server 2005 database. As per the defaults for that DB, the characters are stored as UCS-2. However, I need to output the data in UTF-8 format, as I'm sending it out over the web. Currently, I have the following code to convert: SqlString dbString = resultReader.GetSqlString(0); byte[] dbBytes = dbString.GetUnicodeBytes(); byte[] utf8Bytes = System.Text.Encoding.Convert(System.Text.Encoding.Unicode, System.Text.Encoding.UTF8, dbBytes); System.Text.UTF8Encoding encoder = new System.Text.UTF8Encoding(); string outputString = encoder.GetString(utf8Bytes); However, when I examine the output in the browser, it appears to be garbage, no matter what I set the encoding to. What am I missing? EDIT: In response to the answers below, the reason I thought I had to perform a conversion is because I can output literal multibyte strings just fine. For example: OutputControl.Text = "????????????????????????????????????????????????????????????????"; works. Here, OutputControl is an ASP.Net Literal. However, OutputControl.Text = outputString; //Output from above snippet results in mangled output as described above. My hypothesis was that the database's output was somehow getting mangled by ASP.Net. If that's not the case, then what are some other possibilities?

Read the article
Windows cmd encoding change causes Python crash.

- by Alex

First I chage Windows CMD encoding to utf-8 and run Python interpreter: chcp 65001 python Then I try to print a unicode sting inside it and when i do this Python crashes in a peculiar way (I just get a cmd prompt in the same window). >>> import sys >>> print u'ëèæîð'.encode(sys.stdin.encoding) Any ideas why it happens and how to make it work? UPD: sys.stdin.encoding returns 'cp65001' UPD2: It just came to me that the issue might be connected with the fact that utf-8 uses multi-byte character set (kcwu made a good point on that). I tried running the whole example with 'windows-1250' and got 'ëeaî?'. Windows-1250 uses single-character set so it worked for those characters it understands. However I still have no idea how to make 'utf-8' work here. UPD3: Oh, I found out it is a known Python bug. I guess what happens is that Python copies the cmd encoding as 'cp65001 to sys.stdin.encoding and tries to apply it to all the input. Since it fails to understand 'cp65001' it crushes on any input that contains non-ascii characters.

Read the article
UILabel + IRR, KRW and KHR currencies with wrong symbol

- by serb

Hi, I'm experiencing issues when converting decimal to currency for Korean Won, Cambodian Riel and Iranian Rial and showing the result to the UILabel text. Conversion itself passes just fine and I can see correct currency symbol at the debugger, even the NSLog prints the symbol well. If I assign this NSString instance to the UILabel text, the currency symbol is shown as a crossed box instead of the correct symbol. There is no other code between, does not matter what font I use. I tried to print ? (Korean Won) using the unicode value (0x20A9) or even using UTF8 representation (\xe2\x82\xa9), but all I get is the crossed box on the label. Any other supported currency in iPhone SDK and NSLocale (nearly 170 currencies) works perfectly fine no matter how exotic the currency is. Anyone else experiencing the same problem? Is there a "cure" for this? Thanks EDIT: -(NSString *)decimalToCurrency:(NSDecimalNumber *)value byLocale:(NSLocale *)locale { NSNumberFormatter *fmt = [[NSNumberFormatter alloc] init]; [fmt setLocale: locale]; [fmt setNumberStyle: NSNumberFormatterCurrencyStyle]; NSString *res = [fmt stringFromNumber: value]; [fmt release]; return res; } lbValue.text = [self decimalToCurrency: price byLocale: koreanLocale];

Read the article
Are UTF16 (as used by for example wide-winapi functions) characters always 2 byte long?

- by Cray

Please clarify for me, how does UTF16 work? I am a little confused, considering these points: There is a static type in C++, WCHAR, which is 2 bytes long. (always 2 bytes long obvisouly) Most of msdn and some other documentation seem to have the assumptions that the characters are always 2 bytes long. This can just be my imagination, I can't come up with any particular examples, but it just seems that way. There are no "extra wide" functions or characters types widely used in C++ or windows, so I would assume that UTF16 is all that is ever needed. To my uncertain knowledge, unicode has a lot more characters than 65535, so they obvisouly don't have enough space in 2 bytes. UTF16 seems to be a bigger version of UTF8, and UTF8 characters can be of different lengths. So if a UTF16 character not always 2 bytes long, how long else could it be? 3 bytes? or only multiples of 2? And then for example if there is a winapi function that wants to know the size of a wide string in characters, and the string contains 2 characters which are each 4 bytes long, how is the size of that string in characters calculated? Is it 2 chars long or 4 chars long? (since it is 8 bytes long, and each WCHAR is 2 bytes)

Read the article
C++ read registry string value in char*

- by Sunny

I'm reading a registry value like this: char mydata[2048]; DWORD dataLength = sizeof(mydata); DWORD dwType = REG_SZ; ..... open key, etc ReqQueryValueEx(hKey, keyName, 0, &dwType, (BYTE*)mydata, &dataLength); My problem is, that after this, mydata content looks like: [63, 00, 3A, 00, 5C, 00...], i.e. this looks like a unicode?!?!. I need to convert this somehow to be a normal char array, without these [00], as they fail a simple logging function I have. I.e. if I call like this: WriteMessage(mydata), it outputs only "c", which is the first char in the registry. I have calls to this logging function all over the place, so I'd better of not modify it, but somehow "fix" the registry value. Here is the log function: void Logger::WriteMessage(const char *msg) { time_t now = time(0); struct tm* tm = localtime(&now); std::ofstream logFile; logFile.open(filename, std::ios::out | std::ios::app); if ( logFile.is_open() ) { logFile << tm->tm_mon << '/' << tm->tm_mday << '/' << tm->tm_year << ' '; logFile << tm->tm_hour << ':' << tm->tm_min << ':' << tm->tm_sec << "> "; logFile << msg << "\n"; logFile.close(); } }

Read the article
Batch script is not executed if chcp was called

- by Andy

Hello! I'm trying to delete some files with unicode characters in them with batch script (it's a requirement). So I run cmd and execute: > chcp 65001 Effectively setting codepage to UTF-8. And it works: D:\temp\1>dir Volume in drive D has no label. Volume Serial Number is 8C33-61BF Directory of D:\temp\1 02.02.2010 09:31 <DIR> . 02.02.2010 09:31 <DIR> .. 02.02.2010 09:32 508 1.txt 02.02.2010 09:28 12 delete.bat 02.02.2010 09:20 95 delete.cmd 02.02.2010 09:13 <DIR> Rún 02.02.2010 09:13 <DIR> ????? ??????? 3 File(s) 615 bytes 4 Dir(s) 11 576 438 784 bytes free D:\temp\1>rmdir Rún D:\temp\1>dir Volume in drive D has no label. Volume Serial Number is 8C33-61BF Directory of D:\temp\1 02.02.2010 09:56 <DIR> . 02.02.2010 09:56 <DIR> .. 02.02.2010 09:32 508 1.txt 02.02.2010 09:28 12 delete.bat 02.02.2010 09:20 95 delete.cmd 02.02.2010 09:13 <DIR> ????? ??????? 3 File(s) 615 bytes 3 Dir(s) 11 576 438 784 bytes free Then I put the same rmdir commands in batch script and save it in UTF-8 encoding. But when I run nothing happens, literally nothing: not even echo works from batch script in this case. Even saving script in OEM encoding does not help. So it seems that when I change codepage to UTF-8 in console, scripts just stop working. Does somebody know how to fix that?

Read the article
What encoding does c32rtomb convert to?

- by R. Martinho Fernandes

The functions c32rtomb and mbrtoc32 from <cuchar>/<uchar.h> are described in the C Unicode TR (draft) as performing conversions between UTF-321 and "multibyte characters". (...) If s is not a null pointer, the c32rtomb function determines the number of bytes needed to represent the multibyte character that corresponds to the wide character given by c32 (including any shift sequences), and stores the multibyte character representation in the array whose first element is pointed to by s. (...) What is this "multibyte character representation"? I'm actually interested in the behaviour of the following program: #include <cassert> #include <cuchar> #include <string> int main() { std::u32string u32 = U"this is a wide string"; std::string narrow = "this is a wide string"; std::string converted(1000, '\0'); char* ptr = &converted[0]; std::mbstate_t state {}; for(auto u : u32) { ptr += std::c32rtomb(ptr, u, &state); } converted.resize(ptr - &converted[0]); assert(converted == narrow); } Is the assertion in it guaranteed to hold1? 1 Working under the assumption that __STDC_UTF_32__ is defined.

Read the article
Transparent, unicode X terminal not tied to a Desktop Environment?

- by jamuraa

I've been looking for this for a while now and I just haven't been able to find one. The last few that I used were: aterm - this one was fast and had good transparency support, but it doesn't support Unicode at all as far as I can tell. The dependency graph is also reasonable. gnome-terminal - was good, and had good transparency support plus unicode, but it pulls in about everything in gnome, and I don't use anything else in gnome. It was also somewhat slow (noticable lag in updating at times) and wouldn't use fonts that I wanted. Eterm - same thing as aterm, good dependencies and transparency but no unicode. Does anyone have suggestions, or will I be stuck with gnome-terminal's dependencies and slowness?

Read the article
How to force PyYAML to load strings as unicode objects?

- by Petr Viktorin

The PyYAML package loads unmarked strings as either unicode or str objects, depending on their content. I would like to use unicode objects throughout my program (and, unfortunately, can't switch to Python 3 just yet). Is there an easy way to force PyYAML to always strings load unicode objects? I do not want to clutter my YAML with !!python/unicode tags. # Encoding: UTF-8 import yaml menu= u"""--- - spam - eggs - bacon - crème brûlée - spam """ print yaml.load(menu) Output: ['spam', 'eggs', 'bacon', u'cr\xe8me br\xfbl\xe9e', 'spam'] I would like: [u'spam', u'eggs', u'bacon', u'cr\xe8me br\xfbl\xe9e', u'spam']

Read the article
How do you convert HTML entities to Unicode and vice versa in Python?

- by hekevintran

Duplicate: Convert XML/HTML Entities into Unicode String in Python How do you convert HTML entities to Unicode and vice versa in Python?

Read the article
Confused about C++'s std::wstring, UTF-16, UTF-8 and displaying strings in a windows GUI

- by dfrey

I'm working on a english only C++ program for Windows where we were told "always use std::wstring", but it seems like nobody on the team really has much of an understanding beyond that. I already read the question titled "std::wstring VS std::string. It was very helpful, but I still don't quite understand how to apply all of that information to my problem. The program I'm working on displays data in a Windows GUI. That data is persisted as XML. We often transform that XML using XSLT into HTML or XSL:FO for reporting purposes. My feeling based on what I have read is that the HTML should be encoded as UTF-8. I know very little about GUI development, but the little bit I have read indicates that the GUI stuff is all based on UTF-16 encoded strings. I'm trying to understand where this leaves me. Say we decide that all of our persisted data should be UTF-8 encoded XML. Does this mean that in order to display persisted data in a UI component, I should really be performing some sort of explicit UTF-8 to UTF-16 transcoding process? I suspect my explanation could use clarification, so I'll try to provide that if you have any questions.

Read the article
SQL Server -> 'SQL_Latin1_General_CP1_CI_AS' Collation -> Varchar Column -> Languages Supported

- by Ajay Singh

All, We are using SQL Server 2008 with Collation Setting as 'SQL_Latin1_General_CP1_CI_AS'. We are using Varchar column to store textual data. We know that we cannot store Double Byte data in Varchar column and hence cannot support languages like Japanese and Chinese without converting it to NVarchar. However, will it be safe to say that all Single Byte Characters can be stored in Varchar column without any problem? If yes then from where can I get the list of languages which needs Single Byte for storage and the list of languages which needs double byte? Any assistance in this regard is highly appreciated. Thanks in advance.

Read the article
Forcing a mixed ISO-8859-1 and UTF-8 multi-line string into UTF-8

- by knorv

Consider the following problem: A multi-line string $junk contains some lines which are encoded in UTF-8 and some in ISO-8859-1. I don't know a priori which lines are in which encoding, so heuristics will be needed. I want to turn $junk into pure UTF-8 with proper re-encoding of the ISO-8859-1 lines. Also, in the event of errors in the processing I want to provide a "best effort result" rather than throwing an error. My current attempt looks like this: $junk = &force_utf8($junk); sub force_utf8 { my $input = shift; my $output = ''; foreach my $line (split(/\n/, $input)) { if (utf8::valid($line)) { utf8::decode($line); } $output .= "$line\n"; } return $output; } While this appears to work I'm certain this is not the optimal solution. How would you improve my force_utf8(...) sub?

Read the article
std::basic_stringstream<unsigned char> won't compile with MSVC 10

- by Michael J

I'm trying to get UTF-8 chars to co-exist with ANSI 8-bit chars. My strategy has been to represent utf-8 chars as unsigned char so that appropriate overloads of functions can be used for the two character types. e.g. namespace MyStuff { typedef uchar utf8_t; typedef std::basic_string<utf8_t> U8string; } void SomeFunc(std::string &s); void SomeFunc(std::wstring &s); void SomeFunc(MyStuff::U8string &s); This all works pretty well until I try to use a stringstream. std::basic_ostringstream<MyStuff::utf8_t> ostr; ostr << 1; MSVC Visual C++ Express V10 won't compile this: c:\program files\microsoft visual studio 10.0\vc\include\xlocmon(213): warning C4273: 'id' : inconsistent dll linkage c:\program files\microsoft visual studio 10.0\vc\include\xlocnum(65) : see previous definition of 'public: static std::locale::id std::numpunct<unsigned char>::id' c:\program files\microsoft visual studio 10.0\vc\include\xlocnum(65) : while compiling class template static data member 'std::locale::id std::numpunct<_Elem>::id' with [ _Elem=Tk::utf8_t ] c:\program files\microsoft visual studio 10.0\vc\include\xlocnum(1149) : see reference to function template instantiation 'const _Facet &std::use_facet<std::numpunct<_Elem>>(const std::locale &)' being compiled with [ _Facet=std::numpunct<Tk::utf8_t>, _Elem=Tk::utf8_t ] c:\program files\microsoft visual studio 10.0\vc\include\xlocnum(1143) : while compiling class template member function 'std::ostreambuf_iterator<_Elem,_Traits> std::num_put<_Elem,_OutIt>:: do_put(_OutIt,std::ios_base &,_Elem,std::_Bool) const' with [ _Elem=Tk::utf8_t, _Traits=std::char_traits<Tk::utf8_t>, _OutIt=std::ostreambuf_iterator<Tk::utf8_t,std::char_traits<Tk::utf8_t>> ] c:\program files\microsoft visual studio 10.0\vc\include\ostream(295) : see reference to class template instantiation 'std::num_put<_Elem,_OutIt>' being compiled with [ _Elem=Tk::utf8_t, _OutIt=std::ostreambuf_iterator<Tk::utf8_t,std::char_traits<Tk::utf8_t>> ] c:\program files\microsoft visual studio 10.0\vc\include\ostream(281) : while compiling class template member function 'std::basic_ostream<_Elem,_Traits> & std::basic_ostream<_Elem,_Traits>::operator <<(int)' with [ _Elem=Tk::utf8_t, _Traits=std::char_traits<Tk::utf8_t> ] c:\program files\microsoft visual studio 10.0\vc\include\sstream(526) : see reference to class template instantiation 'std::basic_ostream<_Elem,_Traits>' being compiled with [ _Elem=Tk::utf8_t, _Traits=std::char_traits<Tk::utf8_t> ] c:\users\michael\dvl\tmp\console\console.cpp(23) : see reference to class template instantiation 'std::basic_ostringstream<_Elem,_Traits,_Alloc>' being compiled with [ _Elem=Tk::utf8_t, _Traits=std::char_traits<Tk::utf8_t>, _Alloc=std::allocator<uchar> ] . c:\program files\microsoft visual studio 10.0\vc\include\xlocmon(213): error C2491: 'std::numpunct<_Elem>::id' : definition of dllimport static data member not allowed with [ _Elem=Tk::utf8_t ] Any ideas? ** Edited 19 June 2012 ** OK, I've gotten closer to understanding this, but not how to solve it. As we all know, static class variables get defined twice: once in the class definition and once outside the class definition which establishes storage space. e.g. // in .h file class CFoo { // ... static int x; }; // in .cpp file int CFoo::x = 42; Now in the VC10 headers we get something like this: template<class _Elem> class numpunct : public locale::facet { // ... _CRTIMP2_PURE static locale::id id; // ... } When the header is included in an application, _CRTIMP2_PURE is defined as __declspec(dllimport), which means that the variable is imported from a dll. Now the header also contains the following template<class _Elem> locale::id numpunct<_Elem>::id; Note the absence of the __declspec(dllimport) qualifier. i.e. The class declaration says that the static linkage of the id variable is in the dll, but for the general case, it gets declared outside the dll. For the known cases, there are specialisations. template locale::id numpunct<char>::id; template locale::id numpunct<wchar_t>::id; These are protected by #ifs so that they are only included when building the DLL. They are excluded otherwise. i.e. the char and wchar_t versions of numpunct ARE inside the dll So we have the class definition saying that id's storage is in the DLL, but that is only true for the char and wchar_t specialisations, meaning that my unsigned char version is doomed. :-( The only way forward that I can think of is to create my own specialisation: basically copying it from the header file and fixing it. This raises many issues. Anybody have a better idea?

Read the article
ICU MessageFormat Currency Value Precison Lost

- by Travis

This may be a niche question but I'm working with ICU to format currency strings. I've bumped into a situation that I don't quite understand. When using the MesssageFormat class, why for a certain locale (Korea "ko"KR" for example) does it round currency values (e.g. 100.50 becomes ?101). For most locales (such as the US "en_US"), the precision of the argument passed in remains untouched (e.g. 100.50 becomes $100.50). I thought this might be a default rounding issue that some locales have (Swiss Francs "fr_CH" for example have a default 0.05 rounding) but South Korea "ko_KR" has none. Any ideas?

Read the article
Does Perl's Net::Cassandra module support UTF-8?

- by knorv

I've run into a really strange UTF-8 problem with Net::Cassandra::Easy (which is built upon Net::Cassandra): UTF-8 strings written to Cassandra are garbled upon retrieval. The following code shows the problem: use strict; use utf8; use warnings; use Net::Cassandra::Easy; binmode(STDOUT, ":utf8"); my $key = "some_key"; my $column = "some_column"; my $set_value = "\x{2603}"; my $cassandra = Net::Cassandra::Easy->new(keyspace => "Keyspace1", server => "localhost"); $cassandra->connect(); $cassandra->mutate([$key], family => "Standard1", insertions => { $column => $set_value }); my $result = $cassandra->get([$key], family => "Standard1", standard => 1); my $get_value = $result->{$key}->{"Standard1"}->{$column}; if ($set_value eq $get_value) { # this is the path I want. print "OK: $set_value == $get_value\n"; } else { # this is the path I get. print "ERR: $set_value != $get_value\n"; } When running the code above $set_value eq $get_value evaluates to false. What am I doing wrong?

Read the article
stdout and stderr character encoding

- by Muhammad alaa

i working on a c++ string library that have main 4 classes that deals with ASCII, UTF8, UTF16, UTF32 strings, every class has Print function that format an input string and print the result to stdout or stderr. my problem is i don't know what is the default character encoding for those streams. for now my classes work in windows, later i'll add support for mac and linux so if you know anything about those stream encoding i'll appreciate it. thank you.

Read the article
Converting latin mysql data to utf8

- by Oguz

I want to use utf 8 right now , but all my data is latin1 , what is the efficient way to convert data .

Read the article
Dreaded python encoding errors, how to stop them?

- by Rhubarb

These have been plaguing me endlessly. Why? It seems that my console can't handle the encoding. I take it that the my browser and word processor can handle it. I don't have a master list of all the possible characters that it's choking on. What is the best way to relieve this without modifying my data? 'charmap' codec can't encode character u'\xca'

Read the article
Python interface to PayPal - urllib.urlencode non-ASCII characters failing

- by krys

I am trying to implement PayPal IPN functionality. The basic protocol is as such: The client is redirected from my site to PayPal's site to complete payment. He logs into his account, authorizes payment. PayPal calls a page on my server passing in details as POST. Details include a person's name, address, and payment info etc. I need to call a URL on PayPal's site internally from my processing page passing back all the params that were passed in abovem and an additional one called 'cmd' with a value of '_notify-validate'. When I try to urllib.urlencode the params which PayPal has sent to me, I get a: While calling send_response_to_paypal. Traceback (most recent call last): File "<snip>/account/paypal/views.py", line 108, in process_paypal_ipn verify_result = send_response_to_paypal(params) File "<snip>/account/paypal/views.py", line 41, in send_response_to_paypal params = urllib.urlencode(params) File "/usr/local/lib/python2.6/urllib.py", line 1261, in urlencode v = quote_plus(str(v)) UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 9: ordinal not in range(128) I understand that urlencode does ASCII encoding, and in certain cases, a user's contact info can contain non-ASCII characters. This is understandable. My question is, how do I encode non-ASCII characters for POSTing to a URL using urllib2.urlopen(req) (or other method) Details: I read the params in PayPal's original request as follows (the GET is for testing): def read_ipn_params(request): if request.POST: params= request.POST.copy() if "ipn_auth" in request.GET: params["ipn_auth"]=request.GET["ipn_auth"] return params else: return request.GET.copy() The code I use for sending back the request to PayPal from the processing page is: def send_response_to_paypal(params): params['cmd']='_notify-validate' params = urllib.urlencode(params) req = urllib2.Request(PAYPAL_API_WEBSITE, params) req.add_header("Content-type", "application/x-www-form-urlencoded") response = urllib2.urlopen(req) status = response.read() if not status == "VERIFIED": logging.warn("PayPal cannot verify IPN responses: " + status) return False return True Obviously, the problem only arises if someone's name or address or other field used for the PayPal payment does not fall into the ASCII range.

Read the article
PHP: Convert curl_exec output to UTF8

- by Paul Tarjan

I would like to only work with UTF8. The problem is I don't know the charset of every webpage. How can I detect it and convert to UTF8? <?php $url = "http://vkontakte.ru"; $ch = curl_init($url); $options = array( CURLOPT_RETURNTRANSFER => true, ); curl_setopt_array($ch, $options); $data = curl_exec($ch); // $data = magic($data); print $data; See this at: http://paulisageek.com/tmp/curl-utf8 What is magic()?

Read the article
Python: UnicodeEncodeError when reading from stdin

- by hansfbaier

When running a Python program that reads from stdin, I get the following error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 320: ordinal not in range(128) How can I fix it?

Read the article
i18n / Markdown - Does Markdown support internationalization?

- by John Himmelman

I'm building a CMS which needs to manage content in english, chinese, and spanish at a minimum. Do most markdown implementations handle UTF-8 encoded text? Is the Markdown language designed to be used with non-english languages? I'm currently using Markdown Extra by Michel Fortin.

Read the article
java: can I convert strings with String.getBytes() without the BOM?

- by Cheeso

Suppose I have this code: String encoding = "UTF-16"; String text = "[Hello StackOverflow]"; byte[] message= text.getBytes(encoding); If I display the byte array in message, the result is: 0000 FE FF 00 5B 00 48 00 65 00 6C 00 6C 00 6F 00 20 ...[.H.e.l.l.o. 0010 00 53 00 74 00 61 00 63 00 6B 00 4F 00 76 00 65 .S.t.a.c.k.O.v.e 0020 00 72 00 66 00 6C 00 6F 00 77 00 5D .r.f.l.o.w.] As you can see, there's a BOM in the beginning. How can I: generate a UTF-16 byte array that lacks a BOM ? convert from a byte array that contains UTF=16 chars but lacks a BOM, back to a string?

Read the article

< Previous Page | 17 18 19 20 21 22 23 24 25 26 27 28 | Next Page >