Search Results

Search found 1649 results on 66 pages for 'unicode normalization'.

Page 25/66 | < Previous Page | 21 22 23 24 25 26 27 28 29 30 31 32  | Next Page >

  • Confused about C++'s std::wstring, UTF-16, UTF-8 and displaying strings in a windows GUI

    - by dfrey
    I'm working on a english only C++ program for Windows where we were told "always use std::wstring", but it seems like nobody on the team really has much of an understanding beyond that. I already read the question titled "std::wstring VS std::string. It was very helpful, but I still don't quite understand how to apply all of that information to my problem. The program I'm working on displays data in a Windows GUI. That data is persisted as XML. We often transform that XML using XSLT into HTML or XSL:FO for reporting purposes. My feeling based on what I have read is that the HTML should be encoded as UTF-8. I know very little about GUI development, but the little bit I have read indicates that the GUI stuff is all based on UTF-16 encoded strings. I'm trying to understand where this leaves me. Say we decide that all of our persisted data should be UTF-8 encoded XML. Does this mean that in order to display persisted data in a UI component, I should really be performing some sort of explicit UTF-8 to UTF-16 transcoding process? I suspect my explanation could use clarification, so I'll try to provide that if you have any questions.

    Read the article

  • std::basic_stringstream<unsigned char> won't compile with MSVC 10

    - by Michael J
    I'm trying to get UTF-8 chars to co-exist with ANSI 8-bit chars. My strategy has been to represent utf-8 chars as unsigned char so that appropriate overloads of functions can be used for the two character types. e.g. namespace MyStuff { typedef uchar utf8_t; typedef std::basic_string<utf8_t> U8string; } void SomeFunc(std::string &s); void SomeFunc(std::wstring &s); void SomeFunc(MyStuff::U8string &s); This all works pretty well until I try to use a stringstream. std::basic_ostringstream<MyStuff::utf8_t> ostr; ostr << 1; MSVC Visual C++ Express V10 won't compile this: c:\program files\microsoft visual studio 10.0\vc\include\xlocmon(213): warning C4273: 'id' : inconsistent dll linkage c:\program files\microsoft visual studio 10.0\vc\include\xlocnum(65) : see previous definition of 'public: static std::locale::id std::numpunct<unsigned char>::id' c:\program files\microsoft visual studio 10.0\vc\include\xlocnum(65) : while compiling class template static data member 'std::locale::id std::numpunct<_Elem>::id' with [ _Elem=Tk::utf8_t ] c:\program files\microsoft visual studio 10.0\vc\include\xlocnum(1149) : see reference to function template instantiation 'const _Facet &std::use_facet<std::numpunct<_Elem>>(const std::locale &)' being compiled with [ _Facet=std::numpunct<Tk::utf8_t>, _Elem=Tk::utf8_t ] c:\program files\microsoft visual studio 10.0\vc\include\xlocnum(1143) : while compiling class template member function 'std::ostreambuf_iterator<_Elem,_Traits> std::num_put<_Elem,_OutIt>:: do_put(_OutIt,std::ios_base &,_Elem,std::_Bool) const' with [ _Elem=Tk::utf8_t, _Traits=std::char_traits<Tk::utf8_t>, _OutIt=std::ostreambuf_iterator<Tk::utf8_t,std::char_traits<Tk::utf8_t>> ] c:\program files\microsoft visual studio 10.0\vc\include\ostream(295) : see reference to class template instantiation 'std::num_put<_Elem,_OutIt>' being compiled with [ _Elem=Tk::utf8_t, _OutIt=std::ostreambuf_iterator<Tk::utf8_t,std::char_traits<Tk::utf8_t>> ] c:\program files\microsoft visual studio 10.0\vc\include\ostream(281) : while compiling class template member function 'std::basic_ostream<_Elem,_Traits> & std::basic_ostream<_Elem,_Traits>::operator <<(int)' with [ _Elem=Tk::utf8_t, _Traits=std::char_traits<Tk::utf8_t> ] c:\program files\microsoft visual studio 10.0\vc\include\sstream(526) : see reference to class template instantiation 'std::basic_ostream<_Elem,_Traits>' being compiled with [ _Elem=Tk::utf8_t, _Traits=std::char_traits<Tk::utf8_t> ] c:\users\michael\dvl\tmp\console\console.cpp(23) : see reference to class template instantiation 'std::basic_ostringstream<_Elem,_Traits,_Alloc>' being compiled with [ _Elem=Tk::utf8_t, _Traits=std::char_traits<Tk::utf8_t>, _Alloc=std::allocator<uchar> ] . c:\program files\microsoft visual studio 10.0\vc\include\xlocmon(213): error C2491: 'std::numpunct<_Elem>::id' : definition of dllimport static data member not allowed with [ _Elem=Tk::utf8_t ] Any ideas? ** Edited 19 June 2012 ** OK, I've gotten closer to understanding this, but not how to solve it. As we all know, static class variables get defined twice: once in the class definition and once outside the class definition which establishes storage space. e.g. // in .h file class CFoo { // ... static int x; }; // in .cpp file int CFoo::x = 42; Now in the VC10 headers we get something like this: template<class _Elem> class numpunct : public locale::facet { // ... _CRTIMP2_PURE static locale::id id; // ... } When the header is included in an application, _CRTIMP2_PURE is defined as __declspec(dllimport), which means that the variable is imported from a dll. Now the header also contains the following template<class _Elem> locale::id numpunct<_Elem>::id; Note the absence of the __declspec(dllimport) qualifier. i.e. The class declaration says that the static linkage of the id variable is in the dll, but for the general case, it gets declared outside the dll. For the known cases, there are specialisations. template locale::id numpunct<char>::id; template locale::id numpunct<wchar_t>::id; These are protected by #ifs so that they are only included when building the DLL. They are excluded otherwise. i.e. the char and wchar_t versions of numpunct ARE inside the dll So we have the class definition saying that id's storage is in the DLL, but that is only true for the char and wchar_t specialisations, meaning that my unsigned char version is doomed. :-( The only way forward that I can think of is to create my own specialisation: basically copying it from the header file and fixing it. This raises many issues. Anybody have a better idea?

    Read the article

  • ICU MessageFormat Currency Value Precison Lost

    - by Travis
    This may be a niche question but I'm working with ICU to format currency strings. I've bumped into a situation that I don't quite understand. When using the MesssageFormat class, why for a certain locale (Korea "ko"KR" for example) does it round currency values (e.g. 100.50 becomes ?101). For most locales (such as the US "en_US"), the precision of the argument passed in remains untouched (e.g. 100.50 becomes $100.50). I thought this might be a default rounding issue that some locales have (Swiss Francs "fr_CH" for example have a default 0.05 rounding) but South Korea "ko_KR" has none. Any ideas?

    Read the article

  • Does Perl's Net::Cassandra module support UTF-8?

    - by knorv
    I've run into a really strange UTF-8 problem with Net::Cassandra::Easy (which is built upon Net::Cassandra): UTF-8 strings written to Cassandra are garbled upon retrieval. The following code shows the problem: use strict; use utf8; use warnings; use Net::Cassandra::Easy; binmode(STDOUT, ":utf8"); my $key = "some_key"; my $column = "some_column"; my $set_value = "\x{2603}"; my $cassandra = Net::Cassandra::Easy->new(keyspace => "Keyspace1", server => "localhost"); $cassandra->connect(); $cassandra->mutate([$key], family => "Standard1", insertions => { $column => $set_value }); my $result = $cassandra->get([$key], family => "Standard1", standard => 1); my $get_value = $result->{$key}->{"Standard1"}->{$column}; if ($set_value eq $get_value) { # this is the path I want. print "OK: $set_value == $get_value\n"; } else { # this is the path I get. print "ERR: $set_value != $get_value\n"; } When running the code above $set_value eq $get_value evaluates to false. What am I doing wrong?

    Read the article

  • stdout and stderr character encoding

    - by Muhammad alaa
    i working on a c++ string library that have main 4 classes that deals with ASCII, UTF8, UTF16, UTF32 strings, every class has Print function that format an input string and print the result to stdout or stderr. my problem is i don't know what is the default character encoding for those streams. for now my classes work in windows, later i'll add support for mac and linux so if you know anything about those stream encoding i'll appreciate it. thank you.

    Read the article

  • Dreaded python encoding errors, how to stop them?

    - by Rhubarb
    These have been plaguing me endlessly. Why? It seems that my console can't handle the encoding. I take it that the my browser and word processor can handle it. I don't have a master list of all the possible characters that it's choking on. What is the best way to relieve this without modifying my data? 'charmap' codec can't encode character u'\xca'

    Read the article

  • PHP: Convert curl_exec output to UTF8

    - by Paul Tarjan
    I would like to only work with UTF8. The problem is I don't know the charset of every webpage. How can I detect it and convert to UTF8? <?php $url = "http://vkontakte.ru"; $ch = curl_init($url); $options = array( CURLOPT_RETURNTRANSFER => true, ); curl_setopt_array($ch, $options); $data = curl_exec($ch); // $data = magic($data); print $data; See this at: http://paulisageek.com/tmp/curl-utf8 What is magic()?

    Read the article

  • java: can I convert strings with String.getBytes() without the BOM?

    - by Cheeso
    Suppose I have this code: String encoding = "UTF-16"; String text = "[Hello StackOverflow]"; byte[] message= text.getBytes(encoding); If I display the byte array in message, the result is: 0000 FE FF 00 5B 00 48 00 65 00 6C 00 6C 00 6F 00 20 ...[.H.e.l.l.o. 0010 00 53 00 74 00 61 00 63 00 6B 00 4F 00 76 00 65 .S.t.a.c.k.O.v.e 0020 00 72 00 66 00 6C 00 6F 00 77 00 5D .r.f.l.o.w.] As you can see, there's a BOM in the beginning. How can I: generate a UTF-16 byte array that lacks a BOM ? convert from a byte array that contains UTF=16 chars but lacks a BOM, back to a string?

    Read the article

  • Python interface to PayPal - urllib.urlencode non-ASCII characters failing

    - by krys
    I am trying to implement PayPal IPN functionality. The basic protocol is as such: The client is redirected from my site to PayPal's site to complete payment. He logs into his account, authorizes payment. PayPal calls a page on my server passing in details as POST. Details include a person's name, address, and payment info etc. I need to call a URL on PayPal's site internally from my processing page passing back all the params that were passed in abovem and an additional one called 'cmd' with a value of '_notify-validate'. When I try to urllib.urlencode the params which PayPal has sent to me, I get a: While calling send_response_to_paypal. Traceback (most recent call last): File "<snip>/account/paypal/views.py", line 108, in process_paypal_ipn verify_result = send_response_to_paypal(params) File "<snip>/account/paypal/views.py", line 41, in send_response_to_paypal params = urllib.urlencode(params) File "/usr/local/lib/python2.6/urllib.py", line 1261, in urlencode v = quote_plus(str(v)) UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 9: ordinal not in range(128) I understand that urlencode does ASCII encoding, and in certain cases, a user's contact info can contain non-ASCII characters. This is understandable. My question is, how do I encode non-ASCII characters for POSTing to a URL using urllib2.urlopen(req) (or other method) Details: I read the params in PayPal's original request as follows (the GET is for testing): def read_ipn_params(request): if request.POST: params= request.POST.copy() if "ipn_auth" in request.GET: params["ipn_auth"]=request.GET["ipn_auth"] return params else: return request.GET.copy() The code I use for sending back the request to PayPal from the processing page is: def send_response_to_paypal(params): params['cmd']='_notify-validate' params = urllib.urlencode(params) req = urllib2.Request(PAYPAL_API_WEBSITE, params) req.add_header("Content-type", "application/x-www-form-urlencoded") response = urllib2.urlopen(req) status = response.read() if not status == "VERIFIED": logging.warn("PayPal cannot verify IPN responses: " + status) return False return True Obviously, the problem only arises if someone's name or address or other field used for the PayPal payment does not fall into the ASCII range.

    Read the article

  • feedparser fails during script run, but can't reproduce in interactive python console

    - by Rhubarb
    It's failing with this when I run eclipse or when I run my script in iPython: 'ascii' codec can't decode byte 0xe2 in position 32: ordinal not in range(128) I don't know why, but when I simply execute the feedparse.parse(url) statement using the same url, there is no error thrown. This is stumping me big time. The code is as simple as: try: d = feedparser.parse(url) except Exception, e: logging.error('Error while retrieving feed.') logging.error(e) logging.error(formatExceptionInfo(None)) logging.error(formatExceptionInfo1()) Here is the stack trace: d = feedparser.parse(url) File "C:\Python26\lib\site-packages\feedparser.py", line 2623, in parse feedparser.feed(data) File "C:\Python26\lib\site-packages\feedparser.py", line 1441, in feed sgmllib.SGMLParser.feed(self, data) File "C:\Python26\lib\sgmllib.py", line 104, in feed self.goahead(0) File "C:\Python26\lib\sgmllib.py", line 143, in goahead k = self.parse_endtag(i) File "C:\Python26\lib\sgmllib.py", line 320, in parse_endtag self.finish_endtag(tag) File "C:\Python26\lib\sgmllib.py", line 360, in finish_endtag self.unknown_endtag(tag) File "C:\Python26\lib\site-packages\feedparser.py", line 476, in unknown_endtag method() File "C:\Python26\lib\site-packages\feedparser.py", line 1318, in _end_content value = self.popContent('content') File "C:\Python26\lib\site-packages\feedparser.py", line 700, in popContent value = self.pop(tag) File "C:\Python26\lib\site-packages\feedparser.py", line 641, in pop output = _resolveRelativeURIs(output, self.baseuri, self.encoding) File "C:\Python26\lib\site-packages\feedparser.py", line 1594, in _resolveRelativeURIs p.feed(htmlSource) File "C:\Python26\lib\site-packages\feedparser.py", line 1441, in feed sgmllib.SGMLParser.feed(self, data) File "C:\Python26\lib\sgmllib.py", line 104, in feed self.goahead(0) File "C:\Python26\lib\sgmllib.py", line 138, in goahead k = self.parse_starttag(i) File "C:\Python26\lib\sgmllib.py", line 296, in parse_starttag self.finish_starttag(tag, attrs) File "C:\Python26\lib\sgmllib.py", line 338, in finish_starttag self.unknown_starttag(tag, attrs) File "C:\Python26\lib\site-packages\feedparser.py", line 1588, in unknown_starttag attrs = [(key, ((tag, key) in self.relative_uris) and self.resolveURI(value) or value) for key, value in attrs] File "C:\Python26\lib\site-packages\feedparser.py", line 1584, in resolveURI return _urljoin(self.baseuri, uri) File "C:\Python26\lib\site-packages\feedparser.py", line 286, in _urljoin return urlparse.urljoin(base, uri) File "C:\Python26\lib\urlparse.py", line 215, in urljoin params, query, fragment)) File "C:\Python26\lib\urlparse.py", line 184, in urlunparse return urlunsplit((scheme, netloc, url, query, fragment)) File "C:\Python26\lib\urlparse.py", line 192, in urlunsplit url = scheme + ':' + url File "C:\Python26\lib\encodings\cp1252.py", line 15, in decode return codecs.charmap_decode(input,errors,decoding_table)

    Read the article

  • Windows C API for UTF8 to 1252

    - by Paul
    I'm familiar with WideCharToMultiByte and MultiByteToWideChar conversions and could use these to do something like: UTF8 - UTF16 - 1252 I know that iconv will do what I need, but does anybody know of any MS libs that will allow this in a single call? I should probably just pull in the iconv library, but am feeling lazy. Thanks

    Read the article

  • Writing UTF8 text to file

    - by sonofdelphi
    I am using the following function to save text to a file (on IE-8 w/ActiveX). function saveFile(strFullPath, strContent) { var fso = new ActiveXObject( "Scripting.FileSystemObject" ); var flOutput = fso.CreateTextFile( strFullPath, true ); //true for overwrite flOutput.Write( strContent ); flOutput.Close(); } The code works fine if the text is fully Latin-9 but when the text contains even a single UTF-8 encoded character, the write fails. The ActiveX FileSystemObject does not support UTF-8, it seems. I tried UTF-16 encoding the text first but the result was garbled. What is a workaround?

    Read the article

  • Use iconv API in C

    - by Constantin
    I try to convert a sjis string to utf-8 using the iconv API. I compiled it already succesfully, but the output isn't what I expected. My code: void convertUtf8ToSjis(char* utf8, char* sjis){ iconv_t icd; int index = 0; char *p_src, *p_dst; size_t n_src, n_dst; icd = iconv_open("Shift_JIS", "UTF-8"); int c; p_src = utf8; p_dst = sjis; n_src = strlen(utf8); n_dst = 32; // my sjis string size iconv(icd, &p_src, &n_src, &p_dst, &n_dst); iconv_close(icd); } I got only random numbers. Any ideas? Edit: My input is char utf8[] = "\xe4\xba\x9c"; //? And output should be: 0x88 0x9F But is in fact: 0x30 0x00 0x00 0x31 0x00 ...

    Read the article

  • How do I eliminate TT's "Wide character in print" warning ?

    - by planetp
    I have this warning every time I run my CGI-script (output is rendered by Template::Toolkit): Wide character in print at /usr/local/lib/perl5/site_perl/5.8.9/mach/Template.pm line 163. What's the right way to eliminate it? I create the tt object using this config: my %config = ( ENCODING => 'utf8', INCLUDE_PATH => $ENV{TEMPLATES_DIR}, EVAL_PERL => 1, } my $tt = Template->new(\%config);

    Read the article

  • Can you get access to the NumberFormatter used by ICU MessageFormat

    - by Travis
    This may be a niche question but I'm working with ICU to format currency strings. I've bumped into a situation that I don't quite understand. When using the MesssageFormat class, is it possible to get access to the NumberFormat object it uses to format currency strings. When you create a NumberFormat instance yourself, you can specify attributes like precision and rounding used when creating currency strings. I have an issue where for the South Korean locale ("ko_KR"), the MessageFormat class seems to create currency strings w/ rounding (100.50 - ?100). In areas where I use NumberFormat directly, I set setMaximumFractionDigits and setMinimumFractionDigits to 2 but I can't seem to set this in the MessageFormat. Any ideas?

    Read the article

  • UnicodeDecodeError on attempt to save file through django default filebased backend

    - by Ivan Kuznetsov
    When i attempt to add a file with russian symbols in name to the model instance through default instance.file_field.save method, i get an UnicodeDecodeError (ascii decoding error, not in range (128) from the storage backend (stacktrace ended on os.exist). If i write this file through default python file open/write all goes right. All filenames in utf-8. I get this error only on testing Gentoo, on my Ubuntu workstation all works fine. class Article(models.Model): file = models.FileField(null=True, blank=True, max_length = 300, upload_to='articles_files/%Y/%m/%d/') Traceback: File "/usr/lib/python2.6/site-packages/django/core/handlers/base.py" in get_response 100. response = callback(request, *callback_args, **callback_kwargs) File "/usr/lib/python2.6/site-packages/django/contrib/auth/decorators.py" in _wrapped_view 24. return view_func(request, *args, **kwargs) File "/var/www/localhost/help/wiki/views.py" in edit_article 338. new_article.file.save(fp, fi, save=True) File "/usr/lib/python2.6/site-packages/django/db/models/fields/files.py" in save 92. self.name = self.storage.save(name, content) File "/usr/lib/python2.6/site-packages/django/core/files/storage.py" in save 47. name = self.get_available_name(name) File "/usr/lib/python2.6/site-packages/django/core/files/storage.py" in get_available_name 73. while self.exists(name): File "/usr/lib/python2.6/site-packages/django/core/files/storage.py" in exists 196. return os.path.exists(self.path(name)) File "/usr/lib/python2.6/genericpath.py" in exists 18. st = os.stat(path) Exception Type: UnicodeEncodeError at /edit/ Exception Value: ('ascii', u'/var/www/localhost/help/i/articles_files/2010/03/17/\u041f\u0440\u0438\u0432\u0435\u0442', 52, 58, 'ordinal not in range(128)')

    Read the article

  • Validation of user input or ?????????

    - by zaf
    We're letting users search a database from a single text input and I'm having difficulties in filtering some user supplied strings. For example, if the user submits: ????????? lcd SONY (Note the ?'s) I need to cancel the search. I include the base64 encoded version of the above string wrapped up so that its easy run: print(base64_decode("1MfLxc/RwdPHIGxjZCBTT05Z")); I've ignored such inputs before but now (am not sure why) just realised the mysql database query is taking nearly forever to execute so this is now on high priority. Another example to highlight that we are using utf-8 and mb_detect_encoding is not helping much: print(base64_decode("zqDOm8+Fzr3PhM63z4HOuc6/IM+Bzr/Phc+HzyU=")); ????t???? ?????% So: how can I detect/filter these inputs? how is this input being generated?

    Read the article

  • Converting UnicodeString to PAnsiChar in Delphi XE

    - by moodforaday
    In Delphi XE I am using the BASS audio library, which contains this function: function BASS_StreamCreateURL(url: PAnsiChar; offset: DWORD; flags: DWORD; proc: DOWNLOADPROC; user: Pointer):HSTREAM; stdcall; external bassdll; The 'url' parameter is of type PAnsiChar, so in my code I do a cast: FStreamHandle := BASS_StreamCreateURL(PAnsiChar( url ) [...] The compiler emits a warning on this line: "suspicious typecast of string to PAnsiChar". In trying to eliminate the warning, I found that the recommended way is to use a double cast: FStreamHandle := BASS_StreamCreateURL(PAnsiChar( AnsiString( url )) [...] This does eliminate the warning, but the BASS function now returns error code 2 ("cannot open file"), which tells me the URL string it receives is somehow broken. I cannot see what the bass DLL actually receives, but using a breakpoint in the debugger the string looks good: var s : PAnsiChar; begin s := PAnsiChar( AnsiString( url )); At this point string s appears fine, but the BASS function fails when I pass it. My initial code: PAnsiChar( url ) works well with BASS, but emits a warning. So what's the correct way of getting from UnicodeString to PAnsiChar without a warning?

    Read the article

  • Is there any reason to prefer UTF-16 over UTF-8?

    - by Oak
    Examining the attributes of UTF-16 and UTF-8, I can't find any reason to prefer UTF-16. However, checking out Java and C#, it looks like strings and chars there default to UTF-16. I was thinking that it might be for historic reasons, or perhaps for performance reasons, but couldn't find any information. Anyone knows why these languages chose UTF-16? And is there any valid reason for me to do that as well?

    Read the article

< Previous Page | 21 22 23 24 25 26 27 28 29 30 31 32  | Next Page >