Search Results

Search found 4604 results on 185 pages for 'utf'.


w3c validation error with utf-8

- by ian

When I try to validate a certain page, I get the error below: Sorry, I am unable to validate this document because on line 136 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication. The error was: utf8 "\xFF" does not map to Unicode. What exactly does this mean, and how can I find out which character is causing the problem? The page is generated dynamically in PHP and is rather large, so I am not sure what to look for.
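
Here is a minimal Python sketch for locating the offending bytes. It assumes you first save the generated page to a file (for example with "Save Page As" or wget) so you can scan its raw bytes. It reports the 1-based line and byte column of every byte that breaks UTF-8 decoding, and you can compare these with the validator's "line 136". The helper name is hypothetical. Note that 0xFF can never occur in well-formed UTF-8. It usually means some text on that line was stored or included in a single-byte encoding such as ISO-8859-1, where 0xFF is "ÿ".

```python
def find_invalid_utf8(data: bytes):
    """Yield (line, column, byte) for every byte that breaks UTF-8 decoding.

    Line and column are 1-based; the column counts bytes, not characters.
    """
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            return  # everything from pos onward is valid UTF-8
        except UnicodeDecodeError as err:
            bad = pos + err.start  # err.start is relative to the slice
            line = data.count(b"\n", 0, bad) + 1
            column = bad - (data.rfind(b"\n", 0, bad) + 1) + 1
            yield line, column, data[bad]
            pos = bad + 1  # resume just after the offending byte
```

To use it, read the saved file with `open(path, "rb").read()`, pass the bytes in, and print each `(line, column, byte)` triple.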

Truncate a UTF-8 string to fit a given byte count in PHP

- by fsb

Say we have a UTF-8 string $s and we need to shorten it so it can be stored in N bytes. Blindly truncating it to N bytes could mess it up. But decoding it to find the character boundaries is a drag. Is there a tidy way? [Edit 20100414] In addition to S.Mark’s answer: mb_strcut(), I recently found another function to do the job: grapheme_extract($s, $n, GRAPHEME_EXTR_MAXBYTES); from the intl extension. Since intl is an ICU wrapper, I have a lot of confidence in it.
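
The idea behind mb_strcut() fits in a few lines. This Python version illustrates the technique and is not PHP's implementation. It relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx. If the first byte past the limit is a continuation byte, the function steps back to the start of that character. It cuts only on code-point boundaries. Unlike grapheme_extract(), it can still separate a base letter from a following combining mark. The function name is hypothetical.

```python
def truncate_utf8(s: str, max_bytes: int) -> bytes:
    """Return the longest prefix of s's UTF-8 encoding that fits in max_bytes
    without splitting a multi-byte sequence."""
    encoded = s.encode("utf-8")
    if len(encoded) <= max_bytes:
        return encoded
    cut = max_bytes
    # encoded[cut] is the first byte that does not fit. If it is a continuation
    # byte (0b10xxxxxx), the character straddles the limit, so back up to its lead byte.
    while cut > 0 and (encoded[cut] & 0xC0) == 0x80:
        cut -= 1
    return encoded[:cut]
```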

UTF-8 conversion

- by leachianus

Hey guys, I am grabbing a JSON array and storing it in an NSArray; however, it includes JSON-encoded UTF-8 strings, for example pass\u00e9, which represents passé. I need a way of converting all of these escaped strings into the actual characters. I have an entire NSArray to convert, or I can convert each string when it is displayed, whichever is easiest. I found this chart: http://tntluoma.com/sidebars/codes/. Is there a convenience method for this, or a library I can download? Thanks. BTW, there is no way for me to change the server, so I can only fix it on my end...
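
\u00e9 is a JSON Unicode escape for the code point U+00E9. Any conforming JSON parser turns it into the real character while parsing, so the simplest fix is usually to run the data through a real JSON parser rather than convert escapes by hand. Below is a Python sketch of the idea, assuming you have the escaped text of a single string value. The helper name is hypothetical.

```python
import json


def unescape_json_string(fragment: str) -> str:
    """Decode JSON escapes such as \\u00e9 in a string body (without its quotes).

    Assumes the fragment is valid JSON string content, i.e. it contains
    no bare double quotes or raw control characters.
    """
    return json.loads('"' + fragment + '"')
```

Characters outside the Basic Multilingual Plane appear in JSON as a surrogate pair (for example \ud83c\udf6b). The parser combines such a pair into one character.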

Convert ISO/Windows charsets to UTF-8 in Javascript

- by Amir

I'm developing a Firefox plugin, and I fetch web pages to do some analysis for the user. The problem is that when I fetch (with XMLHttpRequest) pages that are not UTF-8 encoded, the string I see is garbled, for example Hebrew pages in windows-1255 or Chinese pages in gb2312. I already tried the following:

var uDecoder=Components.classes["@mozilla.org/intl/scriptableunicodeconverter"].getService(Components.interfaces.nsIScriptableUnicodeConverter);
uDecoder.charset="windows-1255";
alert( xhr.responseText );
var decoder=Components.classes["@mozilla.org/intl/utf8converterservice;1"].getService(Components.interfaces.nsIUTF8ConverterService);
alert(decoder.convertStringToUTF8(xhr.responseText,"WINDOWS-1255",true));

I also tried escape/unescape/encodeURIComponent. Any ideas?
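
The underlying point is that text decodes correctly only when the raw bytes are read with the charset the page was actually sent in. Once the bytes have been decoded with the wrong charset, converting the resulting string usually cannot undo the damage. Below is a language-neutral sketch in Python, not Firefox-specific, with hypothetical function names. It reads the declared charset from the Content-Type header and decodes the bytes with it. It ignores a charset declared only in a `<meta>` tag.

```python
import re

CHARSET_RE = re.compile(r'charset="?([\w.:-]+)', re.IGNORECASE)


def charset_from_content_type(content_type: str, default: str = "utf-8") -> str:
    """Extract the charset parameter from a Content-Type header value."""
    match = CHARSET_RE.search(content_type or "")
    return match.group(1) if match else default


def decode_body(raw: bytes, content_type: str) -> str:
    """Decode raw response bytes with the charset the server declared.

    Unknown charset names raise LookupError; undecodable bytes become U+FFFD.
    """
    return raw.decode(charset_from_content_type(content_type), errors="replace")
```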

Proper Regex to find and replace escaped UTF-8 strings

- by Piet Binnenbocht

(edited) I am reading a JSON file that includes some UTF-8 characters encoded like this: "\uf36b". I am trying to write a RegExp that converts this to the HTML entity for "🍫", which displays the character correctly in my HTML page. I haven't been able to display the character associated with "\uf36b" correctly, especially within a longer sentence that also includes other text. How can I write a regexp that replaces strings like "\uf4d6" and "\uf36b" but leaves other text alone? Example: var str = "I need \uf36b #chocolate"; This should be converted to: I need 🍫 #chocolate
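
Here is a sketch of the replacement in Python. JavaScript's replace() with a global /g regex works the same way. The sketch makes two assumptions. First, the backslash sequences are still literally present in the text. In JavaScript source, "\uf36b" inside a string literal has already been turned into the single character U+F36B, so there is no backslash left to match. Second, each escape maps to the code point it names. Note that U+F36B is a Private Use code point, while 🍫 is U+1F36B. In JSON, U+1F36B would normally appear as the surrogate pair \ud83c\udf6b, so the source data may be losing a digit. The helper name is hypothetical.

```python
import re

# Matches a literal backslash-u followed by exactly four hex digits.
ESCAPE_RE = re.compile(r"\\u([0-9a-fA-F]{4})")


def escapes_to_entities(text: str) -> str:
    """Replace every literal \\uXXXX sequence with the HTML entity &#xXXXX;,
    leaving all other text untouched."""
    return ESCAPE_RE.sub(lambda m: "&#x" + m.group(1).lower() + ";", text)
```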

HTML numerical to UTF

- by Teneke

How can I convert this:

$langClarContent = &#1059;чи&#1090;&#1077;&#1083; Dokeos &#1077; &#1089;&#1080;&#1089;&#1090;&#1077;&#1084; &#1079;&#1072; &#1091;&#1087;&#1088;&#1072;&#1074;&#1091;&#1074;&#1072;&#1114;е &#1089;о &#1091;&#1095;&#1077;&#1114;&#1077; &#1080; &#1079;&#1085;&#1072;е&#1114;&#1077;. &#1058;&#1086;&#1112; &#1080;м &#1076;&#1086;&#1087;&#1091;&#1096;&#1090;&#1072; &#1085;&#1072; у&#1095

into the corresponding UTF-8 text, like this:

$langClarContent = Учител Dokeos е систем за управување со учење и знаење. Тој им допушта на уч
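
In PHP this is typically done with html_entity_decode(), passing 'UTF-8' as its encoding argument. Below is a Python sketch of the same idea. It decodes only numeric character references (decimal &#1059; or hex &#x423;) and leaves named entities such as &amp; alone. The helper name is hypothetical. Python's html.unescape() would decode both kinds.

```python
import re

NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def decode_numeric_refs(text: str) -> str:
    """Replace &#NNNN; and &#xHHHH; with the characters they name.

    A reference above U+10FFFF makes chr() raise ValueError.
    """
    def repl(m: re.Match) -> str:
        code = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        return chr(code)

    return NUMERIC_REF.sub(repl, text)
```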

How to read utf-8 xml from vbs and get correct character code

- by vkjr

I'm trying to read an XML file from a VBS script. The XML is encoded in UTF-8 and has the appropriate header. From the VBS script I use the Microsoft XMLDOM parser to read the XML:

Dim objXMLDoc
Set objXMLDoc = CreateObject( "Microsoft.XMLDOM" )
objXMLDoc.load("vbs_strings.xml")

Inside the XML I write each character by its code using &#nnn; notation. Then I read the character from VBScript and try to get its code using the Asc() function. For some characters this works fine, and the code I read back equals the one I wrote. But for some characters Asc() always returns code 63. What could it be? Examples: if the XML contains <section>Ã</section> and the script has a Section variable representing this XML node, then Asc(Section.Text) returns 195, which is fine. If the XML contains <section>n</section>, then Asc(Section.Text) returns 110, which is fine. But if the XML contains <section></section> or <section></section> or <section></section>, Asc(Section.Text) returns 63, which is definitely not good. Do you know why?
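
One likely explanation, offered as an assumption: VBScript's Asc() returns the character's code in the system's ANSI code page. Characters with no representation in that code page are converted to "?", whose code is 63. AscW() returns the Unicode code instead. The Python sketch below imitates that conversion, assuming a Windows-1252 system code page. The function name is hypothetical.

```python
def ansi_code(ch: str, codepage: str = "cp1252") -> int:
    """Imitate an ANSI conversion: characters missing from the code page
    are replaced by '?' (code 63) instead of raising an error."""
    return ch.encode(codepage, errors="replace")[0]
```

Under this model, "Ã" and "n" keep their codes (195 and 110), while a character such as U+0100 collapses to 63 even though its Unicode code is 256.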

Encoding MySQL text fields into UTF-8 text files - problems with special characters

- by Matt Andrews

I'm writing a PHP script to export MySQL database rows into a .txt file formatted for Adobe InDesign's internal markup. Exports work, but when I encounter special characters like é or umlauts, I get weird symbols (e.g. ChloÃ« Hanslip instead of Chloë Hanslip). Rather than run a search and replace for every possible weird character, I need a better method. I've checked that the text is saved properly when it reaches the database: in the database I see the special characters. My export code basically runs some regular expressions to insert the InDesign code tags, and I'm left with the weird symbols. If I just output the text to the browser (rather than prompting for a text file download), it displays properly. When I save the file I use this code: header("Content-disposition: attachment; filename=test.txt"); header("Content-Type: text/plain; charset=utf-8"); I've tried various combinations of utf8_encode() and iconv() to no avail. Can anybody point me in the right direction?
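
The pattern ë → Ã« is the signature of UTF-8 bytes being read as ISO-8859-1 (or Windows-1252). In UTF-8, ë is the two bytes C3 AB, and in Latin-1 those bytes are Ã and «. So the exported bytes are most likely correct UTF-8, and something downstream treats them as Latin-1. That could be the program that opens the .txt file. It could also be an extra utf8_encode(), which converts ISO-8859-1 to UTF-8 and therefore double-encodes text that is already UTF-8. Below is a short Python illustration with hypothetical helper names.

```python
def utf8_misread_as_latin1(text: str) -> str:
    """Show what UTF-8 text looks like when a reader assumes ISO-8859-1."""
    return text.encode("utf-8").decode("latin-1")


def repair_latin1_misread(garbled: str) -> str:
    """Undo one round of the misreading above (only valid if it happened exactly once)."""
    return garbled.encode("latin-1").decode("utf-8")
```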

MediaFileUpload of HTML in UTF-8 encoding using Python and Google-Drive-SDK

- by Victoria

The question "Looking for example using MediaFileUpload" references the basic documentation for creating/uploading a file to Google Drive. I have code that creates files, converting them from HTML to Google Docs format. It works perfectly when the files contain only ASCII characters, but when I add a non-ASCII character, it fails with the following traceback:

Traceback (most recent call last):
  File "d:\my\py\ckwort.py", line 949, in <module>
    rids, worker_documents = analyze( meta, gd )
  File "d:\my\py\ckwort.py", line 812, in analyze
    gd.mkdir( **iy )
  File "d:\my\py\ckwort.py", line 205, in mkdir
    self.create( **( kw['subop']))
  File "d:\my\py\ckwort.py", line 282, in create
    media_body=kw['media_body'],
  File "D:\my\py\gdrive2\oauth2client\util.py", line 120, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "D:\my\py\gdrive2\apiclient\http.py", line 676, in execute
    headers=self.headers)
  File "D:\my\py\gdrive2\oauth2client\util.py", line 120, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "D:\my\py\gdrive2\oauth2client\client.py", line 420, in new_request
    redirections, connection_type)
  File "D:\my\py\gdrive2\httplib2\__init__.py", line 1597, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "D:\my\py\gdrive2\httplib2\__init__.py", line 1345, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "D:\my\py\gdrive2\httplib2\__init__.py", line 1282, in _conn_request
    conn.request(method, request_uri, body, headers)
  File "C:\Python27\lib\httplib.py", line 958, in request
    self._send_request(method, url, body, headers)
  File "C:\Python27\lib\httplib.py", line 992, in _send_request
    self.endheaders(body)
  File "C:\Python27\lib\httplib.py", line 954, in endheaders
    self._send_output(message_body)
  File "C:\Python27\lib\httplib.py", line 812, in _send_output
    msg += message_body
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 370: ordinal not in range(128)

I can't find any parameter for specifying which file encoding MediaFileUpload should use (my files are UTF-8). Am I missing something?
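
One common cause of this exact error under Python 2, offered as a hypothesis: httplib builds the request by concatenating the header text and the body (msg += message_body). The URL or a header value may have been passed in as unicode, which makes msg a unicode object. If the body is a UTF-8 byte string, Python 2 then implicitly decodes the body as ASCII. That fails on the first non-ASCII byte (0xc3 is the lead byte of characters such as é). The Python 3 sketch below makes that implicit step explicit. The function names are hypothetical.

```python
def concat_like_python2(headers: str, body: bytes) -> str:
    """Mimic Python 2's `unicode + str`: the byte string is decoded as ASCII."""
    return headers + body.decode("ascii")


def concat_as_bytes(headers: str, body: bytes) -> bytes:
    """Keep the message as bytes: encode the (ASCII) header text explicitly."""
    return headers.encode("ascii") + body
```

In Python 2 code, the corresponding fix is usually to make sure the URL and header values passed to the HTTP library are byte strings (str), not unicode.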

How to query MySQL for exact length and exact UTF-8 characters

- by oskarae

I have a table with a dictionary of words in my language (Latvian):

CREATE TABLE words ( value varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

And let's say it has 3 words inside:

INSERT INTO words (value) VALUES ('teja');
INSERT INTO words (value) VALUES ('vejš');
INSERT INTO words (value) VALUES ('feja');

I want to find all words that are exactly 4 characters long, where the second character is 'e' and the third character is 'j'. To me it seems the correct query would be:

SELECT * FROM words WHERE value LIKE '_ej_';

But the problem with this query is that it returns not 2 entries ('teja','vejš') but all three. As I understand it, this is because MySQL internally converts strings to some ASCII representation? Then there is the BINARY option for LIKE:

SELECT * FROM words WHERE value LIKE BINARY '_ej_';

But this also does not return the 2 entries ('teja','vejš'), only one ('teja'). I believe this has something to do with UTF-8 using 2 bytes for non-ASCII characters? So my question is: what MySQL query would return exactly my two words ('teja','vejš')? Thank you in advance
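
The three results make sense if '_' matches one unit, where the unit depends on how strings are compared:

- With LIKE BINARY, '_' matches one byte.
- Under a case- and accent-insensitive collation such as utf8_unicode_ci, it matches one character, and e and ē compare equal.
- Under a binary collation, it matches one character, compared exactly.

The Python sketch below models these three modes. It assumes the third word actually carries a diacritic, shown here as the hypothetical 'fēja'. Only then would the word fail the BINARY match yet match the plain LIKE. Under that assumption, a query such as SELECT * FROM words WHERE value LIKE '_ej_' COLLATE utf8_bin; should return exactly the two words. Verify this against your MySQL version.

```python
import re
import unicodedata

PATTERN = ".ej."  # SQL LIKE '_ej_': each '_' becomes '.', other characters are literal


def fold(s: str) -> str:
    """Rough model of an accent- and case-insensitive collation."""
    decomposed = unicodedata.normalize("NFD", s.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def like_binary(words):
    # '_' matches a single byte of the UTF-8 encoding
    return [w for w in words if re.fullmatch(PATTERN.encode(), w.encode("utf-8"), re.DOTALL)]


def like_accent_insensitive(words):
    # '_' matches one character, and e.g. \u0113 / \u0161 compare equal to e / s
    return [w for w in words if re.fullmatch(PATTERN, fold(w), re.DOTALL)]


def like_exact_characters(words):
    # '_' matches one character, and characters must be identical
    return [w for w in words if re.fullmatch(PATTERN, w, re.DOTALL)]
```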

How to set all locale settings in Ubuntu

- by Christian Schneider

An application installed on a remote machine has some encoding problems, while on my local machine it runs fine. What is the best way to "copy" my locales to the remote machine? The locales on my personal machine are configured like this:

$ locale
LANG=de_DE.UTF-8
LANGUAGE=de_DE:en
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=
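
A sketch, assuming the remote machine runs Ubuntu with the standard locale-gen and update-locale tools. glibc's locale command prints a value in double quotes when it is only implied (derived from LANG) rather than set explicitly. So only the unquoted, non-empty entries need to be copied. The Python helpers below (hypothetical names) turn your local `locale` output into arguments for those two commands.

```python
def explicit_locale_settings(locale_output: str) -> dict[str, str]:
    """Parse `locale` output, keeping only variables that are explicitly set.

    Quoted values are implied by LANG, and empty values are unset,
    so both are skipped.
    """
    settings = {}
    for line in locale_output.splitlines():
        key, sep, value = line.partition("=")
        if not sep or not value or value.startswith('"'):
            continue
        settings[key.strip()] = value.strip()
    return settings


def update_locale_command(settings: dict[str, str]) -> str:
    """Build an update-locale invocation that writes the settings system-wide."""
    args = " ".join(f"{key}={value}" for key, value in settings.items())
    return f"sudo update-locale {args}"


def locales_to_generate(settings: dict[str, str]) -> list[str]:
    """Locales that must exist (e.g. via `sudo locale-gen ...`) on the remote machine.

    LANGUAGE is a priority list, not a locale name, so it is excluded.
    """
    return sorted({value for key, value in settings.items() if key != "LANGUAGE"})
```

On the remote machine you would first generate the listed locales with locale-gen and then run the printed update-locale command. After that, log in again or restart the application so the new environment takes effect.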

Core data and special characters (UTF-8)

- by MW

I have an iPhone application using Core Data with an SQLite database underneath. I'm writing some text content from the database to a file, but special characters such as Å, Ä and Ö are corrupted in the file (they show up just fine in the application). When creating and inserting data, I am not using any special encoding; I'm just taking the NSString (entered by the user in a UITextField) and putting it in my persistent objects. When saving the file, I use the following code: [csvString writeToFile:filePath atomically:YES encoding:NSUTF8StringEncoding error:&error]; I tried adding a BOM to the beginning of the text ("\xef\xbb\xbf"), but it is still corrupted. Does anyone have any idea where the problem might be? Examples of corrupted characters: å becomes Ã¶, ä becomes Ã¤
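
The corruption pattern (for example ä → Ã¤) is what correct UTF-8 bytes look like when a program decodes them as ISO-8859-1 or Windows-1252. So the file written with NSUTF8StringEncoding is probably fine, and the application opening it is likely guessing the wrong encoding. Below is a small Python check (hypothetical helper) to run on a copy of the file.

```python
import codecs


def describe_file_bytes(data: bytes) -> dict:
    """Report whether data starts with a UTF-8 BOM, whether it decodes as
    UTF-8, and how a viewer assuming Latin-1 would display it."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        text = data.decode("utf-8-sig")  # also strips a leading BOM
    except UnicodeDecodeError:
        text = None
    return {"has_bom": has_bom, "utf8_text": text, "as_latin1": data.decode("latin-1")}
```

If utf8_text shows the right letters and as_latin1 matches what you see, the file itself is correct UTF-8.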

mysql utf encoding

- by user121196

java.sql.SQLException: Incorrect string value: '\xAC\xED\x00\x05sr...' for column 'xxxx'. The column is a LONGTEXT in MySQL with the utf8 charset and utf8_general_ci collation. What's wrong?
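
The first bytes in the message, AC ED 00 05, are the magic number and version of Java's object serialization stream format. The "s" and "r" that follow are bytes 0x73 and 0x72, the format's TC_OBJECT and TC_CLASSDESC markers. So it looks as if a serialized Java object, not text, is being bound to the column. That can happen, for example, when a non-String value goes through setObject(). Such bytes are not valid UTF-8, and binary data belongs in a BLOB column. Below is a quick Python check on the bytes from the message, with hypothetical helper names.

```python
JAVA_STREAM_MAGIC = b"\xac\xed"    # STREAM_MAGIC
JAVA_STREAM_VERSION = b"\x00\x05"  # STREAM_VERSION


def looks_like_java_serialized(data: bytes) -> bool:
    """True if data starts with the Java serialization stream header."""
    return data.startswith(JAVA_STREAM_MAGIC + JAVA_STREAM_VERSION)


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```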

Handling over-long UTF-8 sequences

- by Grant McLean

I've just been reworking my Encoding::FixLatin Perl module to handle over-long utf8 byte sequences and convert them to the shortest normal form. My question is quite simply: "is this a bad idea?" A number of sources (including this RFC) suggest that any over-long utf8 should be treated as an error and rejected. They caution against "naive implementations" and leave me with the impression that these things are inherently unsafe. Since the whole purpose of my module is to clean up messy data files with mixed encodings and convert them to nice clean utf8, this seems like just one more thing I can clean up so the application layer doesn't have to deal with it. My code does not concern itself with any semantic meaning the resulting characters might have; it simply converts them into a normalised form. Am I missing something? Is there a hidden danger I haven't considered?
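
The usual danger lies in the order of processing rather than in the normalised output itself. If any component checks the raw bytes before your module normalises them, an over-long form slips past that check and turns into the real character afterwards. The classic example is '/' (0x2F) encoded as the two bytes C0 AF. A filter scanning for the byte 0x2F sees nothing, but after normalisation there is a '/'. Below is a Python sketch of the 2-byte case (hypothetical helper).

```python
def decode_two_byte(seq: bytes) -> tuple[int, bool]:
    """Decode a 2-byte sequence 110xxxxx 10yyyyyy.

    Returns (code point, is_overlong). Two-byte forms are only valid for
    U+0080..U+07FF; anything smaller should have been a single byte.
    """
    if len(seq) != 2 or (seq[0] & 0xE0) != 0xC0 or (seq[1] & 0xC0) != 0x80:
        raise ValueError("not a 2-byte UTF-8 sequence")
    code_point = ((seq[0] & 0x1F) << 6) | (seq[1] & 0x3F)
    return code_point, code_point < 0x80
```

Python's own strict decoder rejects b"\xc0\xaf" outright, which is the behaviour the RFC asks for.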

C#, UTF-8 and encoding characters

- by AspNyc

This is a shot in the dark, and I apologize in advance if this question sounds like the ramblings of a madman. As part of an integration with a third party, I need to UTF-8-encode some string data using C# so I can send it to the target server via a multipart form. The problem is that they are rejecting some of my submissions, probably because I'm not encoding their contents correctly. Right now, I'm trying to figure out why a dash or hyphen (I can't tell which it is just by looking at it) is received or interpreted by the target server as ?~@~S (yes, that's a 5-character string, and it's not your browser glitching out). Unfortunately, I don't understand Encoding.UTF8.GetBytes() thoroughly enough to know how to use the byte array to start identifying where the problem might lie. If anybody can provide any tips or advice, I would greatly appreciate it. So far my only friend has been MSDN, and not much of one at that.
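
The five characters decode neatly if the receiving side displays raw bytes instead of decoding UTF-8. An en dash (U+2013) is the three bytes E2 80 93 in UTF-8, and Encoding.UTF8.GetBytes("–") in C# yields the same three bytes. Now suppose a display shows bytes 0x80-0x9F as '~' plus the character 0x40 lower ('~@' for 0x80, '~S' for 0x93) and other non-ASCII bytes as '?'. It would print exactly ?~@~S. That display rule is an assumption inferred from the observed output. If it holds, your bytes are correct UTF-8 and the target is simply not decoding them as UTF-8. Below is a Python sketch (hypothetical function name).

```python
def render_like_target(data: bytes) -> str:
    """Hypothetical display rule inferred from the observed output:
    ASCII prints as itself, bytes 0x80-0x9F print as '~' + chr(byte - 0x40),
    and any other non-ASCII byte prints as '?'."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))
        elif b <= 0x9F:
            out.append("~" + chr(b - 0x40))
        else:
            out.append("?")
    return "".join(out)
```

A plain hyphen (0x2D) would pass through unchanged, so the character in your data must be a non-ASCII dash. U+2013 fits the bytes exactly, whereas an em dash (E2 80 94) would show as ?~@~T.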

MySQL ODBC 3.51 Driver UTF-8 encoding

- by kesava

I am currently migrating from MSSQL to MySQL. I am using the MySQL ODBC 3.51 driver to connect to MySQL over ODBC. I have Telugu-language characters stored in the table. They are not displayed properly when I use the MySQL ODBC driver, but they display properly when I use the SQL Server ODBC driver. Here is my connection string: Driver={MySQL ODBC 3.51 Driver};Server=localhost;Database=dbtest; User=user1;Password=mysql;Option=3;CharSet=utf8; Please suggest a solution to fix this.

converting html entity to utf-8 character

- by Anthony Umpad

Hello, I am having this problem in Grails: I am writing a string from the database into an XML file using StreamingMarkupBuilder. The XML file displays the string as HTML entities (&#x30b3 &#x30d4 &#x30fc). How can I get them printed as コピー? Thanks!

UTF-8 - Oracle issue

- by goe

I set my NLS_LANG variable to 'AMERICAN_AMERICA.AL32UTF8' in the Perl file that connects to Oracle and tries to insert the data. However, when I insert a record in which one value contains the character 'ñ', the SQL fails. But if I use 'Ñ', it inserts just fine. What am I doing wrong here?

Why Solr admin query page interprets UTF-8 as ISO-8859-1

- by Scott Chu

I deployed a WAR to my Tomcat 6.0.35 on 64-bit Windows 7. When I use the full-interface query page (I mean form.jsp) in the Solr Admin to query 2 Chinese characters (say C1C2), the debug info shows: <lst name="debug"> <str name="rawquerystring">æ°è</str> <str name="querystring">æ°è</str> <str name="parsedquery">NEWSID:æ°è</str> <str name="parsedquery_toString">NEWSID:æ°è</str> ... You can see that C1C2 becomes æ°è. When I deploy the same WAR file to Tomcat on Linux, or on a colleague's 64-bit Windows 7 computer, the encoding works correctly. Does anyone know why, and how I can avoid this problem? Thanks in advance!
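
A plausible explanation, assuming the form submits the query as a GET parameter: the characters then travel percent-encoded as UTF-8 bytes. Tomcat decodes request URIs with its Connector's URIEncoding setting, which defaults to ISO-8859-1 in Tomcat 6 unless URIEncoding="UTF-8" is set on the Connector in server.xml. A difference in that setting would also explain why other machines behave correctly. The Python sketch below shows the effect. The two characters are a hypothetical stand-in for C1C2, and the function name is hypothetical too.

```python
from urllib.parse import quote, unquote


def server_view(query: str, server_charset: str) -> str:
    """Percent-encode query as UTF-8 (as the browser does), then decode it
    the way a server configured with server_charset would."""
    return unquote(quote(query, encoding="utf-8"), encoding=server_charset)
```

For these two characters, the Latin-1 reading contains three invisible C1 control characters. Without them it reads æ°è, the same shape as the debug output.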

Are UTF16 (as used by for example wide-winapi functions) characters always 2 byte long?

- by Cray

Please clarify for me: how does UTF-16 work? I am a little confused, considering these points:

- There is a static type in C++, WCHAR, which is 2 bytes long (always 2 bytes long, obviously).
- Most of MSDN and some other documentation seem to assume that characters are always 2 bytes long. This may just be my imagination; I can't come up with any particular examples, but it just seems that way.
- There are no "extra wide" functions or character types widely used in C++ or Windows, so I would assume that UTF-16 is all that is ever needed.
- To my uncertain knowledge, Unicode has a lot more than 65535 characters, so they obviously don't all fit in 2 bytes.
- UTF-16 seems to be a bigger version of UTF-8, and UTF-8 characters can be of different lengths.

So if a UTF-16 character is not always 2 bytes long, how long else could it be? 3 bytes? Or only multiples of 2? And then, for example, if there is a WinAPI function that wants to know the size of a wide string in characters, and the string contains 2 characters which are each 4 bytes long, how is the size of that string in characters calculated? Is it 2 chars long or 4 chars long? (It is 8 bytes long, and each WCHAR is 2 bytes.)
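
UTF-16 uses one 16-bit code unit (2 bytes) for code points up to U+FFFF. Code points from U+10000 to U+10FFFF take a surrogate pair of two code units (4 bytes). Sizes are therefore always multiples of 2 bytes, never 3. APIs that count WCHARs generally count code units, so a string of two such characters has length 4. Below is a Python sketch of the arithmetic (hypothetical helper names).

```python
def utf16_code_units(s: str) -> int:
    """Number of 16-bit code units (WCHARs) needed to store s in UTF-16."""
    return len(s.encode("utf-16-le")) // 2


def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points need a surrogate pair")
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low
```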

GDM locale problems

- by Simón

I have two problems with GDM on Ubuntu 10.04. The first is with locales. On my system I have defined:

$ cat /etc/environment
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games"
LANG="es_ES.UTF-8"
LANGUAGE="es_ES:es:en_US:en"
$ cat /etc/default/locale
LANG="es_ES.UTF-8"
LANGUAGE="es_ES:es:en_US:en"
$ cat /var/lib/locales/supported.d/local
es_ES UTF-8
es_ES.UTF-8 UTF-8
en_US UTF-8
en_US.UTF-8 UTF-8

But when I log in to the GNOME desktop:

$ locale
LANG=es_ES
LANGUAGE=es_ES:es:en_US:en
LC_CTYPE="es_ES"
LC_NUMERIC="es_ES"
LC_TIME="es_ES"
LC_COLLATE="es_ES"
LC_MONETARY="es_ES"
LC_MESSAGES="es_ES"
LC_PAPER="es_ES"
LC_NAME="es_ES"
LC_ADDRESS="es_ES"
LC_TELEPHONE="es_ES"
LC_MEASUREMENT="es_ES"
LC_IDENTIFICATION="es_ES"
LC_ALL=

I have deleted ~/.dmrc and restarted the system, but nothing changed. The GDM login screen also doesn't let me change this setting. However, in the text terminals (tty1, ...):

$ locale
LANG=es_ES.UTF-8
LANGUAGE=es_ES:es:en_US:en
LC_CTYPE="es_ES.UTF-8"
LC_NUMERIC="es_ES.UTF-8"
LC_TIME="es_ES.UTF-8"
LC_COLLATE="es_ES.UTF-8"
LC_MONETARY="es_ES.UTF-8"
LC_MESSAGES="es_ES.UTF-8"
LC_PAPER="es_ES.UTF-8"
LC_NAME="es_ES.UTF-8"
LC_ADDRESS="es_ES.UTF-8"
LC_TELEPHONE="es_ES.UTF-8"
LC_MEASUREMENT="es_ES.UTF-8"
LC_IDENTIFICATION="es_ES.UTF-8"
LC_ALL=

The solution to the problem is to edit the ~/.dmrc file, but I don't think this is the right way. Why doesn't GDM read/apply the system locales? Why don't I see the box for changing the locale on the GDM login screen?

How to remove invalid UTF-8 characters from a JavaScript string?

- by msielski

I'd like to remove all invalid UTF-8 characters from a string in JavaScript. I've tried using the approach described here (link removed) and came up with this JavaScript:

strTest = strTest.replace(/([\x00-\x7F]|[\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF7][\x80-\xBF]{3})|./, "$1");

It seems that the UTF-8 validation regex described here (link removed) is more complete, and I adapted it in the same way:

strTest = strTest.replace(/([\x09\x0A\x0D\x20-\x7E]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF][\x80-\xBF]{2}|[\xF1-\xF3][\x80-\xBF]{3}|\xF4[\x80-\x8F][\x80-\xBF]{2})|./, "$1");

Both of these pieces of code seem to let valid UTF-8 through, but they filter out hardly any of the bad UTF-8 characters from my test data (UTF-8 decoder capability and stress test). Either the bad characters come through unchanged, or some of their bytes seem to be removed, creating a new, invalid character. I'm not very familiar with the UTF-8 standard or with multibyte handling in JavaScript, so I'm not sure whether I'm failing to represent proper UTF-8 in the regex or applying the regex improperly in JavaScript. Any help appreciated. Thanks!
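
Two things seem to be going on. First, a JavaScript string is a sequence of UTF-16 code units, not UTF-8 bytes. A byte-level pattern like this only makes sense if it runs on the raw bytes before they are decoded into a string. Second, both replace() calls use a regex without the g flag, so only the first match is replaced. Below is a Python sketch that applies the same validation pattern to bytes, where re.sub replaces every match by default. The helper name is hypothetical.

```python
import re

# Same alternation as the stricter pattern in the question, applied to bytes.
# Group 1 captures a well-formed sequence; the bare "." branch swallows any other byte.
VALID_OR_ANY_BYTE = re.compile(
    rb"([\x09\x0A\x0D\x20-\x7E]"
    rb"|[\xC2-\xDF][\x80-\xBF]"
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"
    rb"|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}"
    rb"|\xED[\x80-\x9F][\x80-\xBF]"
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2})"
    rb"|.",
    re.DOTALL,
)


def strip_invalid_utf8(data: bytes) -> bytes:
    """Keep well-formed UTF-8 sequences and drop every other byte.

    Like the question's pattern, this also drops ASCII control characters
    other than tab, LF and CR.
    """
    return VALID_OR_ANY_BYTE.sub(lambda m: m.group(1) or b"", data)
```

If you only need valid text and do not care about the control-character filtering, data.decode("utf-8", errors="ignore") also drops invalid bytes.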

UTF-8 encoding problem with flash mysql and php

- by alibhp

Hi, I am programming an online game using Flash. I am connecting my Flash 8 movie to a MySQL database through PHP. That part is going well, and I have everything working fine. The problems come when I try to insert non-English data (in other words, UTF-8 data) into the database using an SQL INSERT. I read a lot of articles about this and applied the following:

1. In PHP4, you need to tell PHP to use UTF-8 when calling xml_parser_create(); in PHP5 that is done automatically. Even so, I told PHP5 to use UTF-8 when calling the function.
2. I added the encoding header to the XML sent from Flash to PHP.
3. I forced Flash to use UTF-8 encoding in the preference options.
4. I set the encoding in MySQL to UTF-8 (utf8_unicode_ci with the InnoDB engine). I can read and insert the other-language data correctly in phpMyAdmin as well.

I did all that in my code, and I still can't insert such data. One more strange thing: when I open the same URL that Flash uses, with the XML that Flash creates, in the browser (Google Chrome), the data gets inserted correctly into the database! I am about to go crazy over this. What am I missing? What causes the problem? Thank you in advance.

How to generate real UTF-8 XML with grails without the escape characters?

- by AngeDeLaMort

I have been wondering why, when I set the encoding to UTF-8 and render the XML, the extended characters are replaced by escape sequences (character references), like ’ instead of '. I'm using the render method render(contentType:"text/xml", encoding:"UTF-8") {...} with a proper header: render(contentType:"text/xml", encoding:"UTF-8", text:"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n") Is there a way to write the characters properly? Thanks.
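
A common reason for character references in XML output is that the serializer writes with an ASCII-only target encoding, or has been told to escape non-ASCII characters, so every character outside ASCII becomes &#...;. The markup is still equivalent XML. If you want the raw characters, though, the serializer itself must be configured for UTF-8. Python's ElementTree shows the difference. This illustrates the mechanism only and is not Grails' API. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def serialize(text: str, encoding: str) -> bytes:
    """Serialize <title>text</title> with the given target encoding."""
    elem = ET.Element("title")
    elem.text = text
    return ET.tostring(elem, encoding=encoding)
```

With "us-ascii" (ElementTree's default for tostring), コピー comes out as decimal character references. With "utf-8", it comes out as the raw UTF-8 bytes.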

Why isn't UTF-8 allowed as the "ANSI" code page?

- by dan04

The Windows _setmbcp function allows any valid code page... (except UTF-7 and UTF-8, which are not supported). OK, not supporting UTF-7 makes sense: characters have non-unique representations, which introduces complexity and security risks. But why not UTF-8? As I understand it, the "ANSI" versions of the Windows API functions convert their arguments to UTF-16, call the equivalent "W" function, and convert any strings in the output back to "ANSI". This is what I've been doing manually. So why can't Windows do it for me?
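
Here is an illustration of the UTF-7 point. In UTF-7, the same text can be spelled in more than one way. Characters such as < can also hide inside base64 "shift" sequences, which is what made UTF-7 useful for slipping markup past filters. Python's utf-7 codec demonstrates both. The helper name is hypothetical.

```python
def utf7_spellings_agree(a: bytes, b: bytes) -> bool:
    """True if two different UTF-7 byte strings decode to the same text."""
    return a != b and a.decode("utf-7") == b.decode("utf-7")
```

For example, b"A" and b"+AEE-" both decode to "A", and b"+ADw-script+AD4-" decodes to "<script>" without containing a single < byte.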

