Search Results

Search found 11264 results on 451 pages for 'utf 32'.


  • remove utf-8 figure spaces with php

    - by Jeroen Beerstra
    I have some XML files with figure spaces in them, and I need to remove those with PHP. The UTF-8 byte sequence for these is e2 80 a9. If I'm not mistaken, PHP does not seem to like multi-byte UTF-8 characters (three bytes here, written as six hex digits); so far, at least, I'm unable to find a way to delete the figure spaces with functions like preg_replace. Does anybody have any tips, or even better, a solution to this problem?
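
    A minimal sketch of one approach, taking the stated bytes at face value (note: e2 80 a9 is actually the UTF-8 encoding of U+2029 PARAGRAPH SEPARATOR; FIGURE SPACE proper is U+2007, e2 80 87): match the code point with preg_replace and the u modifier, or strip the raw byte sequence with str_replace. File names are placeholders.

        <?php
        // Read the XML as raw bytes.
        $xml = file_get_contents('input.xml');

        // Option 1: match by code point; the /u modifier makes PCRE treat
        // both pattern and subject as UTF-8. E2 80 A9 encodes U+2029.
        $clean = preg_replace('/\x{2029}/u', '', $xml);

        // Option 2: strip the raw UTF-8 byte sequence directly, no regex needed.
        $clean = str_replace("\xE2\x80\xA9", '', $xml);

        file_put_contents('output.xml', $clean);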

    Read the article

  • Change encoding to UTF-8 recursively on Windows?

    - by Pekka
    Does anybody know of a tool, preferably for the Explorer context menu, to recursively change the encoding of files in a project from/to UTF-8 and other encodings? Freeware or not too expensive would be great. Edit: Thanks for the answers, +1 for all of them as they are all fine, but I am a lazy bastard sometimes and would really like to be able to just right-click a folder and say "convert all .php files to UTF-8". :) Further suggestions are appreciated.
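
    Failing a ready-made context-menu tool, a minimal script sketch (Python, for brevity; the folder name and source encoding are assumptions to adjust, and detecting the current encoding is a separate problem):

        from pathlib import Path

        SOURCE_ENCODING = "windows-1252"  # assumed; set to the real source encoding

        # Recursively re-encode every .php file under the project folder.
        for path in Path("project").rglob("*.php"):
            text = path.read_text(encoding=SOURCE_ENCODING)
            path.write_text(text, encoding="utf-8")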

    Read the article

  • Java: Converting UTF-8 to String

    - by kujawk
    When I run the following program:

        public static void main(String args[]) throws Exception {
            byte str[] = {(byte)0xEC, (byte)0x96, (byte)0xB4};
            String s = new String(str, "UTF-8");
        }

    on Linux and inspect the value of s in jdb, I correctly get:

        s = "어"

    On Windows, I incorrectly get:

        s = "?"

    My byte sequence is a valid UTF-8 encoding of a Korean character, so why would it produce two very different results?
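
    One way to check the decode itself, independent of how jdb or the console renders the string (a hedged addition, not part of the original question):

        // Print the code point numerically so console encoding cannot distort it.
        System.out.printf("U+%04X%n", s.codePointAt(0)); // expect U+C5B4 for 어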

    Read the article

  • C++ UTF-8 lightweight & permissive code?

    - by xenthral
    Does anyone know of a more permissively licensed (MIT / public domain) version of this: http://library.gnome.org/devel/glibmm/unstable/classGlib_1_1ustring.html (a 'drop-in' replacement for std::string that's UTF-8 aware)? It's lightweight and does everything I need and even more (I doubt I'll even use the UTF-XX conversions). I really don't want to be carrying ICU around with me.

    Read the article

  • replacing characters with UTF-8 after using mysql_set_charset('utf8') function

    - by Ahmet vardar
    I converted all MySQL tables to utf8_unicode_ci and started using the mysql_set_charset('utf8') function. But after this, some characters like Ö and Ş started looking like Ö and Åž. How can I replace these kinds of letters in MySQL with their proper UTF-8 form? In short, can I find a list of all such characters to replace? EDIT: This article actually discusses the issue, but I can't properly understand it: http://www.oreillynet.com/onlamp/blog/2006/01/turning_mysql_data_in_latin1_t.html
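
    Ö and Åž are classic double-encoding symptoms: UTF-8 bytes were read as latin1 and encoded to UTF-8 a second time. Assuming that is the corruption path here (an assumption, so test against a copy of the data first), a minimal PHP repair sketch:

        <?php
        // Undo one encoding layer: reinterpret the code points as latin1
        // bytes, which restores the original UTF-8 sequence -- but only
        // if the data really is double-encoded.
        $fixed = mb_convert_encoding($mojibake, 'ISO-8859-1', 'UTF-8');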

    Read the article

  • Build 32-bit with 64-bit llvm-gcc

    - by Jay Conrod
    I have a 64-bit version of llvm-gcc, but I want to be able to build both 32-bit and 64-bit binaries. Is there a flag for this? I tried passing -m32 (which works on regular gcc), but I get an error message like this:

        [jay@andesite]$ llvm-gcc -m32 test.c -o test
        Warning: Generation of 64-bit code for a 32-bit processor requested.
        Warning: 64-bit processors all have at least SSE2.
        /tmp/cchzYo9t.s: Assembler messages:
        /tmp/cchzYo9t.s:8: Error: bad register name `%rbp'
        /tmp/cchzYo9t.s:9: Error: bad register name `%rsp'
        ...

    This is backwards; I want to generate 32-bit code for a 64-bit processor! I'm running llvm-gcc 4.2, the one that comes with Ubuntu 9.04 x86-64. EDIT: Here is the relevant part of the output when I run llvm-gcc with the -v flag:

        [jay@andesite]$ llvm-gcc -v -m32 test.c -o test.bc
        Using built-in specs.
        Target: x86_64-linux-gnu
        Configured with: ../llvm-gcc4.2-2.2.source/configure --host=x86_64-linux-gnu --build=x86_64-linux-gnu --prefix=/usr/lib/llvm/gcc-4.2 --enable-languages=c,c++ --program-prefix=llvm- --enable-llvm=/usr/lib/llvm --enable-threads --disable-nls --disable-shared --disable-multilib --disable-bootstrap
        Thread model: posix
        gcc version 4.2.1 (Based on Apple Inc. build 5546) (LLVM build)
        /usr/lib/llvm/gcc-4.2/libexec/gcc/x86_64-linux-gnu/4.2.1/cc1 -quiet -v -imultilib . test.c -quiet -dumpbase test.c -m32 -mtune=generic -auxbase test -version -o /tmp/ccw6TZY6.s

    I looked in /usr/lib/llvm/gcc-4.2/libexec/gcc hoping to find another binary, but the only directory there is x86_64-linux-gnu. I will probably look at compiling llvm-gcc from source with appropriate options next.

    Read the article

  • 32-bit depth JPG images problem in IE when referenced locally

    - by Stefan
    We have a web application that takes an uploaded image and resizes it. The resize library we used saved all pictures with 32-bit depth, whatever the depth was before. We have an online client that can view the pictures via an HTML file, and all is fine there; all pictures are shown correctly. The problem: we also have a VB WinForms application that downloads the pictures and shows them in an HTML file, locally, in a WebBrowser control. But here all pictures are rejected (not rendered): just the red cross. If we create a static HTML file with img tags in it locally, it's the same; all pictures that have 32-bit depth are shown as red crosses. If we re-save the pictures with 24-bit depth, it magically works again. So of course that became our "workaround": let the resize library save all pictures with 24-bit depth instead. Summary: 32-bit JPG files display correctly in IE when online, but not when referenced locally in a local HTML file. (This is true for IE8 on both Windows XP and Windows 7.) The same local HTML file opened in Mozilla displayed OK. Question: I have googled this a lot but have not found anything about this "problem". Is this a bug in IE8?

    Read the article

  • Compiling 32-bit Program on VS 2008

    - by gordonwd
    I've been developing with VC++ 2003 on an XP PC but am now on Windows 7 and bought a cheap legal copy of VS 2008 to continue work on the same project. My product has to continue to run on customers' XP systems, so I'm strictly interested in a 32-bit executable. The first issue I ran into was the PRJ0003 error "spawning cl.exe". I had to add the path to this file to the VC++ Directories settings (it appears in both a bin\amd64 and a bin\x86_amd64 directory, but I don't think it matters output-wise which I use?). The issue I now have (not counting a tedious cleanup to convert strcpy to strcpy_s, etc.) is that I'm not clear on whether I'm generating a 32-bit or a 64-bit exe out of this. My project properties are set to the "Win32" target, so I assume all is well. Is this correct? I have read some discussions about this, but it's never quite clear whether they mean the compiler itself running as x64 vs. x86, or the compiled code being x64 vs. x86, and how the two are differentiated. So am I doing the right thing to generate a 32-bit, Win32, x86 program?
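
    One quick sanity check (a sketch added here, not from the original question): make the build report its own pointer width. The cast avoids %zu, which VC++ 2008's runtime does not support.

        #include <stdio.h>

        int main(void) {
            /* Prints 4 for a 32-bit (Win32/x86) build, 8 for a 64-bit (x64) build. */
            printf("sizeof(void*) = %u\n", (unsigned)sizeof(void *));
            return 0;
        }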

    Read the article

  • Can I convert my database/script to UTF-8?

    - by Mohannad Otaibi
    How can I convert a database to support UTF-8, and convert its old data from whatever encoding it's in to UTF-8? Extra info: I'm running a server which has many websites on it, and one of them is running WHMCS (a PHP script to manage hosting clients). WHMCS has an iPhone application through which I can browse it, but this application will only run if everything in my website is in UTF-8 encoding. I was using windows-1256 as the encoding in my script's settings, and at some point I tried changing that to UTF-8 for a while, then changed it back to windows-1256. So some of the data in the database was inserted using UTF-8 and most of it using windows-1256. If someone could clear the picture for me: do I need to convert every database on the server or just one DB? What should I change? If I have to do it manually, I'll do it, but I need some expert advice.
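
    Because the rows are a mix of windows-1256 and UTF-8, a blanket conversion would corrupt one group or the other. A rough per-value sketch in PHP (an assumption about the data; short windows-1256 strings can occasionally pass the UTF-8 check, so run this against a backup and spot-check the results):

        <?php
        // Keep values that already validate as UTF-8 (or plain ASCII);
        // otherwise assume windows-1256 and convert.
        function to_utf8($value) {
            if (mb_check_encoding($value, 'UTF-8')) {
                return $value;
            }
            return iconv('windows-1256', 'UTF-8//TRANSLIT', $value);
        }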

    Read the article

  • Is the PHP serialize function compatible with UTF-8?

    - by Matthieu
    I have a site I want to migrate from ISO to UTF-8. I have a record in the database indexed by the following primary key:

        s:22:"Informations générales";

    The problem is that now (with UTF-8), when I serialize the string, I get:

        s:24:"Informations générales";

    (notice the size of the string is now the number of bytes, not the string length in characters). So this is not compatible with the non-UTF-8 records from before! Did I do something wrong? How can I fix this? Thanks
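
    serialize() stores byte counts, so converting already-serialized data to UTF-8 invalidates the s:N length headers. A common repair sketch (hedged: the naive pattern below can mis-handle strings containing '";', so verify each result with unserialize() afterwards):

        <?php
        // Recompute the s:N:"..." length headers after a charset conversion.
        $fixed = preg_replace_callback(
            '/s:\d+:"(.*?)";/s',
            function ($m) {
                return 's:' . strlen($m[1]) . ':"' . $m[1] . '";';
            },
            $converted
        );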

    Read the article

  • Problem with UTF-8

    - by Pablo Fernandez
    I'm using Castor as an OXM mapper, and I'm having a problem with UTF-8 encoding. The code here shows the issue:

        // Marshaller configuration
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        OutputStreamWriter os = new OutputStreamWriter(baos, UTF_8);
        Marshaller marshaller = new Marshaller(os);
        marshaller.setSuppressXSIType(true);

        // Mappings configuration
        Mapping map = new Mapping();
        map.loadMapping(MarshallingService.class.getResource(MAPPINGS_PATH));
        marshaller.setMapping(map);

        // Example
        // BEFORE MARSHALLING: this prints the UTF-8 chars correctly
        object.getName();
        marshaller.marshal(object);

        // AFTER MARSHALLING: this returns the characters like \435\235\654\345
        return baos.toString(UTF_8);
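
    One defect worth ruling out regardless of the symptom (hedged: it may not be the cause of the escapes shown): OutputStreamWriter buffers internally, so the byte array must not be read before the writer is flushed.

        os.flush(); // push any buffered characters through the UTF-8 encoder
        return baos.toString(UTF_8);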

    Read the article

  • Converting UTF-8 to ISO-8859-1 in Java - how to keep it as a single byte

    - by luckylak
    Hi, I am trying to convert a string encoded in Java in UTF-8 to ISO-8859-1. Say, for example, in the string 'âabcd', 'â' is represented in ISO-8859-1 as E2; in UTF-8 it is represented as two bytes, C3 A2 I believe. When I do a getBytes(encoding) and then create a new string from the bytes in ISO-8859-1 encoding, I get two different chars: 'â'. Is there any other way to do this so as to keep the character the same, i.e. 'âabcd'?
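
    For reference, a Java String has no encoding of its own; charsets apply only to bytes, and the decode must use the same charset as the encode. A minimal sketch (Java 7+ for StandardCharsets):

        import static java.nio.charset.StandardCharsets.*;

        String s = "âabcd";
        byte[] latin1 = s.getBytes(ISO_8859_1); // 'â' -> single byte 0xE2
        byte[] utf8   = s.getBytes(UTF_8);      // 'â' -> two bytes 0xC3 0xA2

        // Decoding UTF-8 bytes as ISO-8859-1 is what produces "â":
        String mojibake = new String(utf8, ISO_8859_1);
        // Decoding with the matching charset round-trips cleanly:
        String back = new String(latin1, ISO_8859_1); // "âabcd"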

    Read the article

  • trouble with utf-8 chars & apache2 rewrite rules

    - by tixrus
    I saw the post http://stackoverflow.com/questions/2565864/validating-utf-8-in-htaccess-rewrite-rule and I think that is great, but I am having a more fundamental problem first: I need to handle UTF-8 characters in query-string parameters, names of directories and files, displays to users, etc. I configured my Apache with AddDefaultCharset utf-8, and also my PHP, if that matters. My original rewrite rule filtered everything except regular A-Za-z, underscore, and hyphen, and it worked: anything else would give you a 404 (which is what I want!). Now, however, it seems that everything matches, including stuff I don't want. Yet although it seems to match, nothing goes into the query string unless it is a regular A-Za-z_- character string. I find this confusing, because the rule says to put whatever was matched into the query string. Here is the original rule:

        RewriteRule ^/puzzle/([A-Za-z_-]+)$ /puzzle.php?g=$1 [NC]

    and here is the revised rule:

        RewriteRule ^/puzzle/(\w+)$ /puzzle.php?g=$1 [NC]

    I made the change because somewhere I read that \w matches ALL the alpha chars, whereas A-Z etc. only matches the ones without accents. It doesn't seem to matter which of those rules I use. Here is what happens: in the application I have this:

        echo $_GET['g'];

    If I feed it a URL like http://mydomain.com/puzzle/USA it echoes "USA" and works fine. If I feed it a URL like http://mydomain.com/puzzle/México it echoes nothing for that, warns me that index g is not defined, and of course doesn't get the resources for Mexico. If I feed it a URL like http://mydomain.com/puzzle/fuzzle/buzzle/j.qle it does the same thing. This last case should be a 404! And it does this no matter which of the above rules I use. I configured a rewrite log:

        RewriteLogLevel 5
        RewriteLog /opt/local/apache2/logs/puzzles.httpd.rewrite

    but it is empty. Here is what appears in the regular access log (it gives a status of 200):

        [26/May/2010:11:21:42 -0700] "GET /puzzle/M%C3%A9xico HTTP/1.1" 200 342
        [26/May/2010:11:21:54 -0700] "GET /puzzle/M/l.foo HTTP/1.1" 200 342

    What can I do to get these $%#$@(*#@!!! characters (but not slash, dot, or other non-alpha characters) into my program, and once there, will they decode correctly? Would POSIX character classes work any better? Is there anything else I need to configure?

    Read the article

  • Lost in UTF-8 hell. (Django and Python)

    - by user140314
    I am working through the Django RSS reader project here. The RSS feed will read something like "OKLAHOMA CITY (AP) — James Harden let...". The RSS feed's encoding reads encoding="UTF-8", so I believe I am passing UTF-8 to markdown in the code snippet below. The em dash is where it chokes. I get the Django error "'ascii' codec can't encode character u'\u2014' in position 109: ordinal not in range(128)", which is a UnicodeEncodeError. In the variables being passed I see "OKLAHOMA CITY (AP) \u2014 James Harden". The line that is not working is:

        content = content.encode(parsed_feed.encoding, "xmlcharrefreplace")

    I am using markdown 2.0, Django 1.1, and Python 2.4. What is the magic sequence of encoding and decoding that I need to do to make this work? Thanks.
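
    The general rule this error points to (a sketch, not the project's code): decode bytes to unicode at the boundary and encode explicitly on the way out; a UnicodeEncodeError naming a codec you never asked for is the signature of an implicit ASCII step. In Python 2:

        # UTF-8 bytes from the feed; \xe2\x80\x94 is the em dash U+2014.
        raw = 'OKLAHOMA CITY (AP) \xe2\x80\x94 James Harden'

        text = raw.decode('utf-8')   # now a unicode object containing u'\u2014'
        out = text.encode('ascii', 'xmlcharrefreplace')  # em dash -> '&#8212;'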

    Read the article

  • PHP: Replace umlauts with closest 7-bit ASCII equivalent in an UTF-8 string

    - by BlaM
    What I want to do is remove all accents and umlauts from a string, turning "lärm" into "larm" or "andré" into "andre". What I tried was to utf8_decode the string and then use strtr on it, but since my source file is saved as a UTF-8 file, I can't enter the ISO-8859-15 characters for all the umlauts; the editor inserts the UTF-8 characters. Obviously one solution would be an include file that is saved as ISO-8859-15, but there must be a better way than another required include?

        echo strtr(utf8_decode($input),
                   'ŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿ',
                   'SOZsozYYuAAAAAAACEEEEIIIIDNOOOOOOUUUUYsaaaaaaaceeeeiiiionoooooouuuuyy');

    UPDATE: Maybe I was a bit inaccurate about what I am trying to do: I do not actually want to remove the umlauts, but to replace them with their closest "one-character ASCII" equivalent.
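
    For comparison, one widely used alternative is iconv with transliteration (hedged: the output varies by iconv implementation and locale, so test it against the full character list above):

        <?php
        // //TRANSLIT consults LC_CTYPE, so set a UTF-8 locale first.
        setlocale(LC_CTYPE, 'en_US.UTF-8');
        echo iconv('UTF-8', 'ASCII//TRANSLIT', 'lärm andré');
        // typically prints "larm andre"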

    Read the article

  • Efficient way to ASCII encode UTF-8

    - by Andreas Gohr
    I'm looking for a simple and efficient way to store UTF-8 strings in ASCII-7. By efficient I mean the following:

    - all ASCII chars in the input should stay ASCII chars in the output
    - the resulting string should be as short as possible
    - the operation needs to be reversible without any data loss
    - there should be no restriction on the input length
    - the whole UTF-8 range should be allowed

    My first idea was to use Punycode (IDNA), as it fits the first three requirements, but it fails at the last two. Can anyone recommend an alternative encoding scheme? Even better if there's some code available to look at.
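
    One simple scheme that satisfies every requirement except maximal shortness (a sketch in Python, offered as a baseline rather than a recommendation over Punycode): escape the escape character itself, then escape each non-ASCII code point.

        import re

        def to_ascii(s: str) -> str:
            # Escape the escape character first, then non-ASCII code points.
            s = s.replace('\\', '\\\\')
            return re.sub(r'[^\x00-\x7F]',
                          lambda m: '\\u{%x}' % ord(m.group()), s)

        def from_ascii(s: str) -> str:
            # One left-to-right pass, so literal backslashes and escape
            # sequences cannot be confused with each other.
            def unescape(m):
                if m.group(0) == '\\\\':
                    return '\\'
                return chr(int(m.group(1), 16))
            return re.sub(r'\\\\|\\u\{([0-9a-fA-F]+)\}', unescape, s)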

    Read the article

  • Reading from the HTML DOM returns UTF-8 characters

    - by teehoo
    I have a contenteditable div where I'm reading individual characters and sending them off to a server (for more background, this is similar to Google Wave, where typing a character automatically sends it). I was using a plain old HTML text field before, and everything worked fine until I "upgraded" to a contenteditable div. My problem is that now the characters are in UTF-8 format, which is causing some weird problems on the server that I would rather not debug. It would be much easier to force everything to be ASCII on the client side. Is there any way to do this? I tried putting in a meta tag stating the HTML file is charset=ISO-8859-1, but it doesn't seem to work; reading from the div tag still returns UTF-8 codes. (One example: when I press space I get the pair 0xC2 0xA0, which corresponds to a non-breaking space.)
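
    The 0xC2 0xA0 pair is the UTF-8 form of U+00A0; contenteditable inserts non-breaking spaces to preserve runs of whitespace. A client-side normalization sketch (TypeScript, assuming plain spaces are acceptable):

        // Map non-breaking spaces (U+00A0) back to ordinary spaces (U+0020)
        // before sending the text to the server.
        function normalizeSpaces(text: string): string {
            return text.replace(/\u00A0/g, ' ');
        }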

    Read the article

  • Forcing a mixed ISO-8859-1 and UTF-8 multi-line string into UTF-8 in Perl

    - by knorv
    Consider the following problem: a multi-line string $junk contains some lines which are encoded in UTF-8 and some in ISO-8859-1. I don't know a priori which lines are in which encoding, so heuristics will be needed. I want to turn $junk into pure UTF-8 with proper re-encoding of the ISO-8859-1 lines. Also, in the event of errors in the processing, I want to provide a "best effort" result rather than throwing an error. My current attempt looks like this:

        $junk = &force_utf8($junk);

        sub force_utf8 {
            my $input = shift;
            my $output = '';
            foreach my $line (split(/\n/, $input)) {
                if (utf8::valid($line)) {
                    utf8::decode($line);
                }
                $output .= "$line\n";
            }
            return $output;
        }

    While this appears to work, I'm certain this is not the optimal solution. How would you improve the force_utf8(...) sub?
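
    One commonly suggested refinement (a sketch using only the core Encode module): attempt a strict UTF-8 decode per line and fall back to ISO-8859-1 when it fails, which also gives the "best effort" behavior for malformed lines.

        use Encode qw(decode);

        sub force_utf8 {
            my $input = shift;
            my @out;
            for my $line (split /\n/, $input) {
                # A strict decode croaks on malformed UTF-8; LEAVE_SRC keeps
                # $line intact so the fallback can still read it.
                my $decoded = eval {
                    decode('UTF-8', $line, Encode::FB_CROAK | Encode::LEAVE_SRC)
                };
                $decoded = decode('ISO-8859-1', $line) unless defined $decoded;
                push @out, $decoded;
            }
            return join('', map { "$_\n" } @out);
        }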

    Read the article

  • Python: UTF-8 problems (again...)

    - by blahblah
    I have a database which is synchronized against an external web source twice a day. This web source contains a bunch of entries, which have names and some extra information about those names. Some of these names are silly and I want to rename them when inserting them into my own database. To rename these silly names, I have a standard dictionary such as:

        RENAME_TABLE = {
            "Wsird" : "Weird",
            ...
        }

    As you can see, this is where UTF-8 comes into play. This is the function which performs the renaming of all the problematic entries:

        def rename_all_entries():
            all_keys = RENAME_TABLE.keys()
            entries = Entry.objects.filter(name__in=all_keys)
            for entry in entries:
                entry.name = RENAME_TABLE[entry.name]
                entry.save()

    So it tries to find the old name in RENAME_TABLE and renames the entry if found. However, I get a KeyError exception on RENAME_TABLE[entry.name]. Now I'm lost; what do I do? I do have

        # -*- coding: utf-8 -*-

    at the top of the Python file.
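
    A plausible cause, though the question doesn't confirm it: in Python 2, plain "..." literals in a UTF-8 source file are byte strings, while the Django ORM returns unicode objects, and a unicode name never equals its UTF-8 byte-string twin, hence the KeyError. Declaring the keys as unicode literals sidesteps this; a sketch:

        # -*- coding: utf-8 -*-
        # u"..." keys are unicode, matching what the Django ORM returns;
        # plain "..." keys would be UTF-8 byte strings in Python 2.
        RENAME_TABLE = {
            u"Wsird": u"Weird",
        }

        name = entry.name            # unicode object from the database
        if name in RENAME_TABLE:     # compare unicode to unicode
            entry.name = RENAME_TABLE[name]
            entry.save()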

    Read the article

  • Global UTF-encoding, the right way

    - by mowgli
    I'm curious as to what the right way is to have UTF-8 encoding on all web files. All my files (incl. CSS and JS) are created and saved in UTF-8 encoding. In PHP, I set the charset at the top of the main page (this page includes all others) with:

        header('Content-type: text/html; charset=utf-8');

    In the same page I have this HTML meta tag:

        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

    Then I stumbled upon an external CSS file that has this on its first line:

        @charset "UTF-8";

    And now I wonder: should I set the charset inside all my CSS/JS files too, like that? And/or should I serve each file with charset=utf-8 in the meta tag?

    Read the article

  • Should UTF-16 be considered harmful?

    - by Artyom
    I'm going to ask what is probably quite a controversial question: "Should one of the most popular encodings, UTF-16, be considered harmful?" Why do I ask this question? How many programmers are aware of the fact that UTF-16 is actually a variable-length encoding? By this I mean that there are code points that, represented as surrogate pairs, take more than one element. I know; lots of applications, frameworks, and APIs use UTF-16, such as Java's String, C#'s String, the Win32 APIs, the Qt GUI libraries, the ICU Unicode library, etc. However, with all of that, there are lots of basic bugs in the processing of characters outside the BMP (characters that should be encoded using two UTF-16 elements). For example, try to edit one of these characters:

    - 𝄞 (U+1D11E) MUSICAL SYMBOL G CLEF
    - 𝕥 (U+1D565) MATHEMATICAL DOUBLE-STRUCK SMALL T
    - 𝟶 (U+1D7F6) MATHEMATICAL MONOSPACE DIGIT ZERO
    - 𠂊 (U+2008A) Han Character

    You may miss some, depending on what fonts you have installed. These characters are all outside of the BMP (Basic Multilingual Plane). If you cannot see them, you can also try looking at them in the Unicode Character reference. For example, try to create file names in Windows that include these characters; try to delete these characters with a backspace to see how they behave in different applications that use UTF-16. I did some tests and the results are quite bad:

    - Opera has problems with editing them (deleting requires 2 presses of backspace)
    - Notepad can't deal with them correctly (deleting requires 2 presses of backspace)
    - File-name editing in Windows dialogs is broken (deleting requires 2 presses of backspace)
    - All Qt3 applications can't deal with them; they show two empty squares instead of one symbol
    - Python encodes such characters incorrectly when used directly: u'X' != unicode('X', 'utf-16') on some platforms when X is a character outside the BMP
    - Python 2.5's unicodedata fails to get the properties of such characters when Python is compiled with UTF-16 Unicode strings
    - StackOverflow seems to remove these characters from the text if they are edited in directly as Unicode characters (these characters are shown using HTML Unicode escapes)
    - WinForms TextBox may generate an invalid string when limited with MaxLength

    It seems that such bugs are extremely easy to find in many applications that use UTF-16. So... do you think that UTF-16 should be considered harmful?
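
    The variable-length point is easy to demonstrate (a sketch in Java, since the question names Java's String; the same applies to any UTF-16 API):

        String clef = "\uD834\uDD1E"; // U+1D11E MUSICAL SYMBOL G CLEF as a surrogate pair

        System.out.println(clef.length());                          // 2 -- UTF-16 code units
        System.out.println(clef.codePointCount(0, clef.length()));  // 1 -- actual code points
        // Any code that equates length() with "number of characters"
        // miscounts this string, which is the root of the bugs listed above.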

    Read the article
