Search Results

Search found 4604 results on 185 pages for 'utf'.

Page 3/185 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

Convert ISO-8859-1 to UTF-8

- by tau

I have several documents I need to convert from ISO-8859-1 to UTF-8 (without the BOM of course). This is the issue though. I have so many of these documents (it is actually a mix of documents, some UTF-8 and some ISO-8859-1) that I need an automated way of converting them. Unfortunately I only have ActivePerl installed and don't know much about encoding in that language. I may be able to install PHP, but I am not sure as this is not my personal computer. Just so you know, I use Scite or Notepad++, but both do not convert correctly. For example, if I open a document in Czech that contains the character "ž" and go to the "Convert to UTF-8" option in Notepad++, it incorrectly converts it to an unreadable character. There is a way I CAN convert them, but it is tedious. If I open the document with the special characters and copy the document to Windows clipboard, then paste it into a UTF-8 document and save it, it is okay. This is too tedious (opening every file and copying/pasting into a new document) for the amount of documents I have. Any ideas? Thanks!!!

Read the article
PHP Streaming CSV always adds UTF-8 BOM

- by Mustafa Ashurex

The following code gets a 'report line' as an array and uses fputcsv to tranform it into CSV. Everything is working great except for the fact that regardless of the charset I use, it is putting a UTF-8 bom at the beginning of the file. This is exceptionally annoying because A) I am specifying iso and B) We have lots of users using tools that show the UTF-8 bom as characters of garbage. I have even tried writing the results to a string, stripping the UTF-8 BOM and then echo'ing it out and still get it. Is it possible that the issue resides with Apache? If I change the fopen to a local file it writes it just fine without the UTF-8 BOM. header("Content-type: text/csv; charset=iso-8859-1"); header("Cache-Control: no-store, no-cache"); header("Content-Disposition: attachment; filename=\"report.csv\""); $outstream = fopen("php://output",'w'); for($i = 0; $i < $report-rowCount; $i++) { fputcsv($outstream, $report-getTaxMatrixLineValues($i), ',', '"'); } fclose($outstream); exit;

Read the article
Reading UTF-8 XML and writing it to a file with Python

- by Harri

I'm trying to parse UTF-8 XML file and save some parts of it to another file. Problem is, that this is my first Python script ever and I'm totally confused about the character encoding problems I'm finding. My script fails immediately when it tries to write non-ascii character to a file, but it can print it to command prompt (at least in some level) Here's the XML (from the parts that matter at least, it's a *.resx file which contains UI strings) <?xml version="1.0" encoding="utf-8"?> <root> <resheader name="foo"> <value>bar</value> </resheader> <data name="lorem" xml:space="preserve"> <value>ipsum öä</value> </data> </root> And here's my python script from xml.dom.minidom import parse names = [] values = [] def getStrings(path): dom = parse(path) data = dom.getElementsByTagName("data") for i in range(len(data)): name = data[i].getAttribute("name") names.append(name) value = data[i].getElementsByTagName("value") values.append(value[0].firstChild.nodeValue.encode("utf-8")) def writeToFile(): with open("uiStrings-fi.py", "w") as f: for i in range(len(names)): line = names[i] + '="'+ values[i] + '"' #varName='varValue' f.write(line) f.write("\n") getStrings("ResourceFile.fi-FI.resx") writeToFile() And here's the traceback: Traceback (most recent call last): File "GenerateLanguageFiles.py", line 24, in writeToFile() File "GenerateLanguageFiles.py", line 19, in writeToFile line = names[i] + '="'+ values[i] + '"' #varName='varValue' UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in ran ge(128) How should I fix my script so it would read and write UTF-8 characters properly? The files I'm trying to generate would be used in test automation with Robots Framework.

Read the article
Problems with utf-8 encoding in php

- by Addsy

Hi, Another utf-8 related problem I believe... I am using php to update data in a mysql db then display that data elsewhere in the site. Previously I have run into utf-8 problems before where special characters are displayed as question marks when viewed in a browser but this one seems slightly different. I have a number of records to enter that contain the è character. If I enter this directly in the db then it appears correctly on the page so I take this to mean that utf-8 content is being output correctly. However when I try and update the values in the db through php, then the è character is replaced. What appears instead is & Atilde ; & uml ; (without the spaces) which appears in the browser as Ã¨ I have the tables in the database set to use UTF-8. I believe this is correct cos, as mentioned, if I update the db through phpMyAdmin, its all ok. Similarly I have set the character encoding for the page which seems to be correct. I am also running the sql statement "SET NAMES 'utf8';" before trying to update the db. Anyone have any other ideas as to where the problem may lie? Many thanks

Read the article
UTF-8 HTML and CSS files with BOM (and how to remove the BOM with Python)

- by Cameron

First, some background: I'm developing a web application using Python. All of my (text) files are currently stored in UTF-8 with the BOM. This includes all my HTML templates and CSS files. These resources are stored as binary data (BOM and all) in my DB. When I retrieve the templates from the DB, I decode them using template.decode('utf-8'). When the HTML arrives in the browser, the BOM is present at the beginning of the HTTP response body. This generates a very interesting error in Chrome: Extra <html> encountered. Migrating attributes back to the original <html> element and ignoring the tag. Chrome seems to generate an <html> tag automatically when it sees the BOM and mistakes it for content, making the real <html> tag an error. So, using Python, what is the best way to remove the BOM from my UTF-8 encoded templates (if it exists -- I can't guarantee this in the future)? For other text-based files like CSS, will major browsers correctly interpret (or ignore) the BOM? They are being sent as plain binary data without .decode('utf-8'). Note: I am using Python 2.5. Thanks!

Read the article
Can't store UTF-8 in RDS despite setting up new Parameter Group using Rails on Heroku

- by Lail

I'm setting up a new instance of a Rails(2.3.5) app on Heroku using Amazon RDS as the database. I'd like to use UTF-8 for everything. Since RDS isn't UTF-8 by default, I set up a new Parameter Group and switched the database to use that one, basically per this. Seems to have worked: SHOW VARIABLES LIKE '%character%'; character_set_client utf8 character_set_connection utf8 character_set_database utf8 character_set_filesystem binary character_set_results utf8 character_set_server utf8 character_set_system utf8 character_sets_dir /rdsdbbin/mysql-5.1.50.R3/share/mysql/charsets/ Furthermore, I've successfully setup Heroku to use the RDS database. After rake db:migrate, everything looks good: CREATE TABLE `comments` ( `id` int(11) NOT NULL AUTO_INCREMENT, `commentable_id` int(11) DEFAULT NULL, `parent_id` int(11) DEFAULT NULL, `content` text COLLATE utf8_unicode_ci, `child_count` int(11) DEFAULT '0', `created_at` datetime DEFAULT NULL, `updated_at` datetime DEFAULT NULL, PRIMARY KEY (`id`), KEY `commentable_id` (`commentable_id`), KEY `index_comments_on_community_id` (`community_id`), KEY `parent_id` (`parent_id`) ) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; In the markup, I've included: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> Also, I've set: production: encoding: utf8 collation: utf8_general_ci ...in the database.yml, though I'm not very confident that anything is being done to honor any of those settings in this case, as Heroku seems to be doing its own config when connecting to RDS. Now, I enter a comment through the form in the app: "Úbe® ƒåiL", but in the database I've got "ÃšbeÂ® Æ’Ã¥iL" It looks fine when Rails loads it back out of the database and it is rendered to the page, so whatever it is doing one way, it's undoing the other way. If I look at the RDS database in Sequel Pro, it looks fine if I set the encoding to "UTF-8 Unicode via Latin 1". So it seems Latin-1 is sneaking in there somewhere. Somebody must have done this before, right? What am I missing?

Read the article
Creating UTF-8 files in Java from a runnable Jar

- by RuntimeError

I have a little Java project where I've set the properties of the class files to UTF-8 (I use a lot of foreign characters not found on the default CP1252). The goal is to create a text file (in Windows) containing a list of items. When running the class files from Eclipse itself (hitting Ctrl+F11) it creates the file flawlessly and opening it in another editor (I'm using Notepad++) I can see the characters as I wanted. +--------------------------------------------------+ ¦ Universidade2010 (18/18)¦ ¦ hidden: 0¦ +--------------------------------------------------¦ But, when I export the project (using Eclipse) as a runnable Jar and run it using 'javaw -jar project.jar' the new file created is a mess of question marks ???????????????????????????????????????????????????? ? Universidade2010 (19/19)? ? hidden: 0? ???????????????????????????????????????????????????? I've followed some tips on how to use UTF-8 (which seems to be broken by default on Java) to try to correct this so now I'm using Writer w = new OutputStreamWriter(fos, "UTF-8"); and writing the BOM header to the file like in this question already answered but still without luck when exporting to Jar Am I missing some property or command-line command so Java knows I want to create UTF-8 files by default ?

Read the article
Problem with PHP localeconv() - Maybe UTF-8

- by Chuck Ugwuh

I'm having an issue with the localeconv() in PHP. I'm using a Windows PC. I set my locale to France using setLocale(LC_ALL, 'fra_fra') function. Then I call the localeconv() function to a variable. When I output that variable, below is what I get. Array ( [decimal_point] = , [thousands_sep] = ? [int_curr_symbol] = EUR [currency_symbol] = ? [mon_decimal_point] = , [mon_thousands_sep] = ? [positive_sign] = [negative_sign] = - [int_frac_digits] = 2 [frac_digits] = 2 [p_cs_precedes] = 0 [p_sep_by_space] = 1 [n_cs_precedes] = 0 [n_sep_by_space] = 1 [p_sign_posn] = 1 [n_sign_posn] = 1 [grouping] = Array ( [0] = 3 ) [mon_grouping] = Array ( [0] = 3 ) ) I'm not sure if it is a UTF-8 display issue. I've done the following: Set my default_charset in PHP.ini to UTF-8 The Content-type on my page is UTF-8 I've also called same in a header i.e. header('Content-type: text/html; charset=utf-8') I'm using firefox and changed the charset there too, still no luck I also updated my http.conf file with AddDefaultCharset, but still no cigar I'm completely stumped and not sure what next to do. Can anyone help out? Thanks.

Read the article
Java servlet and UTF-8 problem

- by Gabriele

I have some problem with UTF-8. My client (realized in GWT) make a request to my servlet, with some parametres in the URL, as follow: http://localhost:8080/servlet?param=value When in the servlet I retrieve the URL, I have some problem with UTF-8 characters. I use this code: protected void service(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { request.setCharacterEncoding("UTF-8"); String reqUrl = request.getRequestURL().toString(); String queryString = request.getQueryString(); System.out.println("Request: "+reqUrl + "?" + queryString); ... So, if I call this url: http://localhost:8080/servlet?param=così the result is like this: Request: http://localhost:8080/servlet?param=cos%C3%AC What can I do to set up properly the character encoding?

Read the article
Some special characters defined in "ISO-8859-1" can't be shown when encoding with "UTF-8"

- by Mike.Huang

I need to get a string from URL request of brower, and then create a text image by requested text. I know the default encoding of the Java net transmission is "ISO-8859-1", it can works normally with all characters what defined in "ISO-8859-1". But when I request a multi-byte Unicode character (e.g. chinese or something like ¤?), then I need to decode it by "UTF-8" from "ISO-8859-1". My codes like: String reslut = new String(requestString.getBytes("ISO-8859-1"), "UTF-8"); Everything is fine, but I found some characters in ISO-8859-1 are not been shown now, which characters are 0x80 - 0xFF(defined in" ISO-8859-1"), i.e. the characters after 0x80 (in "ISO-8859-1") not been shown when converted to "UTF-8" from "ISO-8859-1". Any other method can solve this query?

Read the article
gcc, UTF-8 and limits.h

- by bobby

My OS is Debian, my default locale is UTF-8 and my compiler is gcc. By default CHAR_BIT in limits.h is 8 which is ok for ASCII because in ASCII 1 char = 8 bits. But since I am using UTF-8, chars can be up to 32 bits which contradicts the CHAR_BIT default value of 8. If I modify CHAR_BIT to 32 in limits.h to better suit UTF-8, what do I have to do in order for this new value to come into effect ? I guess I have to recompile gcc ? Do I have to recompile the linux kernel ? What about the default installed Debian packages, will they work ?

Read the article
Velocity templates seem to fail with UTF-8

- by steve

Hi, i have been trying to use a velocity Template with the following content: Sübjäct $item everything works fine except the translation of the tow unicode characters. The result string printed on the commandline looks like: SÃ¼bjÃ¤ct foo I searched the velocity website and the web an this issue, and came uo with differnt font encoding options, which i added to my code. But those won't help. This is the actuall code: velocity.setProperty("file.resource.loader.path", absPath); velocity.setProperty("input.encoding", "UTF-8"); velocity.setProperty("output.encoding", "UTF-8"); Template t = velocity.getTemplate("subject.vm"); t.setEncoding("UTF-8"); StringWriter sw = new StringWriter(); t.merge(null, sw); System.out.println(sw.getBuffer()); Can anyone give me some hints, how to fix this issue?

Read the article
communicate with a process in utf-8 on a cp1252 consoless

- by Mapad

I need to control a program by sending commands in utf-8 encoding to its standard input. For this I run the program using subprocess.Popen(): proc = Popen("myexecutable.exe", shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE) proc.stdin.write(u'ééé'.encode('utf_8')) If I run this from a cygwin utf-8 console, it works. If I run it from a windows console (encoding ='cp1252') this doesn't work. Is there a way to make this work without having to install a cygwin utf-8 console on each computer I want it to run from ? (NB: I don't need to output anything to console)

Read the article
Debian, How to convert filesystem from ISO-8859-1 into UTF-8?

- by Johan

I have a old pc that is running Debian stable, that is in need of a upgrade. The problem is that it is using latin1 (ISO-8859-1) for everything, and since the rest of the world has moved to UTF-8 I plan to convert this computer as well. And for this question I will focus in on the files that are served with Samba, and some has some latin1 characters in the filenames (like åäö). Now my plan is to move all data of this old computer onto and a brand new one that is running Debian stable (but with UTF-8). Does anybody have a good idea? Thanks Johan Note: later I plan to use iconv to convert the content of some files with something like this: iconv --from-code=ISO-8859-1 --to-code=UTF-8 iso.txt > utf.txt However I don't know of a good way to convert the filesystem it self. Note: Normally I usaly just scp from one computer to the next, but then I end up with latin1 characters in the utf-8 filesystem... Update: Did a small test round with a hand full of files (with funny chars) in the filenames, and that seemed like it could work. convmv -r -f ISO-8859-1 -t UTF-8 * So it was only to execute with the --notest convmv -r -f ISO-8859-1 -t UTF-8 --notest * Nothing more to it.

Read the article
UTF-8 Database Problem

- by Danten

I've a MySQL table that has a UTF-8 charset and upon attempting to insert to it via a PHP form, the database gives the following error: PDOStatement::execute(): SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xE8' for column ... The character in question is 'è', yet I don't see why this should be a problem considering the database and table are set to UTF-8.

Read the article
How convert PHP value from windows-1257 to UTF-8

- by someonewhodoesintspeakenglish

How convert PHP value from windows-1257 to UTF-8? I tried many ways, but they was not successful. I have lttu?s and I wanna convert this to littus. utf8_encode(); iconv_set_encoding("windows-1257", "UTF-8"); mb_convert_encoding() Doesn't work. :( Can anybody help me?

Read the article
validating utf-8 in htaccess rewrite rule

- by TrustWeb

i validate urls with utf-8 characters with a rewrite rule RewriteRule ^([a-z]{2})/([a-z0-9-]{1,256})/([[:print:]]{1,256})$ index.php?language=$1&categories=$2&get_query=$3 [L] $get_query is the point, this accepts: test!?!'"<*+ but fails for accented chars as àèéìòù, or other utf-8 for example in wikipedia this works great: http://en.wikipedia.org/wiki/%E6%B1%89%E8%AF%AD_%E6%BC%A2%E8%AA%9E any help? :-)

Read the article
Setting Quercus db connection encoding to UTF-8 (urgent problem and need your great help)

- by sokcmss

Now we are going to use java class in my website developed with PHP + mySQL. I came to know Quercus and it worked well. But only problem is encoding. Quercus is providing ISO8859 encoding in default and all database in UTF-8 is not shown properly like ???. If anybody know the way to set Quercus db connection encoding to UTF-8, please help me. Look forward to hearing good news urgently.

Read the article
Different between Mysql Collation "Utf-8-general-ci" & Utf-8-unicode-ci ?

- by kokogyi

Dear All, What is the Different between Mysql Collation "Utf-8-general-ci" & Utf-8-unicode-ci ? regards, koko

Read the article
Can str_replace mangle a UTF-8 encoded string if it's only given valid UTF-8 encoded strings as argu

- by Manos Dilaverakis

PHP's str_replace() was intended only for ANSI strings and as such can mangle UTF-8 strings. However, given that it's binary-safe would it have a problem if it was only given valid UTF-8 strings as arguments? Edit: I'm not looking for a replacement function, I would just like to know if this hypothesis is correct.

Read the article
How to convert from HTML to UTF-8 in java

- by Llistes Sugra

Hi, I have an ASCII String, with HTML entities, like: à ¨ ç I need this String to be without those entities and convert them into UTF-8 chars. Is there any easy way, in java to do that? Where: Clazz.method("aà","UTF-8") returns "aà" or something like that?

Read the article
Outputing UTF-8 string on Mac OS's Terminal

- by SuperBloup

I got a programm in haskell outputting utf-8 using the package utf8-string and using only the output functions of this package. I set the encoding of each file I write to this way : hSetEncoding myFile utf8 {- myFile may be stdout -} but when I try to output : alpha = [fromEnum 0x03B1] {- a -} instead of the nice alpha letter I got on Linux (or in a file on windows), I got the following : Î± The weird thing is even if I try to write the output on a file, I can't read it back with mvim as an utf-8 file. Is there any way to get the correct behaviour

Read the article
\w in PHP preg_replace covers only second byte of UTF-8 chars

- by Andrey

we have this code: $value = preg_replace("/[^\w]/", '', $value); where $value is in utf-8. After this transformation first byte of multibyte characters is stripped. How to make \w cover UTF-8 chars completely? Sorry, i am not very well in PHP

Read the article
Change encoding to UTF-8 recursively on Windows?

- by Pekka

Does anybody know a tool, preferably for the Explorer context menu, to recursively change the encoding of files in a project from / to UTF-8 and other encodings? Freeware or not too expensive would be great. Edit: Thanks for the answers, +1 for all of them as they are all fine but I am a lazy bastard sometimes, and would really like to be able to just right click a folder and say "convert all .php files to UTF-8". :) Further suggestions are appreciated.

Read the article
remove utf-8 figure spaces with php

- by Jeroen Beerstra

I have some xml files with figure spaces in it, I need to remove those with php. The utf-8 code for these is e2 80 a9. If I'm not mistaken php does not seem to like 6 byte utf-8 chars, so far at least I'm unable to find a way to delete the figure spaces with functions like preg_replace. Anybody any tips or even better a solution to this problem?

Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >