Search Results

Search found 4604 results on 185 pages for 'utf'.

Page 8/185 | < Previous Page | 4 5 6 7 8 9 10 11 12 13 14 15  | Next Page >

  • JMeter CSV Data Set is corrupting Japanese strings stored as proper UTF-8, I get Question Marks instead

    - by Mark Bennett
    I read in search terms from a simple text file to send to a search engine. It works fine in English, but gives me ???? for any Japanese text. Text with mixed English and Japanese does show the English text, so I know it's reading the file.

    What I'm seeing:
      Input text: Snow Leopard ???????????????
      Turns into: Snow Leopard ???????????????

    This is in the POST field of an HTTP request. If I set JMeter to encode the data, it just puts in the percent sequence for question marks.

    Interesting note: in the example above there are 15 Japanese characters, and then 15 question marks, so at some point the text is being seen as full characters and not just bytes.

    About the data: the CSV file is very simple in structure. There's only one field (one column), which I name TERM and later use as ${TERM}. I don't really need full CSV because it's only one string per line. There are no commas or quotes. When I run the Unix "file" command on the file, it says UTF-8 text. I've also verified it in command-line and graphical mode on two machines.

    JMeter CSV Data Set Config:
      Filename: japanese-searches.csv
      File encoding: UTF-8 (also tried without)
      Variable names: TERM
      Delimiter: ,
      Allow Quoted Data: False (I also tried True; different, but still wrong)
      Recycle at EOF: True
      Stop at EOF: False
      Sharing mode: All threads

    A few things I've tried:
      Allow Quoted Data: it changed to other strange characters.
      -Dfile.encoding=UTF-8
      Encoding the POST, but it just turned into a bunch of %nn for question marks.

    I'm not sure how to "debug" just after each line of the CSV is read in. I think it's corrupted right away, but I'm not sure. If it's only mangled when I reference it, then instead of ${TERM} perhaps there's some other "to bytes" function call. I'll start checking into that. I haven't done anything with the JMeter functions yet.
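
One way to read the "15 characters, 15 question marks" observation is that the file was decoded correctly and the characters were lost later, when the text was encoded again with a charset that cannot represent them. The sketch below reproduces that symptom in Python, not in JMeter. The sample string "日本語" is a stand-in for the lost Japanese text, and ISO-8859-1 as the lossy charset is an assumption that the question does not confirm.

```python
# Stand-in search term; the original Japanese text is not available.
text = "Snow Leopard 日本語"

# Encoding to a charset that lacks these characters substitutes
# exactly one "?" per character, not one per byte.
lossy = text.encode("iso-8859-1", errors="replace")
print(lossy)

# Encoding as UTF-8 keeps every character (three bytes each here).
utf8 = text.encode("utf-8")
japanese_bytes = len(utf8) - len("Snow Leopard ")
print(japanese_bytes)
```

If the real cause matches this pattern, the place to look is wherever the request body is turned back into bytes, not where the CSV file is read.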

  • Unable to convert file to UTF-8

    - by antoniocs
    I am on Windows XP SP3 and I am trying to convert a file from ASCII to UTF-8. I use Notepad++ to do this: I go to Encoding, then Convert to UTF-8 without BOM. I save the file, reopen it, and it is still shown as ASCII. I am using this file in a webpage and I need the file to be UTF-8, because I have strings in UTF-8 and they are showing as little squares with ? in them.

  • =?UTF-8?B??= in Emails sent via php mail problem

    - by Camran
    I have a website, and in the "Contact" section I have a form which users may fill in to contact me. The form is a simple form which action is a php page. The php code: $to = "[email protected]"; $name=$_POST['name']; // sender name $email=$_POST['email']; // sender email $tel= $_POST['tel']; // sender tel $subject=$_POST['subject']; // subject CHOSEN FROM DROPLIST, ALL TESTED $text=$_POST['text']; // Message from sender $text.="\n\nTel:".$tel; // Added to message to show me the telephone nr to the sender at bottom of message $headers="MIME-Version: 1.0"."\n"; $headers.="Content-type: text/plain; charset=UTF-8"."\n"; $headers.="From: $name <$email>"."\n"; mail($to, '=?UTF-8?B?'.base64_encode($subject).'?=', $text, $headers, '[email protected]'); Could somebody please tell me why this works most of the time, but sometimes I receive email whith no text and the subject line showing =?UTF-8?B??= I use outlook express, and I have read this http://stackoverflow.com/questions/454833/system-net-mail-and-utf-8bxxxxx-headers but it didn't help. The problem is not in Outlook, because when I log in to the actual mailprogram where I fetch the POP3 emails from, the email looks the same. When I right click in Outlook and chose "message source" then there is no "From" information. Ex, a good message should look like this: Subject: =?UTF-8?B?w5Z2cmlndA==?= MIME-Version: 1.0 Content-type: text/plain; charset=UTF-8 From: John Doe However, the ones with problem looks like this: Subject: =?UTF-8?B??= MIME-Version: 1.0 Content-type: text/plain; charset=UTF-8 From: As if the information has been lost somewhere. You should know also that I have a VPS, which I manage myself. I use postfix as an emailserver, if thats got anything to do with it. But then again, why does it work sometimes? Also another thing that I have noticed is that sometimes special characters are not shown correctly (by both Outlook and the webmail). 
For instance, the name "Björkman" in Swedish is shown like Björkman, but again, only sometimes. I hope somebody knows something about this problem, because it is very hard for me to track down. If you need more input, let me know.
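
The broken header `=?UTF-8?B??=` is exactly what the PHP expression in the question yields when the submitted subject is empty, because the Base64 encoding of an empty string is itself empty. The sketch below shows this in Python. `encoded_word` is a hypothetical helper that mirrors the string concatenation in the question, not a library function. The empty From line and body would be consistent with the whole form arriving empty, although the question does not confirm that.

```python
import base64


def encoded_word(subject: str) -> str:
    """Build an RFC 2047 'B' encoded-word the way the PHP snippet does."""
    payload = base64.b64encode(subject.encode("utf-8")).decode("ascii")
    return "=?UTF-8?B?" + payload + "?="


good = encoded_word("Övrigt")  # subject from the 'good' message
bad = encoded_word("")         # an empty subject
print(good)
print(bad)
```

Rejecting the request before calling mail() when the subject or sender fields are empty would be one way to test this reading.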

  • How relevant is UTF-7 when it comes to parsing emails?

    - by J. Pablo Fernández
    I recently implemented incoming emails for an application and boy, did I open the gates of hell? Since then every other day an email arrives that makes the app fail in a different way. One of those things is emails encoded as UTF-7. Most emails come as ASCII, some of the Latin encodings, or thankfully, UTF-8. Hotmail error messages (like email address doesn't exist or quota exceeded) seem to come as UTF-7. Unfortunately, UTF-7 is not an encoding Ruby understands: > "hello world".encode("utf-8", "utf-7") Encoding::ConverterNotFoundError: code converter not found (UTF-7 to UTF-8) > Encoding::UTF_7 => #<Encoding:UTF-7 (dummy)> My application doesn't crash, it actually handles the email quite well, but it does send me a notification about the potential error. I spent some time googling and I can't find anyone that implemented the conversion, at least not as a Ruby 1.9.3 Encoding::Converter. So, my question is, since I never got an email with actual content, from an actual person, in UTF-7, how relevant is that encoding? can I safely ignore it?
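
For reference, UTF-7 writes non-ASCII text as "+", a run of modified Base64 over UTF-16 code units, and a closing "-". The sketch below decodes a small sample with Python's built-in `utf-7` codec. It is in Python, not Ruby, and it does not address whether a comparable converter exists for Ruby 1.9.3. The sample bytes are made up for illustration.

```python
# Made-up UTF-7 sample: "+AOk-" is U+00E9 and "+APY-" is U+00F6.
raw = b"h+AOk-llo w+APY-rld"

# Python ships a "utf-7" codec, so the bytes can be decoded directly
# and then re-encoded as UTF-8 for the rest of the pipeline.
decoded = raw.decode("utf-7")
as_utf8 = decoded.encode("utf-8")
print(decoded)
```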

  • Get correct output from UTF-8 stored in VarChar using Entity Framework or Linq2SQL?

    - by jasonpenny
    Borland StarTeam seems to store its data as UTF-8 encoded data in VarChar fields. I have an ASP.NET MVC site that returns some custom HTML reports using the StarTeam database, and I would like to find a better solution for getting the correct data, for a rewrite with MVC2. I tried a few things with Encoding GetBytes and GetString, but I couldn't get it to work (we use mostly Delphi at work); then I figured out a T-SQL function to return a NVarChar from UTF-8 stored in a VarChar, and created new SQL views which return the data as NVarChar, but it's slow. The actual problem appears like this: “description†instead of “description”, in both SSMS and in a webpage when using Linq2SQL Is there a way to get the proper data out of these fields using Entity Framework or Linq2SQL?

  • Is there a list of language only character regions for UTF-8 somewhere?

    - by Brehtt
    I'm trying to analyze some UTF-8 encoded documents in a way that recognizes different language characters. For my approach to work I need to ignore non-language characters, such as control characters, mathematical symbols etc. Just trying to dissect the basic Latin section of the UTF standard has resulted in multiple regions, with characters like the division symbol being right in the middle of a range of valid Latin characters. Is there a list somewhere that identifies these regions? Or better yet, a Regex that defines the regions or something in C# that can identify the different characters?

  • Should I convert overlong UTF-8 strings to their shortest normal form?

    - by Grant McLean
    I've just been reworking my Encoding::FixLatin Perl module to handle overlong UTF-8 byte sequences and convert them to the shortest normal form. My question is quite simply "is this a bad idea"? A number of sources (including this RFC) suggest that any over-long UTF-8 should be treated as an error and rejected. They caution against "naive implementations" and leave me with the impression that these things are inherently unsafe. Since the whole purpose of my module is to clean up messy data files with mixed encodings and convert them to nice clean utf8, this seems like just one more thing I can clean up so the application layer doesn't have to deal with it. My code does not concern itself with any semantic meaning the resulting characters might have, it simply converts them into a normalised form. Am I missing something. Is there a hidden danger I haven't considered?
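
The usual concern with overlong forms is that a check performed on the raw bytes (for example, looking for "/" or "../") can miss a character that only appears once the overlong sequence has been decoded or normalised. The sketch below uses Python, not Perl, and shows that a strict decoder rejects the overlong two-byte form of "/". `is_strict_utf8` is a hypothetical helper name.

```python
def is_strict_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


overlong_slash = b"\xc0\xaf"  # overlong two-byte encoding of "/" (U+002F)

print(is_strict_utf8(overlong_slash))
print(is_strict_utf8(b"/"))
print(b"/" in overlong_slash)  # a byte-level filter sees no slash here
```

A clean-up module that converts such sequences to "/" is therefore safe only if every security check runs after the conversion, never before it.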

  • How can I check if a binary string is UTF-8 in mysql?

    - by Piotr Czapla
    I've found a Perl regexp that can check whether a string is UTF-8 (the regexp is from the W3C site):

        $field =~ m/\A(
            [\x09\x0A\x0D\x20-\x7E]            # ASCII
          | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
          | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
          | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
          | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
          | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
          | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
          | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
        )*\z/x;

    But I'm not sure how to port it to MySQL, as it seems that MySQL doesn't support hex representation of characters (see this question). Any thoughts on how to port the regexp to MySQL? Or maybe you know another way to check whether the string is valid UTF-8?

    UPDATE: I need this check to work in MySQL, as I need to run it on the server to correct broken tables. I can't pass the data through a script, as the database is around 1TB.
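
To make clear what the pattern accepts and rejects, here is a direct port to a Python bytes regex. It does not solve the MySQL-side requirement in the question. It only illustrates the byte ranges, and the name `UTF8_RE` is an illustrative choice.

```python
import re

# Byte-level port of the W3C pattern quoted in the question.
UTF8_RE = re.compile(
    rb"""\A(?:
        [\x09\x0A\x0D\x20-\x7E]            # ASCII
      | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
      | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
      | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
      | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
      | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
      | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
    )*\Z""",
    re.VERBOSE,
)

valid = UTF8_RE.match("Björkman".encode("utf-8")) is not None
overlong = UTF8_RE.match(b"\xc0\xaf") is not None
surrogate = UTF8_RE.match(b"\xed\xa0\x80") is not None
print(valid, overlong, surrogate)
```

Note that the ASCII branch also rejects most control characters, so this pattern is stricter than "valid UTF-8" in the narrow sense.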

  • Perl to output processed XML file encoded as UTF-8 with UNIX line endings (in Win32 environment)?

    - by Umber Ferrule
    Running ActiveState Perl 5.8.8 on WinXP. As the title suggests, I'd like to output an XML file as UTF-8 with UNIX line endings. I've looked at the PerlDoc for binmode, but am unsure of the exact syntax (if I'm not barking up the wrong tree). The following doesn't do it (forgive my Perl - it's a learning process!):

        sub SaveFile {
            my($FileName, $Contents) = @_;
            my $File = "SAVE";
            unless( open($File, ">:utf-8 :unix", $FileName) ) {
                die("Cannot open $FileName");
            }
            print $File @$Contents;
            close($File);
        }

    Any suggestions? Thanks.

  • Why is conversion from UTF-8 to ISO-8859-1 not the same in Windows and Linux?

    - by user1895307
    I have the following in code to convert from UTF-8 to ISO-8859-1 in a jar file and when I execute this jar in Windows I get one result and in CentOS I get another. Might anyone know why? public static void main(String[] args) { try { String x = "Ä, ä, É, é, Ö, ö, Ãœ, ü, ß, «, »"; Charset utf8charset = Charset.forName("UTF-8"); Charset iso88591charset = Charset.forName("ISO-8859-1"); ByteBuffer inputBuffer = ByteBuffer.wrap(x.getBytes()); CharBuffer data = utf8charset.decode(inputBuffer); ByteBuffer outputBuffer = iso88591charset.encode(data); byte[] outputData = outputBuffer.array(); String z = new String(outputData); System.out.println(z); } catch(Exception e) { System.out.println(e.getMessage()); } } In Windows, java -jar test.jar test.txt creates a file containing: Ä, ä, É, é, Ö, ö, Ü, ü, ß, «, » but in CentOS I get: ??, ä, ??, é, ??, ö, ??, ü, ??, «, » Help please!
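
In the Java code, `x.getBytes()` and `new String(outputData)` take no charset argument, so both use the platform default, which typically differs between a Windows machine and a Linux one. The sketch below simulates the four steps in Python with the default charset passed in explicitly. `round_trip` is a hypothetical helper, and cp1252 and UTF-8 as the two platform defaults are assumptions about the machines in the question.

```python
def round_trip(text: str, platform_charset: str) -> str:
    """Mimic the Java steps, with the platform default made explicit."""
    raw = text.encode(platform_charset, errors="replace")     # x.getBytes()
    decoded = raw.decode("utf-8", errors="replace")           # utf8charset.decode
    latin1 = decoded.encode("iso-8859-1", errors="replace")   # iso88591charset.encode
    return latin1.decode(platform_charset, errors="replace")  # new String(outputData)


source_literal = "Ã„, Ã¤"  # first two items of the string in the question

on_windows = round_trip(source_literal, "cp1252")
on_linux = round_trip(source_literal, "utf-8")
print(on_windows)
print(on_linux)
```

Under these assumptions only the cp1252 run recovers "Ä, ä". In the Java code, passing an explicit charset to both `getBytes` and the `String` constructor would make the result independent of the platform.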

  • How would you create a string of all UTF-8 characters? [PHP]

    - by Xeoncross
    There are many ways to represent the 1 million+ Unicode characters in UTF-8. Take the Latin capital "A" with macron (Ā). This is Unicode code point U+0100, hex bytes 0xc4 0x80, decimal 196 128, and binary 11000100 10000000. I would like to create a collection of the first 65,535 UTF-8 characters for use in testing applications. These are all Unicode characters up to code point U+FFFF (up to 3 bytes each). Is it possible to do something like a for($x=0) loop and then convert the resulting decimal to another base (like hex) which would allow the creation of the matching Unicode character? I can create the value Ā using something like this: $char = "\xc4\x80"; // or $char = chr(196).chr(128); However, I am not sure how to turn this into an automated process. // fail! $char = "\x". dechex($a). "\x". dexhex($$b);
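
The loop the question describes can be sketched as below, in Python instead of PHP: iterate over code points, not bytes, and let the encoder produce the byte sequences. The range U+D800 to U+DFFF is skipped because surrogates cannot be encoded as UTF-8. `bmp_characters` is a hypothetical helper name.

```python
def bmp_characters() -> str:
    """All code points U+0000..U+FFFF except the surrogate range."""
    return "".join(
        chr(cp)
        for cp in range(0x0000, 0x10000)
        if not 0xD800 <= cp <= 0xDFFF
    )


chars = bmp_characters()
encoded = chars.encode("utf-8")  # the UTF-8 test data as bytes
macron_a = chr(0x0100).encode("utf-8")
print(len(chars), len(encoded), macron_a)
```

The result still contains unassigned code points and noncharacters such as U+FFFE, which may or may not be wanted in test data.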

  • UTF-8 bit representation

    - by Yanick Rochon
    I'm learning about UTF-8 standards and this is what I'm learning:

    Definition and bytes used

        UTF-8 binary representation            Meaning
        0xxxxxxx                               1 byte for 1 to 7 bit chars
        110xxxxx 10xxxxxx                      2 bytes for 8 to 11 bit chars
        1110xxxx 10xxxxxx 10xxxxxx             3 bytes for 12 to 16 bit chars
        11110xxx 10xxxxxx 10xxxxxx 10xxxxxx    4 bytes for 17 to 21 bit chars

    And I'm wondering, why is the 2-byte UTF-8 code not 10xxxxxx instead, thus gaining 1 bit all the way up to 22 bits with a 4-byte UTF-8 code? The way it is right now, 64 possible values are lost (from 10000000 to 10111111). I'm not trying to argue with the standard, but I'm wondering why this is so.

    ** EDIT ** Even, why isn't it:

        UTF-8 binary representation            Meaning
        0xxxxxxx                               1 byte for 1 to 7 bit chars
        110xxxxx xxxxxxxx                      2 bytes for 8 to 13 bit chars
        1110xxxx xxxxxxxx xxxxxxxx             3 bytes for 14 to 20 bit chars
        11110xxx xxxxxxxx xxxxxxxx xxxxxxxx    4 bytes for 21 to 27 bit chars

    ...? Thanks!
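
The 10xxxxxx prefix is reserved for continuation bytes so that a lead byte can never be confused with a continuation byte. A reader that lands in the middle of a stream can then find the next character boundary, and an ASCII byte never appears inside a multi-byte sequence. The sketch below encodes code points by hand following the table in the question and then locates character starts by bit pattern alone. Both function names are illustrative.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one code point following the 1- to 4-byte table."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def char_starts(data: bytes) -> list[int]:
    """Offsets of bytes that begin a character (not of the form 10xxxxxx)."""
    return [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]


data = b"".join(encode_code_point(cp) for cp in (0x61, 0x100, 0x20AC))
print(data, char_starts(data))
```

With the layout proposed in the EDIT, the trailing bytes could take any value, so `char_starts` could not be written: a trailing byte might look like ASCII or like a lead byte.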

  • UTF-8 locale portability (and ssh)

    - by kine
    I spend a lot of my time sshed into various machines, all of which are different (some are embedded, some run Linux, some run BSD, &c.). On my own local machines, however, i use OS X, which of course has a userland based on FreeBSD. My locale on those machines is set to en_GB.UTF-8, which is one of the available options: % echo `sw_vers` ProductName: Mac OS X ProductVersion: 10.8.2 BuildVersion: 12C60 % locale -a | grep -i 'en_gb.utf' en_GB.UTF-8 Several of the more-capable Linux systems i use appear to have an equivalent option, but i note that on Linux the name is slightly different: % lsb_release -d Description: Debian GNU/Linux 6.0.3 (squeeze) % locale -a | grep -i 'en_gb.utf' en_GB.utf8 This makes me wonder: When i ssh into a Linux machine from my Mac, and it forwards all of my LC_* variables with that 'UTF-8' suffix, does that Linux machine even understand what is being asked of it? Or is it just falling back to some other locale? In either case, what is the mechanism behind its behaviour, and is it dependent on any particular set-up (e.g., will i see the same behaviour on a BusyBox-based system as on a GNU-based one)?
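
On glibc-based systems the two spellings are generally treated as the same locale, because the codeset part of the name is normalised before lookup. The sketch below is an assumed model of that normalisation (drop non-alphanumeric characters, lowercase the rest), written in Python for illustration. It is not glibc's code, and it says nothing about BusyBox or other C libraries, which may behave differently.

```python
def normalize_codeset(locale_name: str) -> str:
    """Assumed model: normalise only the codeset part of a locale name."""
    base, dot, rest = locale_name.partition(".")
    if not dot:
        return locale_name  # no codeset present, nothing to normalise
    codeset, at, modifier = rest.partition("@")
    normalized = "".join(ch.lower() for ch in codeset if ch.isalnum())
    return base + "." + normalized + at + modifier


from_mac = normalize_codeset("en_GB.UTF-8")
from_linux = normalize_codeset("en_GB.utf8")
print(from_mac, from_linux)
```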

  • Strange display language in gnome shell

    - by khalafuf
    I logged in gnome-shell, and found that the display language is set to some strange asian language (I think) without my prompt. I tried to change the locale settings but found that the default language is English (how?) despite of that strange language. Here's a snapshot, See the strange word instead of "Activity": I'm on Ubuntu 12.04 LTS. Output of locale: LANG=zh_CN.UTF-8 LANGUAGE=zh_CN:en_US:en LC_CTYPE="zh_CN.UTF-8" LC_NUMERIC=en_US.UTF-8 LC_TIME=en_US.UTF-8 LC_COLLATE="zh_CN.UTF-8" LC_MONETARY=en_US.UTF-8 LC_MESSAGES="zh_CN.UTF-8" LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8 LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8 LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8 LC_ALL= Output of locale -a: C C.UTF-8 de_CH.utf8 en_AG en_AG.utf8 en_AU.utf8 en_BW.utf8 en_CA.utf8 en_DK.utf8 en_GB.utf8 en_IE.utf8 en_IN en_IN.utf8 en_NG en_NG.utf8 en_NZ.utf8 en_PH.utf8 en_SG.utf8 en_US.utf8 en_ZA.utf8 en_ZM en_ZM.utf8 en_ZW.utf8 POSIX zh_CN.utf8 zh_SG.utf8 Solved: This answer did it.

  • locale: Reset lost settings

    - by Adam Matan
    Due to some strange reason, I've lost some of my locale settings. I've managed to restore most of them using sudo dpkg-reconfigure locales: perl: warning: Setting locale failed. perl: warning: Please check that your locale settings: LANGUAGE = (unset), LC_ALL = (unset), LANG = "en_US.UTF-8" are supported and installed on your system. perl: warning: Falling back to the standard locale ("C"). locale: Cannot set LC_MESSAGES to default locale: No such file or directory locale: Cannot set LC_ALL to default locale: No such file or directory So I'm stuck with one missing value: $ locale locale: Cannot set LC_MESSAGES to default locale: No such file or directory locale: Cannot set LC_ALL to default locale: No such file or directory LANG=en_US.UTF-8 LC_CTYPE="en_US.UTF-8" LC_NUMERIC="en_US.UTF-8" LC_TIME="en_US.UTF-8" LC_COLLATE="en_US.UTF-8" LC_MONETARY="en_US.UTF-8" LC_MESSAGES="en_US.UTF-8" LC_PAPER="en_US.UTF-8" LC_NAME="en_US.UTF-8" LC_ADDRESS="en_US.UTF-8" LC_TELEPHONE="en_US.UTF-8" LC_MEASUREMENT="en_US.UTF-8" LC_IDENTIFICATION="en_US.UTF-8" LC_ALL= Any idea how to restore them all? Thanks, Adam

  • Ubuntu 13.10 change first weekday to Monday in calendar applet

    - by wonderingapple
    Before the update (I was using 13.04), editing: sudo gedit /etc/default/locale so that LC_TIME="en_GB.UTF-8" does the job. However in 13.10, this does not work anymore. I've tried editing: sudo gedit /usr/share/i18n/locales/en_AU sudo gedit /usr/share/i18n/locales/en_GB sudo gedit /usr/share/i18n/locales/en_US so that first_weekday 2 in each of the files, but this also does not work. As a reference, when I run locale, the output is LANG=en_AU.UTF-8 LANGUAGE=en_AU:en LC_CTYPE="en_AU.UTF-8" LC_NUMERIC=en_AU.UTF-8 LC_TIME=en_AU.UTF-8 LC_COLLATE="en_AU.UTF-8" LC_MONETARY=en_AU.UTF-8 LC_MESSAGES="en_AU.UTF-8" LC_PAPER=en_AU.UTF-8 LC_NAME=en_AU.UTF-8 LC_ADDRESS=en_AU.UTF-8 LC_TELEPHONE=en_AU.UTF-8 LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=en_AU.UTF-8 LC_ALL= Please help.

  • How do I configure encodings (UTF-8) for code executed by Quartz scheduled Jobs in Spring framework

    - by Martin
    I wonder how to configure Quartz scheduled job threads to reflect proper encoding. Code which otherwise executes fine within Springframework injection loaded webapps (java) will get encoding issues when run in threads scheduled by quartz. Is there anyone who can help me out? All source is compiled using maven2 with source and file encodings configured as UTF-8. In the quartz threads any string will have encoding errors if outside ISO 8859-1 characters: Example config <bean name="jobDetail" class="org.springframework.scheduling.quartz.JobDetailBean"> <property name="jobClass" value="example.ExampleJob" /> </bean> <bean id="jobTrigger" class="org.springframework.scheduling.quartz.SimpleTriggerBean"> <property name="jobDetail" ref="jobDetail" /> <property name="startDelay" value="1000" /> <property name="repeatCount" value="0" /> <property name="repeatInterval" value="1" /> </bean> <bean class="org.springframework.scheduling.quartz.SchedulerFactoryBean"> <property name="triggers"> <list> <ref bean="jobTrigger"/> </list> </property> </bean> Example implementation public class ExampleJob extends QuartzJobBean { private Log log = LogFactory.getLog(ExampleJob.class); protected void executeInternal(JobExecutionContext ctx) throws JobExecutionException { log.info("ÅÄÖ"); log.info(Charset.defaultCharset()); } } Example output 2010-05-20 17:04:38,285 1342 INFO [QuartzScheduler_Worker-9] ExampleJob - vÖvÑvñ 2010-05-20 17:04:38,286 1343 INFO [QuartzScheduler_Worker-9] ExampleJob - UTF-8 The same lines of code executed within spring injected beans referenced by servlets in the web-container will output proper encoding. What is it that make Quartz threads encoding dependent?

  • How do I convert Windows 7 file-name encoding to UTF-8 for Ruby on Rails?

    - by Reilly
    Hi. (I've looked at the other questions; none seemed to quite fit my problem.) I have some file names under Windows 7 that need to be translated into a MySQL database (UTF-8) with Ruby on Rails. An example file name includes "íéó" in some kind of Windows 7 file-system encoding. I've tried many combinations of gsub and ActiveSupport::Multibyte::Chars. Thanks for the help.

  • How to get UTF-8 working in java webapps?

    - by kosoant
    I need to get UTF-8 working in my Java webapp (servlets + JSP, no framework used) to support äöå etc. for regular Finnish text and Cyrillic alphabets like ??? for special cases. My setup is the following: Development environment: Windows XP. Production environment: Debian. Database used: MySQL 5.x. Users mainly use Firefox 2, but Opera 9.x, FF3, IE7 and Google Chrome are also used to access the site. How can I achieve this?
