characters - Page 78 - Developer IT

C#, string replace russian to english

- by Fabio Beoni

Hello, I have a strange problem replacing chars in string... I read a .txt file containing russian text, and starting from a list of letters russian to english (ru=en), I loop the list and I WOULD like to replace russian characters with english characters. The problem is: I can see in the debug the right reading of the russian and the right reading of the english, but using myWord = myWord.Replace(ruChar, enChar), the string is not replaced. My txt file is a UTF-8 encoding. Any suggestions?? Thank you to all...

Read the article

Unescape _xHHHH_ XML escape sequences using Python

- by John Machin

I'm using Python 2.x [not negotiable] to read XML documents [created by others] that allow the content of many elements to contain characters that are not valid XML characters by escaping them using the _xHHHH_ convention e.g. ASCII BEL aka U+0007 is represented by the 7-character sequence u"_x0007_". Neither the functionality that allows representation of any old character in the document nor the manner of escaping is negotiable. I'm parsing the documents using cElementTree or lxml [semi-negotiable]. Here is my best attempt at unescapeing the parser output as efficiently as possible: import re def unescape(s, subber=re.compile(r'_x[0-9A-Fa-f]{4,4}_').sub, repl=lambda mobj: unichr(int(mobj.group(0)[2:6], 16)), ): if "_" in s: return subber(repl, s) return s The above is biassed by observing a very low frequency of "_" in typical text and a better-than-doubling of speed by avoiding the regex apparatus where possible. The question: Any better ideas out there?

Read the article

Purpose of Trigraph sequences in C++?

- by Kirill V. Lyadvinsky

According to C++'03 Standard 2.3/1: Before any other processing takes place, each occurrence of one of the following sequences of three characters (“trigraph sequences”) is replaced by the single character indicated in Table 1. ---------------------------------------------------------------------------- | trigraph | replacement | trigraph | replacement | trigraph | replacement | ---------------------------------------------------------------------------- | ??= | # | ??( | [ | ??< | { | | ??/ | \ | ??) | ] | ??> | } | | ??’ | ˆ | ??! | | | ??- | ˜ | ---------------------------------------------------------------------------- In real life that means that code printf( "What??!\n" ); will result in printing What| because ??! is a trigraph sequence that is replaced with the | character. My question is what purpose of using trigraphs? Is there any practical advantage of using trigraphs? UPD: In answers was mentioned that some European keyboards don't have all the punctuation characters, so non-US programmers have to use trigraphs in everyday life? UPD2: Visual Studio 2010 has trigraph support turned off by default.

Read the article

How to specify character encoding for Ant Task parameters in Java

- by räph

I'm writing an ANT task in Java. In my build.xml I specify parameters, which should be read from my java class. Problems occur, when I use special characters, like german umlauts (Ö,Ä,Ü) in these parameters. In my java task they appear as ?-characters (using System.out.print). All my files are encoded as UTF-8. and my build.xml has the corresponding declaration: <?xml version="1.0" encoding="UTF-8" ?> For the details of writing the task: I do it according to http://ant.apache.org/manual/develop.html (especially Point 5 nested elements). I have nested elements in my task like: <parameter name="test" value="ÖÄÜtest"/> and a java method: public void addConfiguredParameter(Parameter prop) { System.out.println(prop.getValue()); //prints ???test } to read the parameter values.

Read the article

Is it possible to reliably auto-decode user files to Unicode? [C#]

- by NVRAM

I have a web application that allows users to upload their content for processing. The processing engine expects UTF8 (and I'm composing XML from multiple users' files), so I need to ensure that I can properly decode the uploaded files. Since I'd be surprised if any of my users knew their files even were encoded, I have very little hope they'd be able to correctly specify the encoding (decoder) to use. And so, my application is left with task of detecting before decoding. This seems like such a universal problem, I'm surprised not to find either a framework capability or general recipe for the solution. Can it be I'm not searching with meaningful search terms? I've implemented BOM-aware detection (http://en.wikipedia.org/wiki/Byte_order_mark) but I'm not sure how often files will be uploaded w/o a BOM to indicate encoding, and this isn't useful for most non-UTF files. My questions boil down to: Is BOM-aware detection sufficient for the vast majority of files? In the case where BOM-detection fails, is it possible to try different decoders and determine if they are "valid"? (My attempts indicate the answer is "no.") Under what circumstances will a "valid" file fail with the C# encoder/decoder framework? Is there a repository anywhere that has a multitude of files with various encodings to use for testing? While I'm specifically asking about C#/.NET, I'd like to know the answer for Java, Python and other languages for the next time I have to do this. So far I've found: A "valid" UTF-16 file with Ctrl-S characters has caused encoding to UTF-8 to throw an exception (Illegal character?) (That was an XML encoding exception.) Decoding a valid UTF-16 file with UTF-8 succeeds but gives text with null characters. Huh? Currently, I only expect UTF-8, UTF-16 and probably ISO-8859-1 files, but I want the solution to be extensible if possible. My existing set of input files isn't nearly broad enough to uncover all the problems that will occur with live files. Although the files I'm trying to decode are "text" I think they are often created w/methods that leave garbage characters in the files. Hence "valid" files may not be "pure". Oh joy. Thanks.

Read the article

PHP: Replace umlauts with closest 7-bit ASCII equivalent in an UTF-8 string

- by BlaM

What I want to do is to remove all accents and umlauts from a string, turning "lärm" into "larm" or "andré" into "andre". What I tried to do was to utf8_decode the string and then use strtr on it, but since my source file is saved as UTF-8 file, I can't enter the ISO-8859-15 characters for all umlauts - the editor inserts the UTF-8 characters. Obviously a solution for this would be to have an include that's an ISO-8859-15 file, but there must be a better way than to have another required include? echo strtr(utf8_decode($input), 'ŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿ', 'SOZsozYYuAAAAAAACEEEEIIIIDNOOOOOOUUUUYsaaaaaaaceeeeiiiionoooooouuuuyy'); UPDATE: Maybe I was a bit inaccurate with what I try to do: I do not actually want to remove the umlauts, but to replace them with their closest "one character ASCII" aequivalent.

Read the article

array data compression that is holding 13268 bits(1.66kBytes)

- by Mujtaba Haider

i.e array is having 100*125 bits of data for each aircraft+8 ascii messages each of 12 characters what compression technique should i apply to such data

Read the article

tchar safe functions -- count parameter for UTF-8 constants

- by Dustin Getz

I'm porting a library from char to TCHAR. the count parameter of this fragment, according to MSDN, is the number of multibyte characters, not the number of bytes. so, did I get this right? _tcsncmp(access, TEXT("ftp"), 3); //or do i want _tcsnccmp? "Supported on Windows platforms only, _mbsncmp and _mbsnbcmp are multibyte versions of strncmp. _mbsncmp will compare at most count multibyte characters and _mbsnbcmp will compare at most count bytes. They both use the current multibyte code page. _tcsnccmp and _tcsncmp are the corresponding Generic functions for _mbsncmp and _mbsnbcmp, respectively. _tccmp is equivalent to _tcsnccmp."

Read the article

Python code formatting

- by Curious2learn

In response to another question of mine, someone suggested that I avoid long lines in the code and to use PEP-8 rules when writing Python code. One of the PEP-8 rules suggested avoiding lines which are longer than 80 characters. I changed a lot of my code to comply with this requirement without any problems. However, changing the following line in the manner shown below breaks the code. Any ideas why? Does it have to do with the fact that what follows return command has to be in a single line? The line longer that 80 characters: def __str__(self): return "Car Type \n"+"mpg: %.1f \n" % self.mpg + "hp: %.2f \n" %(self.hp) + "pc: %i \n" %self.pc + "unit cost: $%.2f \n" %(self.cost) + "price: $%.2f "%(self.price) The line changed by using Enter key and Spaces as necessary: def __str__(self): return "Car Type \n"+"mpg: %.1f \n" % self.mpg + "hp: %.2f \n" %(self.hp) + "pc: %i \n" %self.pc + "unit cost: $%.2f \n" %(self.cost) + "price: $%.2f "%(self.price)

Read the article

Splitting a string in ASP Classic

- by Sam

So here's my string: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam elit lacus, dignissim quis laoreet non, cursus id eros. Etiam lacinia tortor vel purus eleifend accumsan. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Quisque bibendum vestibulum nisl vitae volutpat. I need to split it every 100 characters (full words only) until all the characters are used up. So we'd end up with: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam elit lacus, dignissim quis laoreet non, and cursus id eros. Etiam lacinia tortor vel purus eleifend accumsan. Pellentesque habitant morbi tristique and senectus et netus et malesuada fames ac turpis egestas. Quisque bibendum vestibulum nisl vitae volutpat. Any ideas on the best way to do that?

Read the article

UTF-8 to ISO-8859-1 mapping / lossless conversion libraries in Java

- by Pawel Krupinski

I need to perform a conversion of characters from UTF-8 to ISO-8859-1 in Java without losing for example all of the UTF-8 specific punctuation. Ideally would like these to be converted to equivalents in ISO (e.g. there are probably 5 different single quotes in UTF-8 and would like them all converted to ISO single quote character). String.getBytes("ISO-8859-1") just won't do the trick in this case as it will lose the UTF-8-specific chars. Do you know of any ready mappings or libraries in Java that would map UTF-8 specific characters to ISO?

Read the article

Convert Decimal to ASCII

- by Dan Snyder

I'm having difficulty using reinterpret_cast. Before I show you my code I'll let you know what I'm trying to do. I'm trying to get a filename from a vector full of data being used by a MIPS I processor I designed. Basically what I do is compile a binary from a test program for my processor, dump all the hex's from the binary into a vector in my c++ program, convert all of those hex's to decimal integers and store them in a DataMemory vector which is the data memory unit for my processor. I also have instruction memory. So When my processor runs a SYSCALL instruction such as "Open File" my C++ operating system emulator receives a pointer to the beginning of the filename in my data memory. So keep in mind that data memory is full of ints, strings, globals, locals, all sorts of stuff. When I'm told where the filename starts I do the following: Convert the whole decimal integer element that is being pointed to to its ASCII character representation, and then search from left to right to see if the string terminates, if not then just load each character consecutively into a "filename" string. Do this until termination of the string in memory and then store filename in a table. My difficulty is generating filename from my memory. Here is an example of what I'm trying to do: C++ Syntax (Toggle Plain Text) 1.Index Vector NewVector ASCII filename 2.0 240faef0 128123792 'abc7' 'a' 3.0 240faef0 128123792 'abc7' 'ab' 4.0 240faef0 128123792 'abc7' 'abc' 5.0 240faef0 128123792 'abc7' 'abc7' 6.1 1234567a 243225 'k2s0' 'abc7k' 7.1 1234567a 243225 'k2s0' 'abc7k2' 8.1 1234567a 243225 'k2s0' 'abc7k2s' 9. //EXIT LOOP// 10.1 1234567a 243225 'k2s0' 'abc7k2s' Index Vector NewVector ASCII filename 0 240faef0 128123792 'abc7' 'a' 0 240faef0 128123792 'abc7' 'ab' 0 240faef0 128123792 'abc7' 'abc' 0 240faef0 128123792 'abc7' 'abc7' 1 1234567a 243225 'k2s0' 'abc7k' 1 1234567a 243225 'k2s0' 'abc7k2' 1 1234567a 243225 'k2s0' 'abc7k2s' //EXIT LOOP// 1 1234567a 243225 'k2s0' 'abc7k2s' Here is the code that I've written so far to get filename (I'm just applying this to element 1000 of my DataMemory vector to test functionality. 1000 is arbitrary.): C++ Syntax (Toggle Plain Text) 1.int i = 0; 2.int step = 1000;//top->a0; 3.string filename; 4.char *temp = reinterpret_cast<char*>( DataMemory[1000] );//convert to char 5.cout << "a0:" << top->a0 << endl;//pointer supplied 6.cout << "Data:" << DataMemory[top->a0] << endl;//my vector at pointed to location 7.cout << "Data(1000):" << DataMemory[1000] << endl;//the element I'm testing 8.cout << "Characters:" << &temp << endl;//my temporary char array 9. 10.while(&temp[i]!=0) 11.{ 12. filename+=temp[i];//add most recent non-terminated character to string 13. i++; 14. if(i==4)//when 4 chatacters have been added.. 15. { 16. i=0; 17. step+=1;//restart loop at the next element in DataMemory 18. temp = reinterpret_cast<char*>( DataMemory[step] ); 19. } 20. } 21. cout << "Filename:" << filename << endl; int i = 0; int step = 1000;//top-a0; string filename; char *temp = reinterpret_cast( DataMemory[1000] );//convert to char cout << "a0:" << top-a0 << endl;//pointer supplied cout << "Data:" << DataMemory[top-a0] << endl;//my vector at pointed to location cout << "Data(1000):" << DataMemory[1000] << endl;//the element I'm testing cout << "Characters:" << &temp << endl;//my temporary char array while(&temp[i]!=0) { filename+=temp[i];//add most recent non-terminated character to string i++; if(i==3)//when 4 chatacters have been added.. { i=0; step+=1;//restart loop at the next element in DataMemory temp = reinterpret_cast( DataMemory[step] ); } } cout << "Filename:" << filename << endl; So the issue is that when I do the conversion of my decimal element to a char array I assume that 8 hex #'s will give me 4 characters. Why isn't this this case? Here is my output: C++ Syntax (Toggle Plain Text) 1.a0:0 2.Data:0 3.Data(1000):4428576 4.Characters:0x7fff5fbff128 5.Segmentation fault

Read the article

This regx does not work only in Chrome

- by Deeptechtons

Hi i just put up a validation function in jScript to validate filename in fileupload control[input type file]. The function seems to work fine in FF and sometimes in ie but never in Chrome. Basically the function tests if File name is atleast 1 char upto 25 characters long.Contains only valid characters,numbers [no spaces] and are of file types in the list. Could you throw some light on this function validate(Uploadelem) { var objRgx = new RegExp(/^[\w]{1,25}\.*\.(jpg|gif|png|jpeg|doc|docx|pdf|txt|rtf)$/); objRgx.ignoreCase = true; if (objRgx.test(Uploadelem.value)) { document.getElementById('moreUploadsLink').style.display = 'block'; } else { document.getElementById('moreUploadsLink').style.display = 'none'; } }

Read the article

comparing strings in PostgreSQL

- by binaryLV

Hello! Is there any way in PostgreSQL to convert UTF-8 characters to "similar" ASCII characters? String glažškunu rukiši would have to be converted to glazskunu rukisi. UTF-8 text is not in some specific language, it might be in Latvian, Russian, English, Italian or any other language. This is needed for using in where clause, so it might be just "comparing strings" rather than "converting strings". I tried using convert, but it does not give desired results (e.g., select convert('A', 'utf8', 'sql_ascii') gives \304\200, not A). Database is created with: ENCODING = 'UTF8' LC_COLLATE = 'Latvian_Latvia.1257' LC_CTYPE = 'Latvian_Latvia.1257' These params may be changed, if necessary.

Read the article

ASP.NET plug-in architecture, settings problem

- by Xaqron

I want to divide business layer (BLL) of an asp.net application into multiple components. Each component is a .NET class library which is compiled as a standalone DLL. These components should have their own configuration files. For example "MyNameSpace.Users.dll" contains classes about users of the website and there's a password policy to check if password length is at least x characters. When webmaster edits the config file of this DLL and set x to y then component (DLL) should use new value (y) in the future and enforce passwords to be at least y characters. I want each component as a single project and compile them separaely (and not to put all projects in a solution in Visual Studio), and put the DLLs of these libraries into the "Bin" folder of my ASP.NET application. Is it possible ? Where should I put these config files ?

Read the article

Why aren't double quotes and backslashes allowed in strings in the JSON standard?

- by Dan Herbert

If I run this in a JavaScript console in Chrome or Firebug, it works fine. JSON.parse('"\u0027"') // Escaped single-quote But if I run either of these 2 lines in a Javascript console, it throws an error. JSON.parse('"\u0022"') // Escaped double-quote JSON.parse('"\u005C"') // Escaped backslash RFC 4627 section 2.5 seems to imply that \ and " are allowed characters as long as they're properly escaped. The 2 browsers I've tried this in don't seem to allow it, however. Is there something I'm doing wrong here or are they really not allowed in strings? I've also tried using \" and \\ in place of \u0022 and \u005C respectively. I feel like I'm just doing something very wrong, because I find it hard to believe that JSON would not allow these characters in strings, especially since the specification doesn't seem to mention anything that I could find saying they're not allowed.

Read the article

How to get some xml that comes before and a little from after a DOM Node.

- by startoftext

I am using java and I am pretty open to using w3c DOM or DOM4J at this point. So lets say I have a Node like a text node that I have found something interesting in, like say an occurrence of a substring in the nodes text. If I want to get a string with a number characters preceding that node and a few characters after that node how may I do that? Basically I need to be able to display a snippet of the original xml around the occurrence of that string. The problem I have with getting the parent node for example and then calling asXML is that I no longer know the exact location of the substring in the text node. If I search again for that string value in the parents xml then I may find 2 occurrences or many more if the parent has other children that contain an occurrence of that string. Much appreciation if any one can answer this question.

Read the article

Effective methods for reading and writing large files in C

- by Bertholt Stutley Johnson

I'm writing an application that deals with very large user-generated input files. The program will copy about 95 percent of the file, effectively duplicating it and switching a few words and values in the copy, and then appending the copy (in chunks) to the original file, such that each block (consisting of between 10 and 50 lines) in the original is followed by the copied and modified block, and then the next original block, and so on. The user-generated input conforms to a certain format, and it is highly unlikely that any line in the original file is longer than 100 characters long. Which would be the better approach? a) To use one file pointer and use variables that hold the current position of how much has been read and where to write to, seeking the file pointer back and forth to read and write; or b) To use multiple file pointers, one for reading and one for writing. I am mostly concerned with the efficiency of the program, as the input files will reach up to 25,000 lines, each about 50 characters long. Thanks!

Read the article

Changing text depending on rounded total from database

- by NeonBlue Bliss

On a website I have a number of small PHP scripts to automate changes to the text of the site, depending on a figure that's calculated from a MySQL database. The site is for a fundraising group, and the text in question on the home page gives the total amount raised. The amount raised is pulled from the database and rounded to the nearest thousand. This is the PHP I use to round the figure and find the last three digits of the total: $query4 = mysql_query("SELECT SUM(amountraised) AS full_total FROM fundraisingtotal;"); $result4 = mysql_fetch_array($query4); $fulltotal = $result4["full_total"]; $num = $fulltotal + 30000; $ftotalr = round($num,-3); $roundnum = round($num); $string = $roundnum; $length = strlen($string); $characters = 3; $start = $length - $characters; $string = substr($string , $start ,$characters); $figure = $string; (£30,000 is the amount that had been raised by the previous fundraising team from when the project first started, which is why I've added 30000 to $fulltotal for the $num variable) Currently the text reads: the bookstall and other fundraising events have raised more than £<? echo number_format($ftotalr); ?> I've just realised though that because the PHP is rounding to the nearest thousand, if the total's for example £39,200 and it's rounded to £40,000, to say it's more than £40,000 is incorrect, and in that case I'd need it to say 'almost £40,000' or something similar. I obviously need to replace the 'more than' with a variable. Obviously I need to test whether the last three digits of the total are nearer to 0 or 1000, so that if the total was for example £39,2000, the text would read 'just over', if it was between £39,250 and £39,400 something like 'over', between £39,400 and £39,700 something like 'well over', and between £39,700 and £39,999, 'almost.' I've managed to get the last three digits of the total as a variable, and I think I need some sort of an if/else/elseif code block (not sure if that would be the right approach, or whether to use case/break), and obviously I'm going to have to check whether the figure meets each of the criteria, but I can't figure out how to do that. Could anyone suggest what would be the best way to do this please?

Read the article

Select Range by string

- by Petott

How I can change this feature so I select the range of characters in a word document between the characters "E" and "F", if I have; xasdasdEcdscasdcFvfvsdfv is underlined to me the range - cdscasdc private void Rango() { Word.Range rng; Word.Document document = this.Application.ActiveDocument; object startLocation = "E"; object endLocation = "F"; // Supply a Start and End value for the Range. rng = document.Range(ref startLocation, ref endLocation); // Select the Range. rng.Select(); } This function will not let me pass by reference two objects of string type....... Thanks

Read the article

A maximum character limit on the preg functions?

- by animuson

On my site I use output buffering to grab all the output and then run it through a process function before sending it out to the browser (I don't replace anything, just break it into more manageable pieces). In this particular case, there is a massive amount of output because it is listing out a label for every country in the database (around 240 countries). The problem is that in full, my preg_match functions seems to get skipped over, it does absolutely nothing and returns no matches. However, if I remove parts of the labels (no particular part, just random pieces to reduce characters) then the preg_match functions works again. It doesn't seem to matter what I remove from the label, it just seems to be that as long as I remove so many characters. Is there some sort of cap on what the preg functions can handle or will it time out if there is too much data to be scanned over?

Read the article

create program in c for permutation combination and showing frequency

- by Vishal Oswal

I have 2 strings where I have saved fixed 20 characters and these are “A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T” and same 20 char in string 2. so I will get 400 combinations of 2 character sets like AA,AB,AC,AD,AE,AF,……………AT BA,BB,BC,BD,BE,BF,…………..BT CA,CB,CC,CD,CE,CF……………CT This way we will get 400 combinations (Which program I have created successfully) but then user will put the value till 31 characters witch will be treated as 3rd string for E.g. “ABCDDAAAB” now I have to check the frequency of user input in the sequence of 12,23,34,45,56,67,78,89 (2 CHAR SET) means AB,BC,CD,DD,DA,AA,AA,AB and need to show the frequency of user input OUTPUT: AB=2 BC=1 CD=1 DD=1 DA=1 AA=2 please its urgent

Read the article

Encoding a string as an integer .NET

- by Paul Knopf

I have a string that I would like represented uniquely as an integer. For example: A3FJEI = 34950140 How would I go about writing a EncodeAsInteger(string) method. I understand that the amount of characters in the string will make the integer increase greatly, forcing the value to become a long, not an int. Since I need the value to be an integer, I don't need the numerical representation to be entirely unique to the string. Maybe I can foreach through all the characters of the string and sum the numerical keycode of the character.

Read the article

C# method contents regex validation

- by user258651

I need to validate the contents of a C# method. I do not care about syntax errors. I do care about characters that will invalidate parsing of the rest of the code. For example: method() { /* valid comment */ /* <-- bad for (i..) { } for (i..) { <-- bad } I need to validate/fix any non-paired characters. This includeds /* */, { }, and maybe others. How should I go about this?

Read the article

Perl LWP::UserAgent mishandling UTF-8 response

- by RedGrittyBrick

When I use LWP::UserAgent to retrieve content encoded in UTF-8 it seems LWP::UserAgent doesn't handle the encoding correctly. Here's the output after setting the Command Prompt window to Unicode by the command chcp 65001 Note that this initially gives the appearance that all is well, but I think it's just the shell reassembling bytes and decoding UTF-8, From the other output you can see that perl itself is not handling wide characters correctly. C:\perl getutf8.pl ====================================================================== HTTP/1.1 200 OK Connection: close Date: Fri, 31 Dec 2010 19:24:04 GMT Accept-Ranges: bytes Server: Apache/2.2.8 (Win32) PHP/5.2.6 Content-Length: 75 Content-Type: application/xml; charset=utf-8 Last-Modified: Fri, 31 Dec 2010 19:20:18 GMT Client-Date: Fri, 31 Dec 2010 19:24:04 GMT Client-Peer: 127.0.0.1:80 Client-Response-Num: 1 <?xml version="1.0" encoding="UTF-8"? <nameBudejovický Budvar</name ====================================================================== response content length is 33 ....v....1....v....2....v....3....v....4 <nameBudejovický Budvar</name . . . . v . . . . 1 . . . . v . . . . 2 . . . . v . . . . 3 . . . . 3c6e616d653e427564c49b6a6f7669636bc3bd204275647661723c2f6e616d653e < n a m e B u d ? ? j o v i c k ? ? B u d v a r < / n a m e Above you can see the payload length is 31 characters but Perl thinks it is 33. For confirmation, in the hex, we can see that the UTF-8 sequences c49b and c3bd are being interpreted as four separate characters and not as two Unicode characters. Here's the code #!perl use strict; use warnings; use LWP::UserAgent; my $ua = LWP::UserAgent-new(); my $response = $ua-get('http://localhost/Bud.xml'); if (! $response-is_success) { die $response-status_line; } print '='x70,"\n",$response-as_string(), '='x70,"\n"; my $r = $response-decoded_content((charset = 'UTF-8')); $/ = "\x0d\x0a"; # seems to be \x0a otherwise! chomp($r); # Remove any xml prologue $r =~ s/^<\?.*\?\x0d\x0a//; print "Response content length is ", length($r), "\n\n"; print "....v....1....v....2....v....3....v....4\n"; print $r,"\n"; print ". . . . v . . . . 1 . . . . v . . . . 2 . . . . v . . . . 3 . . . . \n"; print unpack("H*", $r), "\n"; print join(" ", split("", $r)), "\n"; Note that Bud.xml is UTF-8 encoded without a BOM. How can I persuade LWP::UserAgent to do the right thing? P.S. Ultimately I want to translate the Unicode data into an ASCII encoding, even if it means replacing each non-ASCII character with one question mark or other marker. I have accepted Ysth's "upgrade" answer - because I know it is the right thing to do when possible. However I am going to use a work-around (which may depress Tom further): $r = encode("cp437", decode("utf8", $r));

Search Results

Search found 5222 results on 209 pages for 'characters'.

Page 78/209 | < Previous Page | 74 75 76 77 78 79 80 81 82 83 84 85 | Next Page >

- by Fabio Beoni

- by John Machin

- by Kirill V. Lyadvinsky

- by räph

- by NVRAM

- by BlaM

- by Mujtaba Haider

- by Dustin Getz

- by Curious2learn

- by Sam

- by Pawel Krupinski

- by Dan Snyder

- by Deeptechtons

- by binaryLV

- by Xaqron

- by Dan Herbert

- by startoftext

- by Bertholt Stutley Johnson

- by NeonBlue Bliss

- by Petott

- by animuson

- by Vishal Oswal

- by Paul Knopf

- by user258651

- by RedGrittyBrick

< Previous Page | 74 75 76 77 78 79 80 81 82 83 84 85 | Next Page >