Search Results

Search found 4647 results on 186 pages for 'localizable strings'.

Page 5/186 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Index independent character comparison within text blocks

    - by Michael IV
    I have the following task: developing a program where there is a block of sample text which should be typed by user. Any typos the user does during the test are registered. Basically, I can compare each typed char with the sample char based on caret index position of the input, but there is one significant flaw in such a "naive" approach. If the user typed mistakenly more letters than a whole string has, or inserted more white spaces between the string than should be, then the rest of the comparisons will be wrong because of the index offsets added by the additional wrong insertions. I have thought of designing some kind of parser where each string (or even a char ) is tokenized and the comparisons are made "char-wise" and not "index-wise," but that seems to me like an overkill for such a task. I would like to get a reference to possibly existing algorithms which can be helpful in solving this kind of problem.

    Read the article

  • Shortest Common Superstring: find shortest string that contains all given string fragments

    - by occulus
    Given some string fragments, I would like to find the shortest possible single string ("output string") that contains all the fragments. Fragments can overlap each other in the output string. Example: For the string fragments: BCDA AGF ABC The following output string contains all fragments, and was made by naive appending: BCDAAGFABC However this output string is better (shorter), as it employs overlaps: ABCDAGF ^ ABC ^ BCDA ^ AGF I'm looking for algorithms for this problem. It's not absolutely important to find the strictly shortest output string, but the shorter the better. I'm looking for an algorithm better than the obvious naive one that would try appending all permutations of the input fragments and removing overlaps (which would appear to be NP-Complete). I've started work on a solution and it's proving quite interesting; I'd like to see what other people might come up with. I'll add my work-in-progress to this question in a while.

    Read the article

  • Best practice Java - String array constant and indexing it

    - by Pramod
    For string constants its usual to use a class with final String values. But whats the best practice for storing string array. I want to store different categories in a constant array and everytime a category has been selected, I want to know which category it belongs to and process based on that. Addition : To make it more clear, I have a categories A,B,C,D,E which is a constant array. Whenever a user clicks one of the items(button will have those texts) I should know which item was clicked and do processing on that. I can define an enum(say cat) and everytime do if clickedItem == cat.A .... else if clickedItem = cat.B .... else if .... or even register listeners for each item seperately. But I wanted to know the best practice for doing handling these kind of problems.

    Read the article

  • C Minishell Command Expansion Printing Gibberish

    - by Optimus_Pwn
    I'm writing a unix minishell in C, and am at the point where I'm adding command expansion. What I mean by this is that I can nest commands in other commands, for example: $> echo hello $(echo world! ... $(echo and stuff)) hello world! ... and stuff I think I have it working mostly, however it isn't marking the end of the expanded string correctly, for example if I do: $> echo a $(echo b $(echo c)) a b c $> echo d $(echo e) d e c See it prints the c, even though I didn't ask it to. Here is my code: msh.c - http://pastebin.com/sd6DZYwB expand.c - http://pastebin.com/uLqvFGPw I have a more code, but there's a lot of it, and these are the parts that I'm having trouble with at the moment. I'll try to tell you the basic way I'm doing this. Main is in msh.c, here it gets a line of input from either the commandline or a shellfile, and then calls processline (char *line, int outFD, int waitFlag), where line is the line we just got, outFD is the file descriptor of the output file, and waitFlag tells us whether or not we should wait if we fork. When we call this from main we do it like this: processline (buffer, 1, 1); In processline, we allocate a new line: char expanded_line[EXPANDEDLEN]; We then call expand, in expand.c: expand(line, expanded_line, EXPANDEDLEN); In expand, we copy the characters literally from line to expanded_line until we find a $(, which then calls: static int expCmdOutput(char *orig, char *new, int *oldl_ind, int *newl_ind) orig is line, and new is expanded line. oldl_ind and newl_ind are the current positions in the line and expanded line, respectively. Then we pipe, and recursively call processline, passing it the nested command(for example, if we had "echo a $(echo b)", we would pass processline "echo b"). This is where I get confused, each time expand is called, is it allocating a new chunk of memory EXPANDEDLEN long? If so, this is bad because I'll run out of stack room really quickly(in the case of a hugely nested commandline input). In expand I insert a null character at the end of the expanded string, so why is it printing past it? If you guys need any more code, or explanations, just ask. Secondly, I put the code in pastebin because there's a ton of it, and in my experience people don't like it when I fill up several pages with code. Thanks.

    Read the article

  • How does PhP internally represent string?

    - by Jim Thio
    UTF8? UTF16? Does the string in PhP also keep tracks the encoding used for that string? A good answer would give me a sample of Let's look at this script for example. Say I do $original = "??????????????"; What actually happen? Obviously I think $original will not contain just 7 characters. Those glyps must each be represented by several bytes there. Then I do $converted= mb_convert_encoding ($original , "UTF-8") What will happen to $converted? How will $converted be different than $original? Will it be just the exact same byte sequence with $original but with different encoding? Or what?

    Read the article

  • Regex to String generation

    - by Mifmif
    Let's say that we have a regex and an index i , if we suppose that the set of string that match our regex are sorted in a lexicographical order, how could we get the i element of this list ? Edit : I added this simple example for more explanation : input : regex="[a-c]{1,2}"; index=4; in this case the ordered list of string that matches this regex contains those elements : a aa ab ac b ba bb bc c ca cb cc output : the 4th element which is ac ps: it's known that the string list that match the regex have infinite element, this doesn't have an impact on the proccess of extracting the element in the given index.

    Read the article

  • String Formatting with concatenation or substitution

    - by Davio
    This is a question about preferences. Assume a programming language offers these two options to make a string with some variables: "Hello, my name is ". name ." and I'm ". age ." years old." StringFormat("Hello, my name is $0 and I'm $1 years old.", name, age) Which do you prefer and why? I have found myself using both without any clear reason to pick either. Considering micro-optimizations is not within the scope of this question. Localization has been mentioned as a reason to go with option #2 and I think it's a very valid reason and deserves to be mentioned here. However, would opinions differ based on aesthetic viewpoints?

    Read the article

  • How can I handle this string concatenation in C in a reusable way

    - by hyphen this
    I've been writing a small C application that operates on files, and I've found that I have been copy+pasting this code around my functions: char fullpath[PATH_MAX]; fullpath[0] = '\0'; strcat(fullpath, BASE_PATH); strcat(fullpath, subdir); strcat(fullpath, "/"); strcat(fullpath, filename); // do something with fullpath... Is there a better way? The first thought that comes to mind is to create a macro but I'm sure this is a common problem in C, and I'm wondering how others have solve it.

    Read the article

  • how to insert multiple characters into a string in C [closed]

    - by John Li
    I wish to insert some characters into a string in C: Example: char string[20] = "20120910T090000"; I want to make it something like "2012-09-10-T-0900-00" My code so far: void append(char subject[],char insert[], int pos) { char buf[100]; strncpy(buf, subject, pos); int len = strlen(buf); strcpy(buf+len, insert); len += strlen(insert); strcpy(buf+len, subject+pos); strcpy(subject, buf); } When I call this the first time I get: 2012-0910T090000 However when I call it a second time I get: 2012-0910T090000-10T090000 Any help is appreciated

    Read the article

  • How to implement string matching based on a pattern

    - by Vincent Rischmann
    I was asked to build a tool that can identify if a string match a pattern. Example: {1:20} stuff t(x) {a,b,c} would match: 1 stuff tx a 20 stuff t c It is a sort of regex but with a different syntax Parentheses indicate an optional value {1:20} is a interval; I will have to check if the token is a number and if it is between 1 and 20 {a,b,c} is just an enumeration; it can be either a or b or c Right now I implemented this with a regex, and the interval stuff was a pain to do. On my own time I tried implementing some kind of matcher by hand, but it turns out it's not that easy to do. By experimenting I ended up with a function that generates a state table from the pattern and a state machine. It worked well until I tried to implement the optional value, and I got stuck and how to generate the state table. After that I searched how I could do this, and that led me to stuff like LL parser, LALR parser, recursive-descent parser, context-free grammars, etc. I never studied any of this so it's hard to know what is relevant here, but I think this is what I need: A grammar A parser which generates states from the grammar and a pattern A state machine to see if a string match the states So my first question is: Is this right ? And second question, what do you recommend I read/study to be able to implement this ?

    Read the article

  • Test if char is '0'-'9' [migrated]

    - by Chris Okyen
    My Java code is // if the first character is not 0-9 if( (((input.substring(0,0)).compareTo("1")) < 0 || (((input.substring(0,0)).compareTo("9")) < 0)); { System.out.println("Error: The first digit of the temperature is not 1-9. Force Exiting Program. "); System.exit(1); // exit the program } But it slips through even with input like '1123' or '912' or '222', and gives this error message. Whether I have it 0 for first and < 0 for the second, or 0 and 0, or < 0 and < 0, or even < o and 0... How come?

    Read the article

  • How to fix legacy code that uses <string.h> unsafely?

    - by Snowbody
    We've got a bunch of legacy code, written in straight C (some of which is K&R!), which for many, many years has been compiled using Visual C 6.0 (circa 1998) on an XP machine. We realize this is unsustainable, and we're trying to move it to a modern compiler. Political issues have said that the most recent compiler allowed is VC++ 2005. When compiling the project, there are many warnings about the unsafe string manipulation functions used (sprintf(), strcpy(), etc). Reviewing some of these places shows that the code is indeed unsafe; it does not check for buffer overflows. The compiler warning recommends that we move to using sprintf_s(), strcpy_s(), etc. However, these are Microsoft-created (and proprietary) functions and aren't available on (say) gcc (although we're primarily a Windows shop we do have some clients on various flavors of *NIX) How ought we to proceed? I don't want to roll our own string libraries. I only want to go over the code once. I'd rather not switch to C++ if we can help it.

    Read the article

  • Common Substring of two strings

    - by Chander Shivdasani
    This particular interview-question stumped me: Given two Strings S1 and S2. Find the longest Substring which is a Prefix of S1 and suffix of S2. Through Google, I came across the following solution, but didnt quite understand what it was doing. public String findLongestSubstring(String s1, String s2) { List<Integer> occurs = new ArrayList<>(); for (int i = 0; i < s1.length(); i++) { if (s1.charAt(i) == s2.charAt(s2.length()-1)) { occurs.add(i); } } Collections.reverse(occurs); for(int index : occurs) { boolean equals = true; for(int i = index; i >= 0; i--) { if (s1.charAt(index-i) != s2.charAt(s2.length() - i - 1)) { equals = false; break; } } if(equals) { return s1.substring(0,index+1); } } return null; }

    Read the article

  • Bikeshedding: Placeholders in strings

    - by dotancohen
    I find that I sometimes use placeholders in strings, like this: $ cat example-apache <VirtualHost *:80> ServerName ##DOMAIN_NAME## ServerAlias www.##DOMAIN_NAME## DocumentRoot /var/www/##DOMAIN_NAME##/public_html </VirtualHost> Now I am sure that it is a minor issue if the placeholder is ##DOMAIN_NAME##, !!DOMAIN_NAME!!, {{DOMAIN_NAME}}, or some other variant. However, I now need to standardize with other developers on a project, and we all have a vested interest in having our own placeholder format made standard in the organization. Are there any good reasons for choosing any of these, or others? I am trying to quantify these considerations: Aesthetics and usability. For example, __dict__ may be hard to read as we don't know how many underscores are in there. Compatibility. Will some language try to do something funny with {} syntax in a string (such as PHP does with "Welcome to {$siteName} today!")? Actually, I know that PHP and Python won't, but others? Will a C++ preprocessor choke on ## format? If I need to store the value in some SQL engine, will it not consider something a comment? Any other pitfalls to be wary of? Maintainability. Will the new guy mistake ##SOME_PLACEHOLDER## as a language construct? The unknown. Surely the wise folk here will think of other aspects of this decision that I have not thought of. I might be bikeshedding this, but if there are real issues that might be lurking then I would certainly like to know about them before mandating that our developers adhere to a potentially-problematic convention.

    Read the article

  • Placeholders in strings

    - by dotancohen
    I find that I sometimes use placeholders in strings, like this: $ cat example-apache <VirtualHost *:80> ServerName ##DOMAIN_NAME## ServerAlias www.##DOMAIN_NAME## DocumentRoot /var/www/##DOMAIN_NAME##/public_html </VirtualHost> Now I am sure that it is a minor issue if the placeholder is ##DOMAIN_NAME##, !!DOMAIN_NAME!!, {{DOMAIN_NAME}}, or some other variant. However, I now need to standardize with other developers on a project, and we all have a vested interest in having our own placeholder format made standard in the organization. Are there any good reasons for choosing any of these, or others? I am trying to quantify these considerations: Aesthetics and usability. For example, __dict__ may be hard to read as we don't know how many underscores are in there. Compatibility. Will some language try to do something funny with {} syntax in a string (such as PHP does with "Welcome to {$siteName} today!")? Actually, I know that PHP and Python won't, but others? Will a C++ preprocessor choke on ## format? If I need to store the value in some SQL engine, will it not consider something a comment? Any other pitfalls to be wary of? Maintainability. Will the new guy mistake ##SOME_PLACEHOLDER## as a language construct? The unknown. Surely the wise folk here will think of other aspects of this decision that I have not thought of. I might be bikeshedding this, but if there are real issues that might be lurking then I would certainly like to know about them before mandating that our developers adhere to a potentially-problematic convention.

    Read the article

  • PL/SQL to delete invalid data from token Strings

    - by Jie Chen
    Previous article describes how to delete the duplicated values from token string in bulk mode. This one extends it and shows the way to delete invalid data. Scenario Support we have page_two and manufacturers tables in database and the table DDL is: SQL> desc page_two; Name NULL? TYPE ----------------------------------------- -------- ------------------------ MULTILIST04 VARCHAR2(765) SQL> SQL> desc manufacturers; Name NULL? TYPE ----------------------------------------- -------- ------ ID NOT NULL NUMBER NAME VARCHAR In table page_two, column multilist04 stores a token string splitted with common. Each token represent a valid ID in manufacturers table. My expectation is to delete invalid token strings from page_two.multilist04, which have no mapping id in manufacturers.id. For example in below SQL result: ,6295728,33,6295729,6295730,6295731,22, , value 33 and 22 are invalid data because there is no ID equals to 33 or 22 in manufacturers table. So I need to delete 33 and 22. SQL> col rowid format a20; SQL> col multilist04 format a50; SQL> select rowid, multilist04 from page_two; ROWID MULTILIST04 -------------------- -------------------------------------------------- AAB+UrADfAAAAhUAAI ,6295728,6295729,6295730,6295731, AAB+UrADfAAAAhUAAJ ,1111,6295728,6295729,6295730,6295731, AAB+UrADfAAAAhUAAK ,6295728,111,6295729,6295730,6295731, AAB+UrADfAAAAhUAAL ,6295728,6295729,6295730,6295731,22, AAB+UrADfAAAAhUAAM ,6295728,33,6295729,6295730,6295731,22, SQL> select id, encode_name from manufacturers where id in (1111,11,22,33); No rows selected SQL> Solution As there is no existing SPLIT function or related in PL/SQL, I should program it by myself. I code Split intermediate function which is used to get the token value between current splitter and next splitter. Next program is main entry point, it get each column value from page_two.multilist04, process each row based on cursor. When it get each multilist04 value, it uses above Split function to get each token string stored to singValue variant, then check if it exists in manufacturers.id. If not found, set fixFlag to 1, pending to be deleted.

    Read the article

  • Strings Generation

    - by sikas
    Hi,I would like to know how can I create various string from some given characters eg: given characters: a, b I would like to generate the following strings: aa ab ba bb What I have thought of is having (for 2 inputs only) two for-loops one inside another, and then loop each to the number of inputs which in this case is 2 and the output strings will be 2*2 = 4 strings and as the number increases the number of output strings will increase by multiplying n*n (n-times)

    Read the article

  • C# Function that generates strings according to input

    - by mouthpiec
    Hi, I need a C# function that takes 2 strings as an input and return an array of all possible combinations of strings. private string[] FunctionName (string string1, string string2) { //code } The strings input will be in the following format: String1 eg - basement String2 eg - **a*f**a Now what I need is all combinations of possible strings using the characters in String2 (ignoring the * symbols), and keeping them in the same character position. Eg: baaement, baaefent, baaefena, basefent, basemena, etc any help? :)

    Read the article

  • How to detect UTF-8-based encoded strings [closed]

    - by Diego Sendra
    A customer of asked us to build him a multi-language based support VB6 scraper, for which we had the need to detect UTF-8 based encoded strings to decode it later for proper displaying in application UI. It's necessary to point out that this need arises based on VB6 limitations to natively support UTF-8 in its controls, contrary to what it happens in .NET where you can tell a control that it should expect UTF-8 encoding. VB6 natively supports ISO 8859-1 and/or Windows-1252 encodings only, for which textboxes, dropdowns, listview controls, others can't be defined to natively support/expect UTF-8 as you can do in .NET considering what we just explained; so we would see weird symbols such as é, è among others, making it a whole mess at the time of displaying. So, next function contains whole UTF-8 encoded punctuation marks and symbols from languages like Spanish, Italian, German, Portuguese, French and others, based on an excellent UTF-8 based list we got from this link - Ref. http://home.telfort.nl/~t876506/utf8tbl.html Basically, the function compares if each and one of the listed UTF-8 encoded sentences, separated by | (pipe) are found in our passed string making a substring search first. Whether it's not found, it makes an alternative ASCII value based search to get a match. Say, a string like "Societé" (Society in english) would return FALSE through calling isUTF8("Societé") while it would return TRUE when calling isUTF8("SocietÈ") since È is the UTF-8 encoded representation of é. Once you got it TRUE or FALSE, you can decode the string through DecodeUTF8() function for properly displaying it, a function we found somewhere else time ago and also included in this post. Function isUTF8(ByVal ptstr As String) Dim tUTFencoded As String Dim tUTFencodedaux Dim tUTFencodedASCII As String Dim ptstrASCII As String Dim iaux, iaux2 As Integer Dim ffound As Boolean ffound = False ptstrASCII = "" For iaux = 1 To Len(ptstr) ptstrASCII = ptstrASCII & Asc(Mid(ptstr, iaux, 1)) & "|" Next tUTFencoded = "Ä|Ã…|Ç|É|Ñ|Ö|ÃŒ|á|Ã|â|ä|ã|Ã¥|ç|é|è|ê|ë|í|ì|î|ï|ñ|ó|ò|ô|ö|õ|ú|ù|û|ü|â€|°|¢|£|§|•|¶|ß|®|©|â„¢|´|¨|â‰|Æ|Ø|∞|±|≤|≥|Â¥|µ|∂|∑|âˆ|Ï€|∫|ª|º|Ω|æ|ø|¿|¡|¬|√|Æ’|≈|∆|«|»|…|Â|À|Ã|Õ|Å’|Å“|–|—|“|â€|‘|’|÷|â—Š|ÿ|Ÿ|â„|€|‹|›|ï¬|fl|‡|·|‚|„|‰|Â|Ú|Ã|Ë|È|Ã|ÃŽ|Ã|ÃŒ|Ó|Ô||Ã’|Ú|Û|Ù|ı|ˆ|Ëœ|¯|˘|Ë™|Ëš|¸|Ë|Ë›|ˇ" & _ "Å|Å¡|¦|²|³|¹|¼|½|¾|Ã|×|Ã|Þ|ð|ý|þ" & _ "â‰|∞|≤|≥|∂|∑|âˆ|Ï€|∫|Ω|√|≈|∆|â—Š|â„|ï¬|fl||ı|˘|Ë™|Ëš|Ë|Ë›|ˇ" tUTFencodedaux = Split(tUTFencoded, "|") If UBound(tUTFencodedaux) > 0 Then iaux = 0 Do While Not ffound And Not iaux > UBound(tUTFencodedaux) If InStr(1, ptstr, tUTFencodedaux(iaux), vbTextCompare) > 0 Then ffound = True End If If Not ffound Then 'ASCII numeric search tUTFencodedASCII = "" For iaux2 = 1 To Len(tUTFencodedaux(iaux)) 'gets ASCII numeric sequence tUTFencodedASCII = tUTFencodedASCII & Asc(Mid(tUTFencodedaux(iaux), iaux2, 1)) & "|" Next 'tUTFencodedASCII = Left(tUTFencodedASCII, Len(tUTFencodedASCII) - 1) 'compares numeric sequences If InStr(1, ptstrASCII, tUTFencodedASCII) > 0 Then ffound = True End If End If iaux = iaux + 1 Loop End If isUTF8 = ffound End Function Function DecodeUTF8(s) Dim i Dim c Dim n s = s & " " i = 1 Do While i <= Len(s) c = Asc(Mid(s, i, 1)) If c And &H80 Then n = 1 Do While i + n < Len(s) If (Asc(Mid(s, i + n, 1)) And &HC0) <> &H80 Then Exit Do End If n = n + 1 Loop If n = 2 And ((c And &HE0) = &HC0) Then c = Asc(Mid(s, i + 1, 1)) + &H40 * (c And &H1) Else c = 191 End If s = Left(s, i - 1) + Chr(c) + Mid(s, i + n) End If i = i + 1 Loop DecodeUTF8 = s End Function

    Read the article

  • Visual Studio Find and Replace Regular Expressions ~ find lines with quoted strings, not containing

    - by Darkoni
    Visual Studio Find and Replace Regular Expressions Find lines with quoted strings, not containing strings include or trace i am tryling to find out all lines in c++ project that contains some text as i have to use visual studio, i have to use its Find and Replace http://www.codinghorror.com/blog/2006/07/the-visual-studio-ide-and-regular-expressions.html so, for finding all lines like: print("abc"); it is enogh to write :q and it will find all quotted strings ok, but i also get lot of lines like #include "stdio.h" and trace("* step 1 *") i find out regex to get all lines containing include and trace <include|trace> so, mine question is, how to find all lines with "quotted strings" but NOT lines that contains strings include and trace. thanx

    Read the article

  • Invisible Delimiter for Strings in HTML

    - by noah
    I need a way to identify certain strings in HTML markup. I know what the strings are, but it is possible that they could be substrings of other strings in the document. To find them, I output a special delimiter character (currently using \032). On page load, we go through the HTML and record the location of the strings, and remove the delimiter. Unfortunately, most browsers show the delimiter character until we can find and remove them all. I'd like to avoid that if possible. Is there a character or string that will be preserved in the HTML content (so a comment wont work) but wont be visible to the user? It also needs to be something that is fairly unlikely to appear next to a string, so something like &nbsp; wouldn't work either. EDIT: Sorry, I forgot to mention that the strings will be in attributes, so any sort of tag wont work.

    Read the article

  • How can I detect common substrings in a list of strings

    - by danio
    Given a set of strings, for example: EFgreen EFgrey EntireS1 EntireS2 J27RedP1 J27GreenP1 J27RedP2 J27GreenP2 JournalP1Black JournalP1Blue JournalP1Green JournalP1Red JournalP2Black JournalP2Blue JournalP2Green I want to be able to detect that these are three sets of files: EntireS[1,2] J27[Red,Green]P[1,2] JournalP[1,2][Red,Green,Blue] Are there any known ways of approaching this problem - any published papers I can read on this? The approach I am considering is for each string look at all other strings and find the common characters and where differing characters are, trying to find sets of strings that have the most in common, but I fear that this is not very efficient and may give false positives. Note that this is not the same as 'How do I detect groups of common strings in filenames' because that assumes that a string will always have a series of digits following it. [Edited 15/09/09 to add more sample strings]

    Read the article

  • Comparing two strings in excel, add value for common variables

    - by overtime
    I'm comparing two large datasets containing strings in excel. Column A contains the numbers 1-1,000,000. Column B contains 1,000,000 strings, neatly organized in the desired order. Column C contains 100,000 randomly organized strings, that have identical values somewhere in column B. Example: A B C D 1 String1 String642 2 String2 String11 3 String3 String8000 4 String4 String78 What I'd like to do is find duplicate values in columns B and C then output the Column A value that corresponds with the string in Column C into Column D. Desired Output: A B C D 1 String1 String642 642 2 String2 String11 11 3 String3 String8000 8000 4 String4 String78 78

    Read the article

  • Glib::ustring and Japanese characters

    - by user294787
    Glib::ustring is supposed to work well with UTF8 but I have a problem when working with Japanese strings. If you compare those two strings, "???" and "???", using == operator or compare method, it will answer that those two strings are equals. I don't understand why. How Glib::ustring works ? The only way I found to get false to the comparison is to compare strings of different sizes. For example "?????" and "????". Very strange...

    Read the article

  • Perl strings internals

    - by n0rd
    How does perl strings represented internally? What encoding is used? How do I handle different encodings properly? I've been using perl for quite a long time, but it didn't include a lot of string handling in different encodings, and when I encountered a minor problem that had something to do with encodings I usually resorted to some shamanic actions. Until this moment I thought about perl strings as sequences of bytes, which did fit pretty well for my tasks. Now I need to do some processing of UTF-8 encoded file and here starts trouble. First, I read file into string like this: open(my $in, '<', $ARGV[0]) or die "cannot open file $ARGV[0] for reading"; binmode($in, ':utf8'); my $contents; { local $/; $contents = <$in>; } close($in); then simply print it: print $contents; And I get two things: a warning Wide character in print at <scriptname> line <n> and a garbage in console. So I can conclude that perl strings have a concept of "character" that can be "wide" or not, but when printed these "wide" characters are represented in console as multiple bytes, not as single "character". (I wonder now why did all my previous experience with binary files worked quite how I expected it to work without any "character" issues). Why then I see garbage in console? If perl stores strings as character in some known encoding, I don't think there is a big problem to find out console encoding and print text properly. (I use Windows, BTW). If perl stores strings as multibyte sequences (e.g. using same UTF-8 encoding), why is it done this way? From my C experience handling multibyte strings is PAIN.

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >