Search Results

Search found 36172 results on 1447 pages for 'unicode string'.

Page 5 of 1447

Split String in C# without delimiter (sort of)

- by Zach

Hi, I want to split a string in C#.NET that looks like this: string Letters = "hello"; and put each letter (h, e, l, l, o) into an array or ArrayList. I have no idea what to use as the delimiter in String.Split(delimiter). I can do it if the original string has commas (or anything else): string Letters = "H,e,l,l,o"; string[] AllLettersArray = Letters.Split(",".ToCharArray()); But I have no idea what to use when there is (supposedly) no delimiter. Is there a special character like Environment.NewLine? Thanks.
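
In C#, `Letters.ToCharArray()` already returns the letters as a `char[]`, so no delimiter is needed. A minimal sketch of the same idea in Python (the helper name `letters_of` is hypothetical): iterating a string yields one character at a time.

```python
def letters_of(text):
    """Return each character of `text` as a separate list element.

    No delimiter is involved: iterating a str yields one character
    (code point) at a time.
    """
    return list(text)


print(letters_of("hello"))  # ['h', 'e', 'l', 'l', 'o']
```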

String searching algorithm for Chinese characters.

- by Jack Low

There is Python code available for existing string-searching algorithms, e.g. the Boyer-Moore algorithm. I am looking to use it on Chinese characters, and it doesn't seem like the same implementation would work. What would I need to do to make the algorithm work on Chinese characters? I am referring to this: http://en.literateprograms.org/Boyer-Moore_string_search_algorithm_(Python)#References
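
A possible approach, sketched in Python 3 under the assumption that the text is held as `str` (a sequence of Unicode code points) rather than as encoded bytes: the bad-character table of a Boyer-Moore-style search only needs to be a dictionary keyed by character instead of a fixed 256-entry array. This is the simpler Horspool variant (bad-character rule only), not the full Boyer-Moore algorithm from the linked page, and `horspool_find` is a hypothetical name.

```python
def horspool_find(text, pattern):
    """Return the index of the first occurrence of `pattern` in `text`, or -1.

    Boyer-Moore-Horspool: only the bad-character rule is used. The shift
    table is a dict keyed by character, so it works for any code point
    (CJK included) instead of relying on a 256-entry byte table.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Shift for each character in pattern[:-1]; later (rightmost) occurrences
    # overwrite earlier ones. Characters not in the table shift by m.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        # Compare the pattern right to left against the current window.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos
        # Shift by the text character aligned with the pattern's last slot.
        pos += shift.get(text[pos + m - 1], m)
    return -1


print(horspool_find("我爱北京天安门", "北京"))  # 2
```

In practice Python's built-in `str.find` already handles Chinese text; the sketch only shows that the algorithm itself does not depend on the size of the alphabet.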

Parse string to create a list of element

- by Nick

I have a string like this:

    "\r color=\"red\" name=\"Jon\" \t\n depth=\"8.26\" "

and I want to parse it and create a std::list of this object:

    class data { std::string name; std::string value; };

where, for example, name = color and value = red. What is the fastest way? I can use Boost.

EDIT: This is what I've tried:

    vector<string> tokens;
    split(tokens, str, is_any_of(" \t\f\v\n\r"));
    if (tokens.size() > 1) {
        list<data> attr;
        for_each(tokens.begin(), tokens.end(), [&attr](const string& token) {
            if (token.empty() || !contains(token, "="))
                return;
            vector<string> tokens;
            split(tokens, token, is_any_of("="));
            erase_all(tokens[1], "\"");
            attr.push_back(data(tokens[0], tokens[1]));
        });
    }

But it does not work if there are spaces inside the quotes, e.g. color="red 1".
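
One alternative to splitting on whitespace is to match each `name="value"` pair with a regular expression, so spaces inside the quotes are never treated as separators. A sketch in Python, assuming values never contain escaped quotes; the same pattern should carry over to `std::regex` (or Boost.Regex) with `std::sregex_iterator` in C++. The names `PAIR` and `parse_attributes` are hypothetical.

```python
import re

# name="value": the name is a run of word characters; the value is anything
# up to the next double quote, so embedded spaces are kept.
PAIR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def parse_attributes(s):
    """Return a list of (name, value) tuples found in `s`."""
    return PAIR.findall(s)


sample = '\r color="red 1" name="Jon" \t\n depth="8.26" '
print(parse_attributes(sample))
# [('color', 'red 1'), ('name', 'Jon'), ('depth', '8.26')]
```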

Is it safe to use random Unicode for complex delimiter sequences in strings?

- by ccomet

Question: In terms of program stability and ensuring that the system will actually operate, how safe is it to use chars like ¦, § or ‡ for complex delimiter sequences in strings? Can I reliably assume that I won't run into any issues with a program reading these incorrectly?

I am working in a system, using C# code, in which I have to store a fairly complex set of information within a single string. The string only needs to be readable by the computer; end users should only ever see the information after it has been parsed by the appropriate methods. Because some of the data in these strings will be collections of variable size, I use different delimiters to identify which parts of the string correspond to a certain tier of organization. There are enough cases that the standard choices such as ;, | and their ilk have been exhausted.

I considered two-char delimiters, like ;# or ;|, but I felt that would be very inefficient. There probably isn't much of a performance difference between storing one char and two chars, but when I have the option of picking the smaller option, it just feels wrong to pick the larger one. So finally, I considered using characters like the double dagger and the section sign. They only take up one char, and they are definitely not going to show up in the actual text that I'll be storing, so they won't be confused for anything.

But character encoding is finicky. While their visibility to the end user is irrelevant (since they, in fact, won't see them), I recently became concerned about how the programs in the system will read them. The string is stored in one database, while a separate program is responsible for both encoding and decoding the string into different object types for the rest of the application to work with. And if something is expected to be written one way but is actually written another, then maybe the whole system will fail, and I can't really let that happen.

So is it safe to use these kinds of chars as background delimiters?
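
Whether such characters survive depends on every hop (database column, driver, program) using an encoding that can represent them. The sketch below, in Python, checks a few candidate delimiters against a few encodings; the list of encodings is an assumption chosen for illustration, and `survives` is a hypothetical helper. ‡ (U+2021) round-trips through UTF-8 and UTF-16 but not through Latin-1 or ASCII, so a single legacy-encoded hop is where it could be lost.

```python
def survives(text, encoding):
    """True if `text` encodes and decodes under `encoding` without loss."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        # Raised (as UnicodeEncodeError) when the encoding cannot represent a character.
        return False


for delimiter in ["¦", "§", "‡"]:
    for encoding in ["utf-8", "utf-16", "latin-1", "ascii"]:
        print(delimiter, encoding, survives(delimiter, encoding))
```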

Strange Behaviour with Unicode Characters in Windows

- by open_sourse

OK, I do not know if this is a programming question, but it certainly is a technical one, so I am asking it here. I was working on some internationalization in my PHP code, and to make sure my generated HTML displays Unicode correctly based on the encoding, I decided to add some Chinese text to my PHP page, which then echoes it to the browser to complete my test case. So I went to Google, typed "Chinese", and copied the first Chinese text that the search returned (which was ??/??). I then pasted it into Notepad++, my editor, and to my surprise it showed up as boxes, similar to [][]/[][]. I thought the encoding in Notepad++ was messed up, so I changed the encoding to UTF-8 and UCS; neither worked. I tried again in a freshly created file with the new encoding, and I still got the boxes. The same content shows up as correct Chinese when I paste it into Google and Stack Overflow (as I did in this post)! I even opened the Windows Clipboard Viewer, and the content is represented in the clipboard as boxes! I tried pasting it into the Windows Explorer address bar and using it to rename a file, but I still get boxes. But it shows up correctly when pasted into my Chrome address bar! Is this a Windows issue? Since I am able to paste it correctly into SO, the data in memory should be encoded correctly, right? But if that is the case, why does it show up as boxes in the Clipboard Viewer? I am confused here... By the way, I am using Windows XP with SP3. (I am asking this question here, even if it is not programmatic, because it is preventing me from running my programming test cases.)

Permanent fix for unicode characters not displaying correctly (as boxes)

- by Chase

Please read this entire message before replying. First, I know how to fix the issue on a temporary basis; I am looking for a permanent fix. I work with foreign-language files a lot. Unfortunately, sometimes all the Unicode characters in Windows Explorer, Notepad, and other places (as rendered by Windows, probably GDI) do not display correctly. That is, they display as square blocks, whereas they had just been displaying correctly. There are countless methods to temporarily correct the issue, but again, I want a way to resolve it permanently.

What I have tried:
- The silly "Hide fonts based on language settings" option. This setting only applies to which fonts you see in the Fonts folder and font drop-downs; it doesn't disable foreign fonts (doesn't work, or if it does, it is temporary).
- Deleting the font cache file and rebooting (works... usually; temporary solution).
- Changing my locale and then back (sometimes works; temporary solution).
- Rebooting my PC and getting lucky (50-50 chance; temporary solution).
- Changing my keyboard input/adding a foreign keyboard (temporary solution that only seems to work once).
- Reinstalling Windows (temporary solution, though it sometimes lasts a few months; I have done this 7 times across 3 computers).

What I have not tried:
- Buying Windows Ultimate and installing the interface packs. This is not a solution: I can't read Japanese/Chinese and I do not want my interface in those languages.

What I will not do:
- Switch to a different brand of operating system (Unix, Linux, Mac OS X).
- Switch to an older version of Windows (Windows Vista, XP, 2000, etc.).

So can anyone recommend a permanent fix for the problem?

Why doesn't String.replaceAll() work on this String?

- by Aloong

    // This source is a line read from a file
    String src = "23570006,music,**,wu(),1,exam,\"Monday9,10(H2-301)\",1-10,score,";
    // This should come from matcher.group() with Pattern.compile("\".*?\"")
    String group = "\"Monday9,10(H2-301)\"";
    src = src.replaceAll("\"", "");
    group = group.replaceAll("\"", "");
    String replacement = group.replaceAll(",", "#@");
    System.out.println(src.contains(group));
    src = src.replaceAll(group, replacement);
    System.out.println(group);
    System.out.println(replacement);
    System.out.println(src);

I'm trying to replace the "," between the \"s so that I can use String.split() later. But the above just doesn't work; the result is:

    true
    Monday9,10(H2-301)
    Monday9#@10(H2-301)
    23570006,music,**,wu(),1,exam,Monday9,10(H2-301),1-10,score,

But when I change the strings to

    String src = "123\"9,10\"123";
    String group = "\"9,10\"";

it works well:

    true
    9,10
    9#@10
    1239#@10123

What's the matter with the string?
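
Java's `String.replaceAll` interprets its first argument as a regular expression, so the parentheses in `Monday9,10(H2-301)` form a capturing group and the literal text never matches; `9,10` contains no metacharacters, which is why the short test works. In Java, `src.replace(group, replacement)` (a literal replacement) or `Pattern.quote(group)` avoids this. The same effect, reproduced in Python with `re.sub` and `re.escape`:

```python
import re

src = "23570006,music,**,wu(),1,exam,Monday9,10(H2-301),1-10,score,"
group = "Monday9,10(H2-301)"
replacement = group.replace(",", "#@")

# Used as a regex, "(H2-301)" is a capturing group matching "H2-301"
# without parentheses, so nothing in src matches and src is unchanged.
as_regex = re.sub(group, replacement, src)

# Escaping the metacharacters makes the pattern match the literal text.
escaped = re.sub(re.escape(group), replacement, src)

# A plain literal replacement needs no regex at all.
literal = src.replace(group, replacement)

print(as_regex == src)     # True
print(escaped)             # 23570006,music,**,wu(),1,exam,Monday9#@10(H2-301),1-10,score,
print(escaped == literal)  # True
```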

Truncate portions of a string to limit the whole string's length in Ruby

- by Horace Loeb

Suppose you want to generate dynamic page titles that look like this:

"It was all a dream, I used to read word up magazine" from "Juicy" by The Notorious B.I.G

i.e., "LYRICS" from "SONG_NAME" by ARTIST

However, your title can only be 69 characters in total, and this template will sometimes generate titles that are longer.

One strategy for solving this problem is to truncate the entire string to 69 characters. However, a better approach is to truncate the less important parts of the string first, i.e., your algorithm might look something like this:

1. Truncate the lyrics until the entire string is <= 69 characters.
2. If you still need to truncate, truncate the artist name until the entire string is <= 69 characters.
3. If you still need to truncate, truncate the song name until the entire string is <= 69 characters.
4. If all else fails, truncate the entire string to 69 characters.

Ideally the algorithm would also limit how much each part of the string could be truncated. E.g., step 1 would really be "Truncate the lyrics to a minimum of 10 characters until the entire string is <= 69 characters".

Since this is such a common situation, I was wondering if someone has a library or code snippet that can take care of it.
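
A minimal sketch of that priority-based truncation in Python (the question is about Ruby, but the logic ports directly). The dictionary layout, the helper name `fit_title` and the 10-character minimum are assumptions taken from the example above, not from an existing library:

```python
TEMPLATE = '"{lyrics}" from "{song}" by {artist}'


def fit_title(parts, limit=69, order=("lyrics", "artist", "song"), min_len=10):
    """Shorten `parts` in priority order until the rendered title fits.

    parts: dict with keys lyrics, song and artist (hypothetical layout).
    Each part is cut only as much as needed, but never below min_len
    characters; if that is not enough, the whole string is cut at `limit`.
    """
    parts = dict(parts)  # do not modify the caller's dict
    for key in order:
        overflow = len(TEMPLATE.format(**parts)) - limit
        if overflow <= 0:
            break
        value = parts[key]
        new_len = max(min_len, len(value) - overflow)
        if new_len < len(value):
            parts[key] = value[:new_len]
    # Last resort: hard truncation of the whole title.
    return TEMPLATE.format(**parts)[:limit]


title = fit_title({
    "lyrics": "It was all a dream, I used to read word up magazine",
    "song": "Juicy",
    "artist": "The Notorious B.I.G",
})
print(title, len(title))
```

Here only the lyrics need shortening; the artist and song are touched only when cutting the lyrics to `min_len` is not enough.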

getting string.substring(N) not to choke when N > string.length

- by aape

I'm writing some code that takes a report from the mainframe and converts it to a spreadsheet. They can't edit the code on the MF to give me a delimited file, so I'm stuck dealing with it as fixed width. It's working okay now, but I need to make it more stable before I release it for testing.

My problem is that in any given line of data there could be, say, three columns of numbers, each five chars wide, at positions 10, 16, and 22. If a particular row has no data for the last two columns, it won't be padded with spaces; rather, the string will be only 14 characters long. So I can't just blindly write

    Dim s As String = someStream.ReadLine()
    a = s.Substring(10, 5)
    b = s.Substring(16, 5)
    c = s.Substring(22, 5)

because it'll choke when it takes a substring past the end of the string. I know I could test the length of the string before processing each row, and I have automated the filling of some of the variables using a counter and a loop (using counter * theWidthOfTheGivenVariable to jump around). But this project was a dog to start with (come on! turning a report into a spreadsheet?), there are many different types of rows (it's not just a grid), and the code's getting ugly fast. I'd like this to be clean, clear, and maintainable for the poor sucker who gets this after me.

If it matters, here's my code so far (it's really crufty at the moment). You can see some of my/its idiocy in the processSection#data subs.

So, I'm wondering: 1) is there a way built into .NET to have String.Substring not throw when reading past the end of a string, without wrapping it in a Try...Catch? and 2) would it be appropriate in this situation to write a new string class that inherits from String and has a friendlier Substring function?

ETA: Thanks for all the advice and knowledge, everyone. I'll go with the extension. Hopefully one of these years I'll get my chops up enough to pay someone back in kind. :)
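
Python slices already behave like the forgiving substring the question asks for: they clamp to the string's bounds instead of throwing. A sketch of a safe substring plus a fixed-width row reader (the names `safe_substring` and `parse_row` are hypothetical, and the column positions are taken from the example above); a .NET extension method would do the same clamping explicitly, for example with `Math.Min`, before calling `Substring`.

```python
def safe_substring(s, start, length):
    """Like .NET Substring(start, length), but returns whatever part of the
    requested range exists instead of throwing ("" if start is past the end)."""
    # Python slices clamp to the string bounds, so no exception is possible.
    return s[start:start + length]


def parse_row(line, columns=((10, 5), (16, 5), (22, 5))):
    """Read fixed-width (start, width) columns; missing trailing columns come back as ""."""
    return [safe_substring(line, start, width).strip() for start, width in columns]


print(parse_row("x" * 10 + "1234"))  # ['1234', '', '']
```

Negative start positions are not handled; the sketch assumes the column positions are valid offsets.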

String.split() method bug in GWT 2.0.3

- by Domchi

I'm upgrading a GWT project from GWT 1.7.1 to the current newest version, 2.0.3. It seems that the new GWT broke the String.split(String regex) method: I get the following error on the JavaScript side: this$static is undefined. This happens in this line of my .nocache.js file: if (maxMatch == 0 && this$static.length > 0) { ...which happens to be part of the JavaScript equivalent of the String split method. Is there a cure for this, apart from doing the string splitting myself?

How to remove all zeros from a string's beginning?

- by hsz

I have a string that begins with zeros: string s = "000045zxxcC648700"; How can I remove them so that the string looks like this: string s = "45zxxcC648700";
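
In C#, `s.TrimStart('0')` removes every leading `'0'`. The Python equivalent is `str.lstrip`; a minimal sketch (the helper name is hypothetical):

```python
def strip_leading_zeros(s):
    """Remove every '0' at the start of `s` (C# equivalent: s.TrimStart('0'))."""
    return s.lstrip("0")


print(strip_leading_zeros("000045zxxcC648700"))  # 45zxxcC648700
```

Note that an all-zero string becomes empty; if "0000" should become "0", that case needs a separate check.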

String manipulation of type String substitution in mathematical expression

- by Peterstone

Imagine something like exp(49/200)+(x-49/200). I want to pass every operation that is not an addition or a subtraction as an argument to the function "roundn", so my expression becomes roundn(exp(roundn(49/200,n)),n)+(x - roundn(49/200,n)). Well, the expression I want to manipulate is this:

exp(49/200)+exp(49/200)*(x-49/200)+1/2*exp(49/200)*(x-49/200)^2+1/6*exp(49/200)*(x-49/200)^3+1/24*exp(49/200)*(x-49/200)^4+1/120*exp(49/200)*(x-49/200)^5+1/720*exp(49/200)*(x-49/200)^6+1/5040*exp(49/200)*(x-49/200)^7+1/40320*exp(49/200)*(x-49/200)^8+1/362880*exp(49/200)*(x-49/200)^9+1/3628800*exp(49/200)*(x-49/200)^10+1/39916800*exp(49/200)*(x-49/200)^11

I'm looking for a method (which may involve any program) not based on a programming language, at most a batch script or something like that... Thank you!
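
A short script can do this substitution with two regular-expression passes, and it can be run from a batch file. The sketch below is in Python and assumes that only `exp(...)` calls whose arguments contain no parentheses, and integer fractions such as `49/200`, need wrapping; products and powers are left untouched, and `wrap_roundn` is a hypothetical name.

```python
import re


def wrap_roundn(expr, n="n"):
    """Wrap exp(...) calls and integer fractions a/b in roundn(..., n).

    Assumptions: exp() arguments contain no parentheses, and fractions are
    plain integers like 49/200. Products and powers are left alone.
    """
    # Pass 1: exp(arg) -> roundn(exp(arg),n)
    expr = re.sub(r"exp\(([^()]*)\)", lambda m: f"roundn(exp({m.group(1)}),{n})", expr)
    # Pass 2: a/b -> roundn(a/b,n), including fractions inside exp(...)
    expr = re.sub(r"\d+/\d+", lambda m: f"roundn({m.group(0)},{n})", expr)
    return expr


print(wrap_roundn("exp(49/200)+(x-49/200)"))
# roundn(exp(roundn(49/200,n)),n)+(x-roundn(49/200,n))
```

On the longer expression, coefficients such as `1/2` are wrapped as fractions too, since they are divisions.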

finding a string of random characters (with possible errors) within a large string of random charact

- by mike

I am trying to search a large string without spaces for a smaller string of characters. Using regex I can easily find perfect matches, but I can't figure out how to find partial matches. By partial matches I mean one or two extra characters in the string, or one or two characters that have been changed, or one of each. The first and last characters will always match, though. This would be similar to a spell checker, but there are no spaces and the strings don't contain actual words, just random hex digits. I figured out a way to find the string if there are no extra characters, using indexOf(string.charAt(0)) and indexOf(string.charAt(string.length()-1)) and looping through the characters between the two indexes. But this can be problematic with randomized characters, because the first and last characters may be found at the correct spacing while none of the middle characters match. I've been scratching my head for hours on this issue. Any ideas?
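
A simple, if slow, way to allow a few inserted or changed characters is to compare candidate windows against the pattern by edit (Levenshtein) distance. The sketch below, in Python, uses the stated constraint that the first and last characters always match to limit the windows it tries; `fuzzy_find` and its `max_edits` parameter are hypothetical. For very large texts, a dedicated approximate-matching algorithm would scale better than re-running the distance for every window.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def fuzzy_find(text, pattern, max_edits=2):
    """Yield (start, window, distance) for windows of `text` that start with
    pattern[0], end with pattern[-1], and are within `max_edits` edits."""
    m = len(pattern)
    for start, ch in enumerate(text):
        if ch != pattern[0]:
            continue
        # Window lengths from m - max_edits to m + max_edits.
        for length in range(max(1, m - max_edits), m + max_edits + 1):
            end = start + length
            if end > len(text):
                break
            window = text[start:end]
            if window[-1] != pattern[-1]:
                continue
            d = levenshtein(window, pattern)
            if d <= max_edits:
                yield start, window, d


print(list(fuzzy_find("ffa1b29c3d4ff", "a1b2c3d4")))  # [(2, 'a1b29c3d4', 1)]
```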

Convert .net String object into base64 encoded string

- by chester89

I have a question: which Unicode encoding should I use when encoding a .NET string into base64? I know strings are UTF-16 encoded on Windows, so is my way of encoding the right one?

    public static String ToBase64String(this String source) {
        return Convert.ToBase64String(Encoding.Unicode.GetBytes(source));
    }
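
For reference, .NET's `Encoding.Unicode` is UTF-16 little-endian, so the method above works as long as the decoder uses the same encoding; UTF-8 (`Encoding.UTF8`) is a common alternative that produces shorter output for mostly ASCII text. A Python sketch of both choices (the helper names are hypothetical):

```python
import base64


def to_base64(s, encoding="utf-8"):
    """Encode the text to bytes with `encoding`, then base64 the bytes."""
    return base64.b64encode(s.encode(encoding)).decode("ascii")


def from_base64(b64, encoding="utf-8"):
    """Reverse of to_base64; must use the same text encoding."""
    return base64.b64decode(b64).decode(encoding)


print(to_base64("hi", "utf-16-le"))  # aABpAA==  (same bytes as Encoding.Unicode)
print(to_base64("hi", "utf-8"))      # aGk=
```

Either choice is fine; what matters is that the encoding is fixed and shared by both ends.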

Java string too long?

- by wrongusername

I have the following code in Java (which worked just fine in C++ for some reason) which produces an error:

    int a;
    System.out.println("String length: " + input.length());
    for (a = 0; ((a + 1) * 97) < input.length(); a++) {
        System.out.print("Substring at " + a + ": ");
        System.out.println(input.substring(a * 97, 97));
        //other code here...
    }

Output:

    String length: 340
    Substring at 0: HelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHe
    Substring at 1: Exception in thread "AWT-EventQueue-0" java.lang.StringIndexOutOfBoundsException: String index out of range: -97
    //long list of "at ..." stuff
    Substring at 2:

Using a string of length 200, however, the following output is produced:

    String length: 200
    Substring at 0: HelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHelloHe
    Substring at 1:

That is it; no exceptions raised, just... nothing. What is happening here?
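
Java's `String.substring(beginIndex, endIndex)` takes an end index, not a length (C++'s `std::string::substr(pos, count)` takes a count, which would explain why the C++ version worked). So for `a = 1` the call is `substring(97, 97)`, an empty string, and for `a = 2` it is `substring(194, 97)`, which throws because the end lies 97 before the start. A Java fix would be `input.substring(a * 97, Math.min((a + 1) * 97, input.length()))`; the loop condition `(a + 1) * 97 < input.length()` also skips the final partial chunk. A Python sketch of fixed-size chunking with end indices (the helper name is hypothetical):

```python
def chunks(s, size):
    """Split `s` into consecutive pieces of `size` characters; the last piece
    may be shorter. Each slice end is start + size, i.e. an end index."""
    return [s[i:i + size] for i in range(0, len(s), size)]


pieces = chunks("Hello" * 68, 97)  # 340 characters
print([len(p) for p in pieces])    # [97, 97, 97, 49]
```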

How to get unicodes from Google translation output string.

- by user270885

On the Google Translate website, if I type any word in English and select a foreign language, it shows the corresponding word in that language. I want the Unicode values of those foreign characters. How can I get them?
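
Once the translated text has been copied into a program as a string, each character's code point is just its numeric value. A Python sketch that prints the usual `U+XXXX` notation (the helper name is hypothetical):

```python
def code_points(text):
    """Return the U+XXXX notation for each character in `text`."""
    return ["U+{:04X}".format(ord(ch)) for ch in text]


print(code_points("中文"))  # ['U+4E2D', 'U+6587']
```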

Iterating through String word at a time in Python

- by AlgoMan

I have a string buffer holding a huge text file, and I have to search it for given words/phrases. What's the efficient way to do it? I tried using re module matches, but as I have a huge text corpus to search through, this is taking a large amount of time.

Given a dictionary of words and phrases, I iterate through each file, read it into a string, search for all the words and phrases in the dictionary, and increment the count in the dictionary if the keys are found.

One small optimization we thought of was to sort the dictionary of phrases/words from the most words to the fewest, then compare the list of phrases at each word start position in the string buffer. If one phrase is found, we don't search for the other phrases (as it matched the longest phrase, which is what we want).

Can someone suggest how to go through the string buffer word by word (iterate the string buffer word by word)? Also, is there any other optimization that can be done here?
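
One way to iterate word by word and still prefer the longest phrase is to tokenize once and look phrases up in a set of word tuples, trying the longest phrase length first at each position; each position then costs a few set lookups instead of a scan over the whole dictionary. A Python sketch; the tokenization rule (`\w+`, lowercased), the choice to skip past a matched phrase, and the name `count_phrases` are assumptions:

```python
import re
from collections import Counter


def count_phrases(text, phrases):
    """Count dictionary words/phrases, preferring the longest phrase that
    starts at each word position."""
    words = re.findall(r"\w+", text.lower())  # iterate the buffer word by word
    targets = {tuple(p.lower().split()) for p in phrases}
    lengths = sorted({len(t) for t in targets}, reverse=True)  # longest first
    counts = Counter()
    i = 0
    while i < len(words):
        for n in lengths:
            candidate = tuple(words[i:i + n])
            if len(candidate) == n and candidate in targets:
                counts[" ".join(candidate)] += 1
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no phrase starts at this word
    return counts


print(count_phrases("New York is big. I love New York and York.", ["new york", "york", "big"]))
```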

Parsing unicode XML with Python SAX on App Engine

- by Derek Dahmer

I'm using xml.sax with unicode strings of XML as input, originally entered from a web form. On my local machine (Python 2.5, using the default xmlreader expat, running through App Engine), it works fine. However, the exact same code and input strings fail on the production App Engine servers with "not well-formed". For example, it happens with the code below:

    from xml import sax

    class MyHandler(sax.ContentHandler):
        pass

    handler = MyHandler()

    # Both of these unicode strings return 'not well-formed'
    # on app engine, but work locally
    sax.parseString(u"<a>b</a>", handler)
    sax.parseString(u"<!DOCTYPE a[<!ELEMENT a (#PCDATA)> ]><a>b</a>", handler)

    # Both of these work, but output unicode
    sax.parseString("<a>b</a>", handler)
    sax.parseString("<!DOCTYPE a[<!ELEMENT a (#PCDATA)> ]><a>b</a>", handler)

resulting in the error:

    File "<string>", line 1, in <module>
    File "/base/python_dist/lib/python2.5/xml/sax/__init__.py", line 49, in parseString
      parser.parse(inpsrc)
    File "/base/python_dist/lib/python2.5/xml/sax/expatreader.py", line 107, in parse
      xmlreader.IncrementalParser.parse(self, source)
    File "/base/python_dist/lib/python2.5/xml/sax/xmlreader.py", line 123, in parse
      self.feed(buffer)
    File "/base/python_dist/lib/python2.5/xml/sax/expatreader.py", line 211, in feed
      self._err_handler.fatalError(exc)
    File "/base/python_dist/lib/python2.5/xml/sax/handler.py", line 38, in fatalError
      raise exception
    SAXParseException: <unknown>:1:1: not well-formed (invalid token)

Any reason why App Engine's parser, which also uses Python 2.5 and expat, would fail when given unicode input?
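
A workaround worth trying (the root cause on App Engine is not confirmed here) is to encode the unicode string to UTF-8 bytes before parsing, so the parser never has to guess how the input is encoded; in Python 2.5 that would be `sax.parseString(xml_text.encode("utf-8"), handler)`. A Python 3 sketch of the same pattern, with a hypothetical handler that collects character data:

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collects all character data reported by the parser."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def characters(self, content):
        self.parts.append(content)


def parse_text(xml_text):
    """Parse an XML string by first encoding it to UTF-8 bytes."""
    handler = TextCollector()
    # Without an encoding declaration, XML defaults to UTF-8, matching these bytes.
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "".join(handler.parts)


print(parse_text("<a>b</a>"))         # b
print(parse_text("<a>caf\u00e9</a>"))  # café
```

If the XML carries an encoding declaration, the bytes must be encoded to match it.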

'𠂉' Not a valid unicode character, but in the unicode character set?

- by Steve Cotner

Short story: I can't get a character like '𠂉' to store in a MySQL database, either through a text field in a Ruby on Rails app (with the default UTF-8 encoding) or by entering it directly with a MySQL GUI app. As far as I can tell, all Chinese characters and radicals can be entered into the database without a problem, but not these rarely typed 'character components'. The character mentioned above is Unicode U+20089 (HTML entity &#x20089;).

I can get it to display on the page by entering <html>&#x20089;</html> and removing HTML escaping, but I would like to store it simply as the Unicode character and keep the HTML escaping in place. There are many other Chinese 'components' (parts of full characters, generally consisting of 2 or 3 strokes) that cause the same problem.

According to this page, the character is in the UTF-8 charset: http://www.fileformat.info/info/unicode/char/20089/charset_support.htm But the neighboring '...20089/index.htm' page shows an alert saying it's not a valid Unicode character.

For reference, the character can be found in Mac OS X through the Character Palette (International menu, "Show Character Palette") by searching by radical and looking under the '?' radical.

Apologies if this is too open-ended... Can a character like this be stored in a UTF-8-based database? How can this character be both supported and unsupported, both present in the character set and not valid?
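
U+20089 lies outside the Basic Multilingual Plane (in CJK Unified Ideographs Extension B), so it takes 4 bytes in UTF-8 and a surrogate pair in UTF-16, while ordinary CJK characters take 3 bytes. MySQL's `utf8` character set stores at most 3 bytes per character; 4-byte characters need the `utf8mb4` character set on the column and the connection. A Python sketch that shows the size difference (the helper names are hypothetical):

```python
def utf8_size(ch):
    """Number of bytes `ch` occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def utf16_units(ch):
    """Number of 16-bit code units `ch` occupies in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


def needs_utf8mb4(text):
    """True if any character is beyond U+FFFF, i.e. needs 4 UTF-8 bytes."""
    return any(ord(ch) > 0xFFFF for ch in text)


component = chr(0x20089)
print(utf8_size(component), utf16_units(component), needs_utf8mb4(component))  # 4 2 True
print(utf8_size("中"), needs_utf8mb4("中"))  # 3 False
```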

Why does printf report an error on all but three (ASCII-range) Unicode Codepoints, yet is fine with all others?

- by fred.bear

The 'printf' I refer to is the standard-issue "program" (not the built-in): /usr/bin/printf.

I was testing printf as a viable way to convert a Unicode code point hex literal into its Unicode character representation. It was looking good and seemed flawless (by the way, the built-in printf can't do this at all, I think). I then tested it at the low end of the code range, and it failed with an avalanche of errors, all in the ASCII range (7 bits). The strangest thing was that 3 values printed normally:

    $ \u0024
    @ \u0040
    ` \u0060

I'd like to know what is going on here. The ASCII character set is most definitely part of the Unicode code point sequence... I am puzzled, and still without a good way to bash-script this particular conversion. Suggestions are welcome.

To be entertained by that same avalanche of errors, paste the following code into a terminal:

    # Here is one of the error messages
    # /usr/bin/printf: invalid universal character name \u0041
    # ...for them all, run the following script
    (
    for nib1 in {0..9} {A..F}; do
      for nib0 in {0..9} {A..F}; do
        [[ $nib1 < A ]] && nl="\n" || nl=" "
        $(type -P printf) "\u00$nib1$nib0$nl"
      done
    done
    echo
    )
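
GNU printf applies the C99 rule for universal character names: a `\uXXXX` value below U+00A0 is rejected unless it is U+0024 ($), U+0040 (@) or U+0060 (backtick), which are exactly the three that worked. For the ASCII range, `printf '\x41'` sidesteps the rule. If Python 3 is available to the script (an assumption), a small helper can convert any code point; `from_code_point` is a hypothetical name:

```python
def from_code_point(hex_literal):
    """Convert '0041', 'U+0041' or '\\u0041' style hex to the character."""
    digits = hex_literal.upper()
    for prefix in ("U+", "\\U"):
        if digits.startswith(prefix):
            digits = digits[len(prefix):]
    return chr(int(digits, 16))


print(from_code_point("0041"), from_code_point("U+20AC"))  # A €
```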

Add currently displayed string in textview to string array, then display last/previous string in tha

- by zaid

I want to add the string currently displayed in a TextView to the last position of a predefined but empty string array. Then I want a button that displays the last string in that array; if the button is clicked again, it should go to the previous string, and so on, working its way back through the array.
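
In Android, a plain Java array has a fixed size, so an `ArrayList<String>` is usually a better fit for a growing history. The navigation logic, sketched in Python with a hypothetical `History` class: `add` stores the currently displayed string, and each `previous` call steps one entry further back, stopping at the oldest.

```python
class History:
    """Stores displayed strings and steps backwards through them."""

    def __init__(self):
        self.items = []
        self.cursor = None  # index of the entry last returned by previous()

    def add(self, text):
        self.items.append(text)
        self.cursor = None  # the next previous() starts from the newest entry again

    def previous(self):
        if not self.items:
            return None
        if self.cursor is None:
            self.cursor = len(self.items) - 1
        elif self.cursor > 0:
            self.cursor -= 1
        return self.items[self.cursor]


h = History()
for shown in ["first", "second", "third"]:
    h.add(shown)
print([h.previous() for _ in range(4)])  # ['third', 'second', 'first', 'first']
```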

SED and Unicode Quotation Marks

- by Jonathan Patt

When testing against this string:

    “… so that’s that… ”

the following should, but does not, match the opening quotation mark and the following ellipsis and space:

    sed "s/\([“‘\"']…\) /\1/g"

However, this correctly matches the second ellipsis and the following space and closing quotation mark:

    sed "s/… \([”’\"'.!?]\)/…\1/g"

If I split the first one apart, it works fine:

    sed -e "s/\(“…\) /\1/g" \
        -e "s/\(‘…\) /\1/g" \
        -e "s/\(\"…\) /\1/g" \
        -e "s/\('…\) /\1/g"

So why doesn't it work when it's grouped together? Especially when it works fine with the closing quotation marks.

notepad sql Unicode and Non Unicode

- by RBrattas

Hi, I have a flat file, created in Microsoft Notepad, with a vertical bar (|) as the column delimiter. I get the following message: cannot convert between unicode and non-unicode string data types. It seems my nvarchar(max) column is what creates the problem; I changed it to varchar(max), but I still get the same problem. How do I insert my flat file into my SQL Server 2005 database? Also, in the SQL Server 2005 Import and Export Wizard, on the Advanced tab of the flat file source, the OutputColumnWidth is 50. Does that mean my flat file column is limited to 50 characters? I hope not, because my column is longer than 50... Thank you, Rune

How to concatenate two unicode characters in DotNet and not have any space?

- by OutOFTouch

When I concatenate the following two Unicode characters, I see both, but there is a space between them. Is there any way to get rid of this space?

    StringBuilder sb = new StringBuilder();
    int characterCode;
    characterCode = Convert.ToInt32("2758", 16);
    sb.Append((char)characterCode);
    characterCode = Convert.ToInt32("25c4", 16);
    sb.Append((char)characterCode);
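
The StringBuilder should contain exactly two characters, U+2758 (LIGHT VERTICAL BAR) and U+25C4 (BLACK LEFT-POINTING POINTER), with nothing between them; the visible gap is most likely the glyphs' own side spacing in the font rather than a space character, so trying a different font is a reasonable first check. A Python sketch that confirms the string's contents:

```python
text = chr(0x2758) + chr(0x25C4)  # the same two code points as the question

print(len(text))                    # 2
print([hex(ord(c)) for c in text])  # ['0x2758', '0x25c4']
print(" " in text)                  # False
```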

