unicode string - Page 13

Problem with Replacing special characters in a string

- by Hossein

Hi, I am trying to feed some text to a special pupose parser. The problem with this parser is that it is sensitive to ()[] characters and in my sentence in the text have quite a lot of these characters. The manual for the parser suggests that all the ()[] get replaced with  \[ \]. So using str.replace i am using to attach \ to all of those charcaters. I use the code below: a = 'abcdef(1234)' a.replace('(','\(') however i get this as my output: 'abcdef\\(1234)' What is wrong with my code? can anyone provide me a solution to solve this for these characters?

Read the article

Removing unwanted characters from a string in Python

- by Hossein

Hi, I have some strings that I want to delete some unwanted characters from them. For example: Adam'sApple ---- AdamsApple.(case insensitive) Can someone help me, I need the fastest way to do it, cause I have a couple of millions of records that have to be polished. Thanks

Read the article

Replacing substring in a string

- by user177785

I am uploading a image file to the server. Now after uploading the file to the server I need to rename the file with an id, but the extension of the file should be retained. Eg: if I upload the file image1.png then my server script should retain the extension .png. But I need to change the substring to some other substring (primary key of db). image1.png should be renamed to 123.png image2.jpg should be renamed to somevalue.jpg The image can be of any extension like .png, .jpg, .jpeg etc. I want to rename then in such a way that the image/file extension should be retained.

Read the article

A better way of converting Codepage-1251 in RTF to Unicode

- by blue painted

I am trying to parse RTF (via MSEDIT) in various languages, all in Delphi 2010, in order to produce HTML in unicode. Taking Russian/Cyrillic as my starting point I find that the overall document codepage is 1252 (Western) but the Russian parts of the text are identified by the charset of the font (RUSSIAN_CHARSET 204). So far I am: 1) Use AnsiString (or RawByteString) when parsing the RTF 2) Determine the CodePage by a lookup from the font charset (see http://msdn.microsoft.com/en-us/library/cc194829.aspx) 3) Translating using a lookup table in my code: (This table generated from http://msdn.microsoft.com/en-gb/goglobal/cc305144.aspx) - I'm going to need one table per supported codepage! There MUST be a better way than this? Preferably something supplied by the OS and so less brittle than tables of constants.

Read the article

Python MQTT: TypeError: coercing to Unicode: need string or buffer, bool found

- by user2923860

When my python code tries to connect to the MQTT broker it gives me this Type Error: Update- I added the Complete Error Traceback (most recent call last): File "test.py", line 20, in <module> mqttc.connect(broker, 1883, 60, True) File "/usr/local/lib/python2.7/dist-packages/mosquitto.py", line 563, in connect return self.reconnect() File "/usr/local/lib/python2.7/dist-packages/mosquitto.py", line 632, in reconnect self._sock = socket.create_connection((self._host, self._port), source_address=(self._bind_address, 0)) File "/usr/lib/python2.7/socket.py", line 561, in create_connection sock.bind(source_address) File "/usr/lib/python2.7/socket.py", line 224, in meth return getattr(self._sock,name)(*args) TypeError: coercing to Unicode: need string or buffer, bool found The code of the python file is: #! /usr/bin/python import mosquitto broker = "localhost" #define what happens after connection def on_connect(rc): print "Connected" #On recipt of a message do action def on_message(msg): n = msg.payload t = msg.topic if t == "/test/topic": if n == "test": print "test message received" # create broker mqttc = mosquitto.Mosquitto("python_sub") #define callbacks mqttc.on_message = on_message mqttc.on_connect = on_connect #connect mqttc.connect(broker, 1883, 60, True) #Subscribe to topic mqttc.subscribe("/test/topic", 2) #keep connected while mqttc.loop() == 0: pass I have no idea why its giving me this it work 2 days ago.

Read the article

Unicode characters in URLs

- by Pekka

In 2010, would you serve URLs containing UTF-8 characters in a large web portal? Unicode characters are forbidden as per the RFC on URLs (see here). They would have to be percent encoded to be standards compliant. My main point, though, is serving the unencoded characters for the sole purpose of having nice-looking URLs, so percent encoding is out. All major browsers seem to be parsing those URLs okay no matter what the RFC says. My general impression, though, is that it gets very shaky when leaving the domain of web browsers: URLs getting copy+pasted into text files, E-Mails, even Web sites with a different encoding HTTP Client libraries Exotic browsers, RSS readers Is my impression correct that trouble is to be expected here, and thus it's not a practical solution (yet) if you're serving a non-technical audience and it's important that all your links work properly even if quoted and passed on? Is there some magic way of serving nice-looking URLs in HTML http://www.example.com/düsseldorf?neighbourhood=Lörick that can be copy+pasted with the special characters intact, but work correctly when re-used in older clients?

Read the article

Multilangual Unicode rendering in opengl

- by sum1stolemyname

Hi Folks, I have to extend an OpenGL-Rendering System to support international characters (especially Hebrew, Arabic and cyrillic). Development Platform is Windows(XP|Vista|7), Alas using Embercardero Delphi 2010. I currently use wglOutLineFont(...) to build my font's display list and glCallLists(length(m_Text), UNSIGNED_SHORT, PWchar(m_Text) ) to render my strings. While this is feasable for Latin-1 Characters, building the full unicode character set in advanced is pretty time-consuming (about 8.5 minutes on my machine), so i am looking for a more efficient solution. I thought about limiting the range from u+0020 - u+077f (latin, greek, cyrillic, arbaic and hebrew) to include just the glyphs i need, but that would just be a solution for my current needs, and will become insufficent once other encoding is needed. On the upside, i do not have to worry about left-to right or right-to left direction as our application can handle this already. I would expect this to be a well-known problem, so i would like to ask if there is any reference material on this on the web, or if you could share some insight on this?

Read the article

pyODBC and Unicode Problem

- by Aviv Giladi

Hey guys, I'm working with pyODBC communicate with a MS SQL 2005 Express server. The table to which i'm trying to save the data consists of nvarchar columns. query = u"INSERT INTO tblPersons (name, birthday, gender) VALUES('" query = query + name + u"', '" query = query + birthday + u"', '" query = query + gender + u"')" cur.execute(query ) The variables name, birthrday and gende are read from an Excel file and they are Unicode strings. When I execute the query and either look at the table with SQL Server Management Studio or execute a query that fetches the data that was just inserted, all the data that was written in a non-English languages turn into question marks. The data that was written in English is preserved and appears in the table in the correct way. I tried adding CHARSET=UTF16 to my connection string, but had no luck with that. I can use UTF-8 which works fine but as a working convention, I need all the data saved in my DB to be UTF16. Thanks!

Read the article

String Field Sizes for unicode database fields using different data access components

- by Serg

mjustin in his question 1 and question 2 says that TWideStringField.Size property for UTF8 fields in Delphi 2009 dbExpress is 4 times larger than the logical field size (max number of characters in the field). I inclined to consider this a dbExpress bug. That is what Delphi 2009 Help says: The interpretation of Size depends on the data type. The meaning of Size for data types that use it is given in the following table. For all other data types, Size is not used and its value is always 0. ftString - Size is the maximum number of characters in the string. I am using FibPlus 6.9.9 and it follows the above documentation - the string field size is the maximum number of characters, not bytes. So the question also implies the following question: Are DbExpress drivers in Delphi 2009 unusable for unicode databases?

Read the article

exceptions with python unicode encode/decode functions (why doesn't errors=ignore actually ignore th

- by gatoatigrado

Does anyone know why the string conversion functions throw exceptions when errors="ignore" is passed? How can I convert from regular Python string objects to unicode without errors being thrown? Thanks very much! python -c "import codecs; codecs.open('tmp', 'wb', encoding='utf8', errors='ignore').write('?????')" returns Traceback (most recent call last): File "", line 1, in File "/usr/lib/python2.6/codecs.py", line 686, in write return self.writer.write(data) File "/usr/lib/python2.6/codecs.py", line 351, in write data, consumed = self.encode(object, self.errors) UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)

Read the article

String to array or Array to string tips on formats, etc

- by user316841

hi, first of all thanks for taking your time! I'm a junior Dev, working with PHP + mysql. My issue: I'm saving data from a form to my database. From this form, there's only need to save the contacts: Name, phone number, address. But, it would be nice to have a small reference to the user answers. Let's say for each question we've got a value betwee 1 and 4. Since there's no need to create a table just for it, because what's needed is just the personal contacts. I'm thinking of recording each question/answer, as a letter and its correspondent value. Example (A2, B1, C5, D3, etc). My question is: Is there a format I could afterwards, handle easily ? Convert to array (string to array) in case the client change ideas, and ask this data, placed in table columns ? Just to prevent this situation! Example, From (A2, B1, C5 ) to array( "A" = "1", "B" = "1", "C" = "5" ) For now I guess, Regex is the answer, but it's allways hard to figure it out and I'm allways getting in troubles =) Thanks!

Read the article

Unicode in fpc doesn't work

- by user1546454

Hi I'm Romanian and I can't write Unicode in Free Pascal Compiler. I try to write ?,î,â,a,? and it doesn't work. I tried with dos windows changed fonts, tried chcp. I even made a batch file which would do chcp 65001 and start the app. By default when I tried to write these letter I got "?" but when I started it with the batch file it just didn't write anything. I tried AnsiString, UnicodeString, UTF8String and all didn't work. And I think the problem is in the compiler. Does anyone know a solution?

Read the article

Intra-Unicode "lean" Encoding Converters

- by Mystagogue

Windows provides encoding conversion functions ("MultiByteToWideChar" and "WideCharToMultiByte") which are capable of UTF-8 to/from UTF-16 conversions, among other things. But I've seen people offer home-grown 30 to 40 line functions that claim also to perform UTF-8 / UTF-16 encoding conversions. My question is, how reliable are such tiny converters? Can such a tiny amount of code handle problems such as converting a UTF-16 surrogate pair (such as ) into a UTF-8 single four byte sequence (rather than making the mistake of converting into a pair of three byte sequences)? Can they correctly spot "unpaired" surrogate input, and provide an error? In short, are such tiny converters mere toys, or can they be taken seriously? For that matter, why does unicode.org seemingly offer no advice on an algorithm for accomplishing such conversions?

Read the article

Converting from ANSI to Unicode

- by Rayne

Hi all, I'm using Visual Studio .NET 2003, and I'm trying to convert a program written in purely ANSI characters to be independent of Unicode/Multi-byte characters. The program has a callback function of pcap_loop, called "got_packet". It's defined as void got_packet(u_char *user, const struct pcap_pkthdr *header, const u_char *cpacket) { USES_CONVERSION; _TUCHAR *packet; packet = A2T(cpacket); ... } However, I get the error message error C2440: 'type cast': cannot convert from 'const u_char *' to 'ATL::CA2WEX<>' How do fix this? Thank you. Regards, Rayne

Read the article

Characters in string changed after downloading HTML from the internet.

- by Callum Rogers

Using the following code, I can download the HTML of a file from the internet: WebClient wc = new WebClient(); // .... string downloadedFile = wc.DownloadString("http://www.myurl.com/"); However, sometimes the file contains "interesting" characters like é to Ã©, ? to â† and ????? to ãƒ•ã‚·ã‚®ãƒ€ãƒ. I think it may be something to do with different unicode types or something, as each character gets changed into 2 new ones, perhaps each character being split in half but I have very little knowledge in this area. What do you think is wrong?

Read the article

Unicode escape characters not being read by XmlReader

- by craigmoliver

I've got an XML document that I'm importing into an XmlReader that has some unicode formatting I need to preserve. I'm preserving the whitespace but it's dropping the encoded #x2028 which I assume should be expressed as a line break. Here's my code: var settings = new XmlReaderSettings { ProhibitDtd = false, XmlResolver = null, IgnoreWhitespace = false }; var reader = XmlReader.Create(new StreamReader(fu.PostedFile.InputStream), settings); var document = new XmlDocument {PreserveWhitespace = true}; document.Load(reader); return document;

Read the article

Unicode identifiers in Python?

- by viksit

Hi all, I want to build a Python function that calculates, and would like to name my summation function S. In a similar fashion, would like to use ? for product, and so on. I was wondering if there was a way to name a python function in this fashion? def S (..): .. .. That is, does Python support unicode identifiers, and if so, could someone provide an example for it? Thanks! Original motivation for this was a piece of Clojure code I saw today that looks like, (defn entropy [X] (* -1 (S [i X] (* (p i) (log (p i)))))) where S is a macro defined as, (defmacro S ... ) and I thought that was pretty cool.

Read the article

Unicode at IIS 7 on Windows 2008 Server SP2

- by Yuri

Hello, I have simple page in php which gets argument with get method. The page just prints the argument. Nothing more. It works properly with english chars. If i pass as argument value in some Unicode language (etc Russian), then the value of the argument printed as question marks. How to solve the issue? Thank you, Yuri P.S. adding header with utf-8 doesn't help. this is the get: mypage.php?src=uploaded_files/????.mp3 this is the encoding: <meta http-equiv="content-type" content="text/html; charset=utf-8" and this is the output: uploaded_files/????.mp3

Read the article

Regex and unicode

- by dbr

I have a script that parses the filenames of TV episodes (show.name.s01e02.avi for example), grabs the episode name (from the www.thetvdb.com API) and automatically renames them into something nicer (Show Name - [01x02].avi) The script works fine, that is until you try and use it on files that have Unicode show-names (something I never really thought about, since all the files I have are English, so mostly pretty-much all fall within [a-zA-Z0-9'\-]) How can I allow the regular expressions to match accented characters and the likes? Currently the regex's config section looks like.. config['valid_filename_chars'] = """0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!@£$%^&*()_+=-[]{}"'.,<>`~? """ config['valid_filename_chars_regex'] = re.escape(config['valid_filename_chars']) config['name_parse'] = [ # foo_[s01]_[e01] re.compile('''^([%s]+?)[ \._\-]\[[Ss]([0-9]+?)\]_\[[Ee]([0-9]+?)\]?[^\\/]*$'''% (config['valid_filename_chars_regex'])), # foo.1x09* re.compile('''^([%s]+?)[ \._\-]\[?([0-9]+)x([0-9]+)[^\\/]*$''' % (config['valid_filename_chars_regex'])), # foo.s01.e01, foo.s01_e01 re.compile('''^([%s]+?)[ \._\-][Ss]([0-9]+)[\.\- ]?[Ee]([0-9]+)[^\\/]*$''' % (config['valid_filename_chars_regex'])), # foo.103* re.compile('''^([%s]+)[ \._\-]([0-9]{1})([0-9]{2})[\._ -][^\\/]*$''' % (config['valid_filename_chars_regex'])), # foo.0103* re.compile('''^([%s]+)[ \._\-]([0-9]{2})([0-9]{2,3})[\._ -][^\\/]*$''' % (config['valid_filename_chars_regex'])), ]

Read the article

Convert char array to UNICODE in MFC C++

- by chathuradd

I'm using the folowing code to read files from a folder in windows. However since this a MFC application I have to convert the char array to UNICODE. For example if I hard code the path as "C:\images3\test\" as shown below the code works. WIN32_FIND_DATA FindFileData; HANDLE hFind = INVALID_HANDLE_VALUE; hFind = FindFirstFile(_T("C:\images3\test\"), &FindFileData); What I want is to get this working as follows: char* pathOfFileType; hFind = FindFirstFile(_T(pathOfFileType), &FindFileData); Can anyone tell me how to fix this problem ? Thanks

Read the article

C++ unicode UTF-16 encoding

- by Dan

Hi all, I have a wide char string is L"hao123--??????", and it must be encoded to "hao123--\u6211\u7684\u4E0A\u7F51\u4E3B\u9875". I was told that the encoded string is a special “%uNNNN” format for encoding Unicode UTF-16 code points. In this website(http://rishida.net/tools/conversion/), it tell me it's JavaScript escapes. But I don't know how to encode it with C++. It that any library to do this work? or give me some tips. Thanks my friends!

Read the article

Using Python simplejson for transmitting JSON to another server results in unicode encoding problems

- by Mark

Hi there, I'm encoding a string with Python's simplejson library with special characters: hello testing spécißl characters plusses: +++++ special chars :œ?´®†¥¨ˆøp“ß?ƒ©??°¬O˜çv?˜µ== However, when I encode it and transmit it to the other machine (using POST), it turns out like this: {'message': ['{"body": "hello testing sp\\u00e9ci\\u00dfl characters\\n\\nplusses: \\n\\nspecial chars :\\u0153\\u2211\\u00b4\\u00ae\\u2020\\u00a5\\u00a8\\u02c6\\u00f8\\u03c0\\u201c\\u00df\\u2202\\u0192\\u00a9\\u02d9\\u2206\\u02da\\u00ac\\u03a9\\u2248\\u00e7\\u221a\\u222b\\u02dc\\u00b5\\u2264\\u2265"}']} The + signs are completely stripped and the rest are in this unicode(?) format. My code for this is: data = {'body': data_string} data_encoded = json.dumps(data) Any ideas? Thanks! Edit: I've tried using json.dumps(data, ensure_ascii=False) but it results in a UnicodeError ordinal not in range error.

Read the article

How to diagnose, and reverse (not prevent) Unicode mangling

- by Steve Bennett

Somewhere upstream of me, "something" happened that looks like unicode mangling. One symptom is that a lowercase u umlaut (ü) gets converted to "Ã¼" (ie, character FC gets converted to C3 BC). Assuming that I have no control over this upstream process, how can I reverse-engineer what's going on? And if that is possible, can I crank the sausage machine backwards and get the original text back? (If it helps to understand this case, the text I received was in the form of a MySQL dump. I think somwewhere in the dump/transport process it got mangled.)

Read the article

c++ unicode writing is not working

- by Jugal Kishore

I am trying to write some Russian unicode text in file by wfstream. Following piece of code has been used for it. wfstream myfile; locale AvailLocale("Russian"); myfile.imbue(AvailLocale); myfile.open(L"d:\\example.txt",ios::out); if (myfile.is_open()) { myfile << L"?????? ????" <<endl; } myfile.flush(); myfile.close(); Something unrecognizable is written to the file by executing this code, I am using VS 2008.

Read the article

SQL SERVER – Find First Non-Numeric Character from String

- by pinaldave

It is fun when you have to deal with simple problems and there are no out of the box solution. I am sure there are many cases when we needed the first non-numeric character from the string but there is no function available to identify that right away. Here is the quick script I wrote down using PATINDEX. The function PATINDEX exists for quite a long time in SQL Server but I hardly see it being used. Well, at least I use it and I am comfortable using it. Here is a simple script which I use when I have to identify first non-numeric character. -- How to find first non numberic character USE tempdb GO CREATE TABLE MyTable (ID INT, Col1 VARCHAR(100)) GO INSERT INTO MyTable (ID, Col1) SELECT 1, '1one' UNION ALL SELECT 2, '11eleven' UNION ALL SELECT 3, '2two' UNION ALL SELECT 4, '22twentytwo' UNION ALL SELECT 5, '111oneeleven' GO -- Use of PATINDEX SELECT PATINDEX('%[^0-9]%',Col1) 'Position of NonNumeric Character', SUBSTRING(Col1,PATINDEX('%[^0-9]%',Col1),1) 'NonNumeric Character', Col1 'Original Character' FROM MyTable GO DROP TABLE MyTable GO Here is the resultset: Where do I use in the real world – well there are lots of examples. In one of the future blog posts I will cover that as well. Meanwhile, do you have any better way to achieve the same. Do share it here. I will write a follow up blog post with due credit to you. Reference : Pinal Dave (http://blog.SQLAuthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Function, SQL Query, SQL Server, SQL String, SQL Tips and Tricks, T SQL, Technology

Search Results

Search found 36172 results on 1447 pages for 'unicode string'.

Page 13/1447 | < Previous Page | 9 10 11 12 13 14 15 16 17 18 19 20 | Next Page >

- by Hossein

- by Hossein

- by user177785

- by blue painted

- by user2923860

- by Pekka

- by sum1stolemyname

- by Aviv Giladi

- by Serg

- by gatoatigrado

- by user316841

- by user1546454

- by Mystagogue

- by Rayne

- by Callum Rogers

- by craigmoliver

- by viksit

- by Yuri

- by dbr

- by chathuradd

- by Dan

- by Mark

- by Steve Bennett

- by Jugal Kishore

- by pinaldave

< Previous Page | 9 10 11 12 13 14 15 16 17 18 19 20 | Next Page >