unicode normalization - Page 3

Converting to and from Unicode in PHP

- by Chris

Hey, I'm using php 5 and need to communicate with another server that runs completely in unicode. I need to convert every string to unicode before sending it over. This seems like an easy task, but I haven't been able to find a way to do it yet. Is there a simple function that returns a unicode string? i.e. convert_to_unicode("the string i'm sending")

Read the article

Python unicode issues (2.6)

- by ephemeralis

I'm currently working on a irc bot for a multi-lingual channel, and I'm encountering some issues with unicode which are proving nearly impossible to solve. No matter what configuration of unicode encoding I seem to try, the list function which the below code sits within just flat out does nothing (c.notice is a class function which sends a NOTICE command to the irc server) or when it does do something, spits out something which obviously isn't encoded. The command should be sending ??, but instead it seems hellbent on sending å¤©å with a previous configuration of the same commands. The one I have specified below is of the 'send nothing' variety. I haven't worked with unicode before this, and thus I am quite stuck. I'm also positive that I'm doing this completely wrong as a consequence. (compileCMD just takes a list and spits out a single string of all the elements within the list) uk = self.compileCMD(self.faq.keys(),0) ukeys = unicode(uk,"utf-8").encode("utf-8") c.notice(nick, u"Current list of faq entries: %s" % (uk))

Read the article

json specifies "any UNICODE character"?

- by bukzor

Maybe this is just my unfamiliarity with unicode, so please correct me if I'm mistaken. Looking at http://json.org/, the spec says that a string can include "any UNICODE character", but this confuses me. JSON is a communication format correct? At the core of it, everything must translate down to bytes. In contrast, UNICODE is a logical format and must be encoded to be able to transmit it, right? So what did they mean there?

Read the article

Unicode and PHP - am I doing something wrong?

- by alex

I'm using Kohana 3, which has full support for Unicode. I have this as the first child of my <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> The Unicode character I am inserting into is é as in Café. However, I am getting the triangle with a ? (as in could not decode character). As far as I can tell in my own code, I am not doing any string manipulation on the text. In fact, I have placed the accent straight into a view's PHP file and it is still not working. I copied the character from this page: http://www.fileformat.info/info/unicode/char/00e9/index.htm I've only just started examining PHP's Unicode limitations, so I could be doing something horribly wrong. So, how do I display this character? Do I need to resort to the HTML entity?

Read the article

How can I convert a string into unicode string in Perl

- by SCNCN2010

how convert string into Unicode string in Perl. I am looking some attribute in LDAP which accepts only Unicode string . So i want to convert normal string to Unicode string

Read the article

Unicode string handling using Windows API

- by DeadMG

I always assumed that Unicode string handling was some dark art. However, I've seen that the Windows API has functions for comparing Unicode strings, for example. Does that mean that it's actually feasible to write a Unicode string class that can perform simple actions like sorting, equality comparison, and extraction from a file? Or are there hidden gotchas in the use of these functions that makes it actually a really bad idea? I'm just looking at libraries like ICU and they seem incredibly over-complicated compared to what a Unicode string class backed by the Windows API could actually look like, which would resemble the Standard string classes quite closely.

Read the article

Unicode for displaying mathematical operations

- by jax

How can I display a squared sign in unicode (I checked the Unicode reference and it is not there)? Also, is it possible to use unicode to display a fraction, for example 3/4 would look as it should with the horizontal vinculum?

Read the article

What issues lead people to use Japanese-specific encodings rather than Unicode?

- by Nicolas Raoul

At work I come across a lot of Japanese text files in Shift-JIS and other encodings. It causes many mojibake (unreadable character) problems for all computer users. Unicode was intended to solve this sort of problem by defining a single character set for all languages, and the UTF-8 serialization is recommended for use on the Internet. So why doesn't everybody switch from Japanese-specific encodings to UTF-8? What issues with or disadvantages of UTF-8 are holding people back? EDIT: The W3C lists some known problems with Unicode, could this be a reason too?

Read the article

Strange Behaviour with Unicode Characters in Windows

- by open_sourse

Ok, I do not know if this is a programming question, but it certainly is a technical one so I am asking it here. I was working on some internationalization stuff in my PHP code, and in order to ensure that my generated HTML shows up Unicode correctly based on the encoding and stuff I decided to add some Chinese text to my PHP page, which then echoes it into the browser to complete my test case. So I went into google and typed "Chinese", copied the first Chinese text that the search returned (which was ??/??). I then copied it into Notepad++ which is my editor, and to my surprise showed up as boxes similar to [][]/[][]. So I thought the encoding in Notepad++ was messed up and I changed the encoding to UTF-8 and UCS, neither worked. I did it fresh in a newly encoded file, still I got the boxes. The same content when I paste into Google and StackOverFlow (like I did in this posting) shows up correct Chinese! I even opened up Windows Clipboard Viewer and the content is represented in the Clipboard as boxes! I tried pasting it into Windows Explorer address bar and using to rename a file to, but I still get boxes. But it shows up correctly when pasted into my Chrome Browser address bar! Is this a Windows issue? Since I am able to paste it correctly in SO, the data in memory should be encoded correctly right? But if that is the case why does it show up as boxes in the Clipboard Viewer? I am confused here...By the way I am using Windows XP with SP3. (I am asking this question here, even if it is not programmatic, because it is preventing me from running my programming test cases..)

Read the article

Permanent fix for unicode characters not displaying correctly (as boxes)

- by Chase

Please read this entire message before replying. First I know how to fix the issue on a temporary basis. I am looking for a permanent fix. I work with foreign language files a lot. Unfortunately sometimes all the unicode characters in windows explorer, notepad, and other places (as rendered by windows, probably GDI) do not display correctly. That is they display as square blocks, where as they had just been displaying correctly. There are countless methods to temporarily correct the issue. But again, I want a way to permanently resolve the issue. What I have tried: The silly "Hide fonts based on language settings". This setting only applies to what fonts you see in the fonts folder and font dropdowns. It doesn't disable foreign fonts (doesn't work, or if it does, it is temporary). Deleting the font cache file and rebooting (works.. usually, temporary solution). Changing my locale and then back (sometimes works, temporary solution). Rebooting my PC and getting lucky (50-50 chance, temporary solution). Changing my keyboard input/adding foreign keyboard (temporary solution that only seems to work once). Reinstalling windows (temporary solution, sometimes lasts a few months though, I have done this 7 times across 3 computers) What I have not tried: Buying Windows Ultimate and installing the interface packs. This is not a solution. I can't read Japanese/Chinese and I do not want my interface in those languages. What I will not do: Switch to a different brand operating system (unix, linux, mac os x) Switch to an older version of windows (Windows Vista, XP, 2000, etc). So can anyone recommend a permanent fix for the problem?

Read the article

mysql database normalization question

- by Chocho

here is my 3 tables: table 1 -- stores user information and it has unique data table 2 -- stores place category such as, toronto, ny, london, etc hence this is is also unique table 3 -- has duplicate information. it stores all the places a user have been. the 3 tables are linked or joined by these ids: table 1 has an "table1_id" table 2 has an "table2_id" and "place_name" table 3 has an "table3_id", "table1_id", "place_name" i have an interface where an admin sees all users. beside a user is "edit" button. clicking on that edit button allows you to edit a specific user in a form fields which has a multiple drop down box for "places". if an admin edits a user and add 1 "places" for the user, i insert that information using php. if the admin decides to deselect that 1 "places" do i delete it or mark it as on and off? how about if the admin decides to select 2 "places" for the user; change the first "places" and add an additional "places". will my table just keep growing and will i have just redundant information? thanks.

Read the article

Normalization of database for timesheet tool and ensure data integrity

- by fireeyedboy

I'm creating a timesheet application. I have the following entities (amongst others): Company Employee = an employee associated with a company Client = a client associated with a company So far I have the following (abbreviated) database setup: Company - id - name Employee - id - companyId (FK to Company.id) - name Client - id - companyId (FK to Company.id) - name Now, I want an employee to be associated with a client, but only if that client is associated with the company the employee works for. How would you guarantee this data integrity on a database level? Or should I just depend on the application to guarantee this data integrity? I thought about creating a many to many table like this: EmployeeClient - employeeId (FK to Employee.id) - companyId \ (combined FK to Client.companyId, Client.id) - clientId / Thus, when I insert a client for an employee along with the employee's company id, the database should prevent this when the client is not associated with the employee's company id. Does this make sense? Because this still doesn't guarantee the employee is associated with the company. How do you deal with these things? UPDATE The scenario is as followed: A company has multiple employees. Employees will only be linked to one company. A company has multiple clients also. Clients will only be linked to one company. (Company is a sandbox, so to speak). An employee of a company can be linked to a client of it's company, but only if the client is part of the company's clientele. In other words: The application will allow a company to create/add employees and create/add clients (hence the companyId FK in the Employee and Client tables). Next, the company will be allowed to assign certain clients to certain of it's employees (EmployeeClient table). Imagine an employee working on projects for a few clients for which s/he can write billable hours, but the employee must not be allowed to write billable hours for clients they are not assigned to by their employer (the company). So, employees will not automatically have access to all their company's clients, but only to those that the company has selected for them. Hopefully this has shed some more light on the matter.

Read the article

MySQL Normalization stored procedure performance

- by srkiNZ84

Hi, I've written a stored procedure in MySQL to take values currently in a table and to "Normalize" them. This means that for each value passed to the stored procedure, it checks whether the value is already in the table. If it is, then it stores the id of that row in a variable. If the value is not in the table, it stores the newly inserted value's id. The stored procedure then takes the id's and inserts them into a table which is equivalent to the original de-normailized table, but this table is fully normalized and consists of mainly foreign keys. My problem with this design is that the stored procedure takes approximately 10ms or so to return, which is too long when you're trying to work through some 10million records. My suspicion is that the performance is to do with the way in which I'm doing the inserts. i.e. INSERT INTO TableA (first_value) VALUES (argument_from_sp) ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id); SET @TableAId = LAST_INSERT_ID(); The "ON DUPLICATE KEY UPDATE" is a bit of a hack, due to the fact that on a duplicate key I don't want to update anything but rather just return the id value of the row. If you miss this step though, the LAST_INSERT_ID() function returns the wrong value when you're trying to run the "SET ..." statement. Does anyone know of a better way to do this in MySQL? Thank you

Read the article

Parsing unicode XML with Python SAX on App Engine

- by Derek Dahmer

I'm using xml.sax with unicode strings of XML as input, originally entered in from a web form. On my local machine (python 2.5, using the default xmlreader expat, running through app engine), it works fine. However, the exact same code and input strings on production app engine servers fail with "not well-formed". For example, it happens with the code below: from xml import sax class MyHandler(sax.ContentHandler): pass handler = MyHandler() # Both of these unicode strings return 'not well-formed' # on app engine, but work locally xml.parseString(u"<a>b</a>",handler) xml.parseString(u"<!DOCTYPE a[<!ELEMENT a (#PCDATA)> ]><a>b</a>",handler) # Both of these work, but output unicode xml.parseString("<a>b</a>",handler) xml.parseString("<!DOCTYPE a[<!ELEMENT a (#PCDATA)> ]><a>b</a>",handler) resulting in the error: File "<string>", line 1, in <module> File "/base/python_dist/lib/python2.5/xml/sax/__init__.py", line 49, in parseString parser.parse(inpsrc) File "/base/python_dist/lib/python2.5/xml/sax/expatreader.py", line 107, in parse xmlreader.IncrementalParser.parse(self, source) File "/base/python_dist/lib/python2.5/xml/sax/xmlreader.py", line 123, in parse self.feed(buffer) File "/base/python_dist/lib/python2.5/xml/sax/expatreader.py", line 211, in feed self._err_handler.fatalError(exc) File "/base/python_dist/lib/python2.5/xml/sax/handler.py", line 38, in fatalError raise exception SAXParseException: <unknown>:1:1: not well-formed (invalid token) Any reason why app engine's parser, which also uses python2.5 and expat, would fail when inputting unicode?

Read the article

'??' Not a valid unicode character, but in the unicode character set?

- by Steve Cotner

Short story: I can't get an entity like '𠂉' to store in a MySQL database, either by using a text field in a Ruby on Rails app (with default UTF-8 encoding) or by inputting it directly with a MySQL GUI app. As far as I can tell, all Chinese characters and radicals can be entered into the database without problem, but not these rarely typed 'character components.' The character mentioned above is unicode U+20089 and html entity 𠂉 I can get it to display on the page by entering <html>𠂉</html> and removing html escaping, but I would like to store it simply as the unicode character and keep the html escaping in place. There are many other Chinese 'components' (parts of full characters, generally consisting of 2 or 3 strokes) that cause the same problem. According to this page, the character mentioned is in the UTF-8 charset: http://www.fileformat.info/info/unicode/char/20089/charset_support.htm But on the neighboring '...20089/index.htm' page, there's an alert saying it's not a valid unicode character. For reference, that entity can be found in Mac OS X by searching through the character palette (international menu, "Show Character Palette"), searching by radical, and looking under the '?' radical. Apologies if this is too open-ended... can a character like this be stored in a UTF-8-based database? How is this character both supported and unsupported, both present in the character set and not valid?

Read the article

Why does printf report an error on all but three (ASCII-range) Unicode Codepoints, yet is fine with all others?

- by fred.bear

The 'printf' I refer to is the standard-issue "program" (not the built-in): /usr/bin/printf I was testing printf out as a viable method of convert a Unicode Codepoint Hex-literal into its Unicoder character representation, I was looking good, and seemed flawless..(btw. the built-in printf can't do this at all (I think)... I then thought to test it at the lower extreme end of the code-spectrum, and it failed with an avalanche of errors.. All in the ASCII range (= 7 bits) The strangest thing was that 3 value printed normally; they are: $ \u0024 @ \u0040 ` \u0060 I'd like to know what is going on here. The ASCII character-set is most definitely part of the Unicode Code-point sequence.... I am puzzled, and still without a good way to bash script this particular converion.. Suggestions are welcome. To be entertained by that same avalanche of errors, paste the following code into a terminal... # Here is one of the error messages # /usr/bin/printf: invalid universal character name \u0041 # ...for them all, run the following script ( for nib1 in {0..9} {A..F}; do for nib0 in {0..9} {A..F}; do [[ $nib1 < A ]] && nl="\n" || nl=" " $(type -P printf) "\u00$nib1$nib0$nl" done done echo )

Read the article

notepad sql Unicode and Non Unicode

- by RBrattas

Hi, I have a Microsoft Notepad flate file with data and Vertical Bar as column delimiter. I get following message: cannot convert between unicode and non-unicode string data types It seems it is my nvarchar(max) that creates my problem. I changed to varchar(max); but still the same problem. And in the SQL Server 2005 import and export wizard the flate file source advanced tab the OutputColumnWith is 50. Will that say my flate file column is max 50? I hope not because my column is more then 50... Thank you, Rune

Read the article

notepad sql Unicode and Non Unicode

- by RBrattas

Hi, I have a Microsoft Notepad flate file with data and Vertical Bar as column delimiter. I get following message: cannot convert between unicode and non-unicode string data types It seems it is my nvarchar(max) that creates my problem. I changed to varchar(max); but still the same problem. How do I insert my flate file into my SQL Server 2005? And in the SQL Server 2005 import and export wizard the flate file source advanced tab the OutputColumnWith is 50. Will that say my flate file column is max 50? I hope not because my column is more then 50... Thank you, Rune

Read the article

PHP Explode with an Unicode character as separator

- by Young Roger

XPDFs pdftotext converts pdf to text and outputs it at command line level. If needed it inserts PageBreaks between the pages as specified in TextOutputDev.cc: eopLen = uMap->mapUnicode(0x0c, eop, sizeof(eop)); This Unicode symbol is encoding independent, -enc ASCII7 wouldn't change it. I'm currently willing to use PHP for converting and splitting the PDF file into several TXT pages for database storage. However, the following function does work, but takes twice as long as a conversion of the whole book in one time. for($i = 1; $i <= $pages[0]; $i++) $page[$i] = shell_exec('/usr/bin/pdftotext sample.pdf -f '.$i.' -l '.$i.' -'); How am I supposed to explode(0x0c, $wholePDF) with an Unicode character as separator? Currently, page[$i] doesn't seem to retrieve those weird Unicode PageBreak characters from the shell_exec(). I tried several headers for encoding (UTF-8 especially) but it didn't work out so far.

Read the article

Delphi Unicode String Type Stored Directly at its Address

- by Andreas Rejbrand

I want a string type that is Unicode and that stores the string directly at the adress of the variable, as is the case of the (Ansi-only) ShortString type. I mean, if I declare a S: ShortString and let S := 'My String', then, at @S, I will find the length of the string (as one byte, so the string cannot contain more than 255 characters) followed by the ANSI-encoded string itself. What I would like is a Unicode variant of this. That is, I want a string type such that, at @S, I will find a unsigned 32-bit integer containing the length of the string in bytes (or in characters, which is half the number of bytes) followed by the Unicode representation of the string. I have tried WideString, UnicodeString, and RawByteString, but they all appear only to store an adress at @S, and the actual string somewhere else (I guess this has do do with reference counting and such). I suspect that there is no built-in type to use, and that I have to come up with my own way of storing text the way I want (which actually is fun). Am I right?

Read the article

CDOSYS and Unicode in the from field - vbScript.

- by Simmo

I've got the code below, and I'm trying to set the from field to allow unicode. Currently in my email client I get "??". The subject line and any content shows the unicode correctly. And looking at the MSDN the property should be "urn:schemas:httpmail:from". Anyone solved this issue? Thanks M Dim AC_EMAIL : AC_EMAIL = "[email protected]" Dim AC_EMAIL_FROM : AC_EMAIL_FROM = "?? <[email protected]>" Dim strSubject : strSubject = """??"" testing testing" set oMessage = WScript.CreateObject("CDO.Message") With oMessage .BodyPart.charset = "utf-8" 'unicode-1-1-utf-8 .Fields("urn:schemas:httpmail:from") = AC_EMAIL_FROM .Fields("urn:schemas:httpmail:to") = AC_EMAIL .Fields("urn:schemas:httpmail:subject") = strSubject .Fields.Update .Send End With Set oMessage = Nothing

Read the article

Wordpress is ignoring Unicode Chars in URL

- by Ankur Gupta

Hi, I am using wordpress with this type of permalink: /%year%/%monthnum%/%postname%/ if I use this type of url: example.com/2010/03/????? it treats this url like this example.com/2010/03/ (By ignoring unicode chars) and displays March 2010 archive list. if I use english url: example.com/2010/03/technology then it works perfectly. This problem occurs even on tags page: for example example.com/tag/??????? is treated like example.com/tag/ and displays 404 page. Why wordpress is ignoring unicode chars? If I use default querystring structure then it works perfectly even with unicode characters. Server Info: IIS7 Win2008 Server (Url rewriting enabled) Wordpress 2.9.2

Read the article

Double-Escaped Unicode Javascript Issue

- by Jeffrey Winter

I am having a problem displaying a Javascript string with embedded Unicode character escape sequences (\uXXXX) where the initial "\" character is itself escaped as "\" What do I need to do to transform the string so that it properly evaluates the escape sequences and produces output with the correct Unicode character? For example, I am dealing with input such as: "this is a \u201ctest\u201d"; attempting to decode the "\" using a regex expression, e.g.: var out = text.replace('/\/g','\'); results in the output text: "this is a \u201ctest\u201d"; that is, the Unicode escape sequences are displayed as actual escape sequences, not the double quote characters I would like.

Read the article

SQLAlchemy automatically converts str to unicode on commit

- by Victor Stanciu

Hello, When inserting an object into a database with SQLAlchemy, all it's properties that correspond to String() columns are automatically transformed from <type 'str'> to <type 'unicode'>. Is there a way to prevent this behavior? Here is the code: from sqlalchemy import create_engine, Table, Column, Integer, String, MetaData from sqlalchemy.orm import mapper, sessionmaker engine = create_engine('sqlite:///:memory:', echo=False) metadata = MetaData() table = Table('projects', metadata, Column('id', Integer, primary_key=True), Column('name', String(50)) ) class Project(object): def __init__(self, name): self.name = name mapper(Project, table) metadata.create_all(engine) session = sessionmaker(bind=engine)() project = Project("Lorem ipsum") print(type(project.name)) session.add(project) session.commit() print(type(project.name)) And here is the output: <type 'str'> <type 'unicode'> I know I should probably just work with unicode, but this would involve digging through some third-party code and I don't have the Python skills for that yet :)

Read the article

C# Button Text Unicode characters.

- by Fossaw

C# doesn't want to put Unicode characters on buttons. If I put \u2129 in the Text attribute of the button, the button displays the \u2129, not the Unicode character, (example - I chose 2129 because I could see it in the font currently active on the machine). I saw this question before, link text, but the question isn't really answered, just got around. I am working on applications which are going all over the world, and don't want to install all the fonts, more then "don't want", there are that many that I doubt the machine I am working on has sufficient disk space. Our overseas sales agents supply the Unicode character "numbers". Is there another way forward with this? As an aside, (curiosity), why does it not work?

Search Results

Search found 1649 results on 66 pages for 'unicode normalization'.

Page 3/66 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by Chris

- by ephemeralis

- by bukzor

- by alex

- by SCNCN2010

- by DeadMG

- by jax

- by Nicolas Raoul

- by open_sourse

- by Chase

- by Chocho

- by fireeyedboy

- by srkiNZ84

- by Derek Dahmer

- by Steve Cotner

- by fred.bear

- by RBrattas

- by RBrattas

- by Young Roger

- by Andreas Rejbrand

- by Simmo

- by Ankur Gupta

- by Jeffrey Winter

- by Victor Stanciu

- by Fossaw

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >