unicode normalization - Page 13

How to ensure that no non-ascii unicode characters are entered ?

- by Jacques René Mesrine

Given a java.lang.String instance, I want to verify that it doesn't contain any Unicode characters that are not ASCII alphanumerics, e.g. the string should be limited to [A-Za-z0-9.]. What I'm doing now is something very inefficient:

    import org.apache.commons.lang.CharUtils;

    String s = ...;
    char[] ch = s.toCharArray();
    for (int i = 0; i < ch.length; i++) {
        if (!CharUtils.isAsciiAlphanumeric(ch[i]))
            throw new InvalidInput(ch[i] + " is invalid");
    }

Is there a better way to solve this?
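One common alternative to a per-character loop is a single whole-string match against the allowed character class. The sketch below shows the idea in Python rather than Java; the names ALLOWED and is_allowed are made up for illustration, and it assumes an empty string should count as valid. Java's String.matches also tests the whole string against a pattern, so the same character class could be tried there.

```python
import re

# Allowed set taken from the question: ASCII letters, digits and '.'.
# The explicit ranges are ASCII-only, unlike \w, which also matches
# non-ASCII letters in Python 3.
ALLOWED = re.compile(r"[A-Za-z0-9.]*")

def is_allowed(text):
    """Return True if every character of text is in [A-Za-z0-9.]."""
    return ALLOWED.fullmatch(text) is not None
```

Whether this is actually faster than the loop depends on the input and the runtime, so it is worth measuring before switching.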

What's the fastest way to strip and replace a document of high unicode characters using Python?

- by Rhubarb

I am looking to replace all high Unicode characters in a large document, such as accented Es and left and right quotes, with "normal" counterparts in the low range, such as a regular 'E' and straight quotes. I need to perform this on a very large document rather often. I see an example of this, in what I think might be Perl, here: http://www.designmeme.com/mtplugins/lowdown.txt Is there a fast way of doing this in Python without using s.replace(...).replace(...).replace(...)...? I tried this with just a few characters to replace, and stripping the document became really slow.
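A minimal sketch of one way to avoid chained replace() calls, assuming Python 3: str.translate applies a whole mapping table in one pass, and unicodedata.normalize with NFKD splits accented letters into a base letter plus combining marks, which can then be dropped. The PUNCTUATION table and the to_low_range name are illustrative assumptions, not taken from the linked Perl plugin.

```python
import unicodedata

# Hypothetical table for punctuation that has no ASCII decomposition.
# Extend it with whatever characters the documents actually contain.
PUNCTUATION = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
})

def to_low_range(text):
    """Map curly quotes and dashes, then strip accents from letters."""
    text = text.translate(PUNCTUATION)
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping non-ASCII bytes removes the combining marks; it also
    # silently drops any character that has no ASCII base form.
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

The final encode step is lossy by design, so characters that must survive should be added to the table first.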

what is meant by normalization in huge pointers

- by wrapperm

Hi,

I am confused about the difference between a "far" pointer and a "huge" pointer. I searched all over Google for an explanation and could not find one. Can anyone explain the difference between the two? Also, what exactly is the normalization concept related to huge pointers?

Please do not give me the following or any similar answer:

"The only difference between a far pointer and a huge pointer is that a huge pointer is normalized by the compiler. A normalized pointer is one that has as much of the address as possible in the segment, meaning that the offset is never larger than 15. A huge pointer is normalized only when pointer arithmetic is performed on it. It is not normalized when an assignment is made. You can cause it to be normalized without changing the value by incrementing and then decrementing it. The offset must be less than 16 because the segment can represent any value greater than or equal to 16 (e.g. Absolute address 0x17 in a normalized form would be 0001:0001. While a far pointer could address the absolute address 0x17 with 0000:0017, this is not a valid huge (normalized) pointer because the offset is greater than 0000F.). Huge pointers can also be incremented and decremented using arithmetic operators, but since they are normalized they will not wrap like far pointers."

Here the normalization concept is not very well explained, or maybe I'm unable to understand it very well. Can anyone try explaining this concept from a beginner's point of view?

Thanks, Rahamath
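The arithmetic behind the quoted passage can be sketched as follows, assuming 16-bit real-mode addressing where the linear address is segment * 16 + offset. The function names are illustrative, and the 20-bit address wrap is ignored for simplicity.

```python
def linear_address(segment, offset):
    """Real-mode linear address: the segment is shifted left by 4 bits."""
    return (segment << 4) + offset

def normalize(segment, offset):
    """Return the segment:offset pair for the same address with offset < 16."""
    linear = linear_address(segment, offset)
    # Push as much of the address as possible into the segment.
    return linear >> 4, linear & 0xF

# The far pointer 0000:0017 and its normalized form name the same byte.
seg, off = normalize(0x0000, 0x0017)
print("%04X:%04X" % (seg, off))
```

By this arithmetic, absolute address 0x17 normalizes to 0001:0007 (1 * 16 + 7 = 0x17), not the 0001:0001 printed in the quoted passage.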

Where can I find an array of the unassigned Unicode code points for a particular block?

- by gitparade

At the moment, I'm writing these arrays by hand. For example, the Miscellaneous Mathematical Symbols-A block has an entry in hash like this:

    my %symbols = (
        ...
        miscellaneous_mathematical_symbols_a => [(0x27C0..0x27CA), 0x27CC, (0x27D0..0x27EF)],
        ...
    )

The simpler, 'continuous' array

    miscellaneous_mathematical_symbols_a => [0x27C0..0x27EF]

doesn't work because Unicode blocks have holes in them. For example, there's nothing at 0x27CB. Take a look at the code chart [PDF]. Writing these arrays by hand is tedious, error-prone and a bit fun. And I get the feeling that someone has already tackled this in Perl!

How to replace unicode characters by ascii characters in Python (perl script given)?

- by Frank

I am trying to learn Python and couldn't figure out how to translate the following Perl script to Python:

    #!/usr/bin/perl -w
    use open qw(:std :utf8);
    while(<>) {
        s/\x{00E4}/ae/;
        s/\x{00F6}/oe/;
        s/\x{00FC}/ue/;
        print;
    }

The script just changes Unicode umlauts to alternative ASCII output. (So the complete output is in ASCII.) I would be grateful for any hints. Thanks!
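A rough Python 3 counterpart to the Perl script might look like the sketch below. The names UMLAUTS, replace_umlauts and convert are illustrative. One deliberate difference: the Perl substitutions have no /g flag, so they replace only the first match on each line, whereas str.translate replaces every occurrence.

```python
# Same three lowercase umlauts as the Perl script; uppercase forms and
# the sharp s are left out here because the original leaves them out too.
UMLAUTS = str.maketrans({
    "\u00e4": "ae",
    "\u00f6": "oe",
    "\u00fc": "ue",
})

def replace_umlauts(line):
    """Replace every lowercase umlaut in line with its two-letter form."""
    return line.translate(UMLAUTS)

def convert(lines):
    """Yield converted lines from any iterable of text lines."""
    for line in lines:
        yield replace_umlauts(line)
```

To use it as a filter, something like sys.stdout.writelines(convert(sys.stdin)) should work, provided standard input and output are decoded and encoded as UTF-8.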

Java: How to get Unicode name of a character (or its type category)?

- by java.is.for.desktop

Hello, everyone! The Character class in Java defines methods which check a given char argument for equality with certain Unicode chars or for belonging to some type category. These chars and type categories are named. As stated in the javadoc, examples of named chars are HORIZONTAL TABULATION, FORM FEED, ...; examples of named type categories are SPACE_SEPARATOR, PARAGRAPH_SEPARATOR, ... However, because these are byte or int values instead of enums, their names are "hidden" at runtime. So, is it possible to get the names of characters and/or type categories at runtime?

What is the universal way to use file I/O API with unicode filenames?

- by dma_k

In Windows there is a common problem: filenames have to be converted to the local codepage before they are passed to open(). Of course, it is possible to use Win32::API for that, but I don't want my script to be platform-dependent. At the moment I have to write something like:

    open IN, "<", encode("cp1251", $filename) or die $!;

but is there any library that hides these details? I think the local codepage can be detected automatically, so I just want to pass a Unicode filename and forget about the details. Why is it still not in the box?

What new Unicode functions are there in C++0x?

- by luiscubal

It has been mentioned in several sources that C++0x will include better language-level support for Unicode (including types and literals). If the language is going to add these new features, it's only natural to assume that the standard library will as well. However, I am currently unable to find any references to the new standard library. I expected to find answers to these questions:

Does the new library provide standard methods to convert UTF-8 to UTF-16, etc.?

Does the new library allow writing UTF-8 to files and to the console (or reading it from files and from the console)? If so, can we use cout or will we need something else?

Does the new library include "basic" functionality such as discovering the byte count and length of a UTF-8 string, or converting to upper-case/lower-case (does this consider the influence of locales)?

Finally, are any of these functions available in any popular compilers such as GCC or Visual Studio?

I have tried to look for information, but I can't seem to find anything. I am actually starting to think that maybe these things aren't even decided yet (I am aware that C++0x is a work in progress).

How can I use io.StringIO() with the csv module?

- by Tim Pietzcker

I tried to backport a Python 3 program to 2.7, and I'm stuck with a strange problem:

    >>> import io
    >>> import csv
    >>> output = io.StringIO()
    >>> output.write("Hello!")            # Fail: io.StringIO expects Unicode
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unicode argument expected, got 'str'
    >>> output.write(u"Hello!")           # This works as expected.
    6L
    >>> writer = csv.writer(output)       # Now let's try this with the csv module:
    >>> csvdata = [u"Hello", u"Goodbye"]  # Look ma, all Unicode! (?)
    >>> writer.writerow(csvdata)          # Sadly, no.
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unicode argument expected, got 'str'

According to the docs, io.StringIO() returns an in-memory stream for Unicode text. It works correctly when I try and feed it a Unicode string manually. Why does it fail in conjunction with the csv module, even if all the strings being written are Unicode strings? Where does the str come from that causes the Exception? (I do know that I can use StringIO.StringIO() instead, but I'm wondering what's wrong with io.StringIO() in this scenario.)
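For comparison, the sketch below shows the same pairing under Python 3, where the csv module writes text (str) and so works with io.StringIO. The Python 2 csv documentation notes that that version of the module does not support Unicode input, which is consistent with the byte str reported in the traceback above. The function name rows_to_csv is illustrative.

```python
import csv
import io

def rows_to_csv(rows):
    """Serialize rows to CSV text using an in-memory text stream."""
    output = io.StringIO()
    writer = csv.writer(output)
    writer.writerows(rows)
    return output.getvalue()

# csv.writer ends each row with "\r\n" by default.
print(repr(rows_to_csv([["Hello", "Goodbye"]])))
```

This only illustrates the Python 3 side; it does not change how the Python 2.7 csv module behaves.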

On the fly volume normalization (waveform capping) for VLC or OS X

- by Grammar Nazi

I'm looking to impose a hard limit on a movie's volume. In particular, when I have earbuds in and a reasonable volume set, loud or sudden sounds (gunshots, dramatic sound effects, etc.) are loud enough to hurt. Lowering the volume makes speech and other sounds hard to hear. Is there a third-party on-the-fly solution for this, or a plug-in for VLC that I can use?

Unicode paragraph end/line break breaking space / non breaking space aware text editor

- by martinr

I want one of those to write my blog articles with. I'm tired of manually converting breaks from rough notes to either paragraphs or line breaks for release as HTML, and tired of converting spaces to breaking or non-breaking ones. There are standard Unicode code points for the difference - what editor lets me use almost plain ASCII text but with builtin support and understanding for Unicode paragraph and non-breaking space characters? And ideally will let me save straight to either plain text UTF8 or to a file of plain HTML paragraphs?

database----database normalization

- by runeveryday

Someone told me the following table doesn't satisfy second normal form (2NF), but I don't know why. I am new to database design. I have read some tutorials on 3NF, but I can't understand 2NF and 3NF well. I hope someone can explain it for me. Thank you.

    +------------+-----------+-------------------+
    | pk         | pk        | row               |
    +------------+-----------+-------------------+
    | A          | B         | C                 |
    +------------+-----------+-------------------+
    | A          | D         | C                 |
    +------------+-----------+-------------------+
    | A          | E         | C                 |
    +------------+-----------+-------------------+

dreamweaver disable "language for non unicode programs" detection

- by YuriKolovsky

Dreamweaver CS4 auto-detects the language for non-Unicode programs in Windows (in my case Russian) and conveniently sets the default encoding to Western European instead of the much preferred UTF-8. It also changes several bits of text in Dreamweaver into Russian. How do I disable this detection and keep Dreamweaver fully in English, without having to change the language for non-Unicode programs in Windows?

Automatic adaptive gain normalization

- by Eduardo

How can I normalize a (voice) audio MP3 or AAC file with no loss, raising the gain as much as possible with little distortion, so that in a long conversation people who speak softly get more gain and people who speak loudly get less?

How to show telugu font in emulator correctly

- by raman

I am developing an application that receives Unicode text via JSON, and the text shows correctly when I inspect it in debug mode. The problem is how to display it in the emulator: I am using UTF-8 to render the Unicode text, but it doesn't show. When I use setTypeface, the Telugu font appears, even in a simple program, but not correctly. I am using Pothana2000.ttf to display the Telugu Unicode text in Telugu script. Suggestions welcome. I need a reply urgently.

Finding Those Pesky Unicode Characters in Visual Studio

- by fallen888

Sometimes I’m handed HTML that I need to wire up and I find these characters. Usually there are only a couple on the page and, while annoying to find, it’s not a big deal. Recently I found dozens and dozens of these guys on a page and wasn’t very happy at the prospect of having to manually search them all out and remove/replace them. That is, until I did some research and found this very helpful article by Aaron Jensen - Finding Non-ASCII Characters with Visual Studio.

Aaron’s wonderful solution:

Try searching your code with the following regular expression:

    [^\x00-\x7f]

Open any of Visual Studio’s find windows and enter the regular expression above into the “Find what:” text box. Click the “Find Options” plus sign to expand the list of options. Check the last box “Use:” and choose “Regular expressions” from the drop down menu.

Easy and efficient. Thanks, Aaron!
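Outside Visual Studio, the same character class can be applied from a script. The sketch below is one possible approach in Python, with an illustrative function name; it reports the 1-based line and column of each non-ASCII character so the hits can be located in an editor.

```python
import re

# Same character class as in the article: anything outside 0x00-0x7F.
NON_ASCII = re.compile(r"[^\x00-\x7f]")

def find_non_ascii(text):
    """Return (line, column, character) for each non-ASCII character."""
    hits = []
    # Split on "\n" only, so unusual Unicode line separators are
    # reported as hits instead of being consumed as line breaks.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in NON_ASCII.finditer(line):
            hits.append((line_no, match.start() + 1, match.group()))
    return hits
```

Columns are counted in characters, so they may differ from what an editor shows for lines containing tabs.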

Shakespeare and storing Unicode characters

- by John Paul Cook

This post is about the political issues involved with using multiple languages in a global organization and how to troubleshoot the technical details. The CHAR and VARCHAR data types are NOT suitable for global data. Some people still cling to CHAR and VARCHAR justifying their use by truthfully saying that they only take up half the space of NCHAR and NVARCHAR data types. But you’ll never be able to store Chinese, Korean, Greek, Japanese, Arabic, or many other languages unless you use NCHAR and NVARCHAR...

What is adding frog characters to my URLs?

- by Jacob Hume

While browsing the "Crawl Errors" section of Google Webmaster Tools, I discovered a set of very strange 500 errors in reference to my site. I was able to track down what these characters are, and apparently they are the first two characters in the Unicode Private Use Area. My font just happened to map them to a frog wearing a tiny crown, and a symbol that resembles the numeral 7. These symbols only appear on the addresses of non-HTML files (office documents, PDFs, etc.), but they do not just appear in the file name. Where are these symbols coming from, and is there any way I can get rid of them so Google can properly crawl my site?

Some background information:

Using Web Server running WS2K3 with IIS6 and PHP 5.3.8

Site encoding is UTF-8

These symbols don't appear on the page, or in the source

Django model: Reference foreign key table in unicode function for admin

- by pa

Example models:

    class Parent(models.Model):
        name = models.CharField()

        def __unicode__(self):
            return self.name

    class Child(models.Model):
        parent = models.ForeignKey(Parent)

        def __unicode__(self):
            return self.parent.name  # Would reference name above

I want Child.__unicode__ to refer to Parent.name, mostly for the admin section, so I don't end up with "Child object" or similar. I'd prefer to display it more like "Child of ". Is this possible? Most of what I've tried hasn't worked, unfortunately.

parse XML file that contains unicode characters in iphone

- by Jim

Hi, I am trying to parse an XML file that contains some Unicode characters. I tried to parse the file using NSXMLParser, but the parser stops when it encounters any Unicode character. Is there any other good way to parse an XML file with Unicode letters? Please suggest. Thanks, Jim.

Separating null byte separated UNICODE C string.

- by Ramblingwood

First off, this is NOT a duplicate of: http://stackoverflow.com/questions/1911053/turn-a-c-string-with-null-bytes-into-a-char-array , because the given answer doesn't work when the char *'s are Unicode. I think the problem is that because I am trying to use Unicode and thus wchar_t instead of char, the length of each character is different and thus, this doesn't work (it does in non-Unicode):

    char *Buffer;   // your null-separated strings
    char *Current;  // Pointer to the current string
    // [...]
    for (Current = Buffer; *Current; Current += strlen(Current) + 1)
        printf("GetOpenFileName returned: %s\n", Current);

Does anyone have a similar solution that works on Unicode strings? I have been banging my head on this for over 4 hours now. C doesn't agree with me.

ActiveRecord field normalization

- by Bill

I feel bad asking this question, as I thought I knew enough about ActiveRecord to answer this myself. But such is the way of having SO available ... I'm trying to remove the commas from a field in a model of mine. I want the user to be able to type a number, e.g. 10,000, and have that number stored in the database as 10000. I was hoping that I could do some model-side normalization to remove the comma. I don't want to depend on the view or controller to properly format my data. I tried:

    before_validation :normalize

    def normalize
      self['thenumber'] = self['thenumber'].to_s.gsub(',','')
    end

no worky :(

Search Results

Search found 1649 results on 66 pages for 'unicode normalization'.
