Search Results

Search found 1490 results on 60 pages for 'rxvt unicode'.


  • Forcing a mixed ISO-8859-1 and UTF-8 multi-line string into UTF-8 in Perl

    - by knorv
    Consider the following problem: A multi-line string $junk contains some lines which are encoded in UTF-8 and some in ISO-8859-1. I don't know a priori which lines are in which encoding, so heuristics will be needed. I want to turn $junk into pure UTF-8 with proper re-encoding of the ISO-8859-1 lines. Also, in the event of errors in the processing I want to provide a "best effort result" rather than throwing an error. My current attempt looks like this:

        $junk = &force_utf8($junk);

        sub force_utf8 {
            my $input = shift;
            my $output = '';
            foreach my $line (split(/\n/, $input)) {
                if (utf8::valid($line)) {
                    utf8::decode($line);
                }
                $output .= "$line\n";
            }
            return $output;
        }

    While this appears to work I'm certain this is not the optimal solution. How would you improve the force_utf8(...) sub?
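
    One common heuristic for this kind of mixed input, sketched here in Python rather than Perl, is to try a strict UTF-8 decode on each line and fall back to ISO-8859-1 when it fails. The function name force_utf8 is borrowed from the question; everything else is an illustrative assumption, and the heuristic can misclassify ISO-8859-1 lines that happen to be valid UTF-8 byte sequences.

```python
def force_utf8(data: bytes) -> bytes:
    """Return data with every line re-encoded as UTF-8 (best effort)."""
    fixed_lines = []
    for line in data.split(b"\n"):
        try:
            # Strict decode: succeeds only if the line is well-formed UTF-8.
            text = line.decode("utf-8")
        except UnicodeDecodeError:
            # ISO-8859-1 maps every byte to a code point, so this never fails.
            text = line.decode("iso-8859-1")
        fixed_lines.append(text.encode("utf-8"))
    return b"\n".join(fixed_lines)
```

    Because the fallback decode cannot raise, the function always returns a result, which matches the "best effort" requirement.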

  • Working with Japanese filenames in PHP 5.3 and Windows Vista?

    - by Jon
    I'm currently trying to write a simple script that looks in a folder and returns a list of all the file names in an RSS feed. However, I've hit a major wall: whenever I try to read filenames with Japanese characters in them, it shows them as ?'s. I've tried the solutions mentioned here: http://stackoverflow.com/questions/482342/php-readdir-problem-with-japanese-language-file-name - however they do not work for some reason, even with the following at the top (exporting as plain text until I can sort this out):

        header('Content-Type: text/html; charset=UTF-8');
        setlocale(LC_ALL, 'en_US.UTF8');
        mb_internal_encoding("UTF-8");

    What can I do? I need this to work and I don't have much time.

  • If a command line program is unsure of stdout's encoding, what encoding should it output?

    - by mackstann
    I have a command line program written in Python, and when I pipe it through another program on the command line, sys.stdout.encoding is None. This makes sense, I suppose -- the output could be another program, or a file you're redirecting it into, or whatever, and it doesn't know what encoding is desired. But neither do I! This program will be used by many different people (humor me) in different ways. Should I play it safe and output only ascii (replacing non-ascii chars with question marks)? Or should I output UTF-8, since it's so widespread these days?
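
    One way to frame the choice, as a minimal sketch: honour the stream's declared encoding when there is one, and otherwise fall back to a default such as UTF-8. The helper names choose_encoding and encode_for are hypothetical, and whether UTF-8 or ASCII is the better fallback remains the judgment call the question describes.

```python
def choose_encoding(stream, fallback="utf-8"):
    """Use the stream's declared encoding, or the fallback if it has none."""
    return getattr(stream, "encoding", None) or fallback


def encode_for(stream, text, fallback="utf-8"):
    """Encode text for the stream; unencodable characters become '?'."""
    encoding = choose_encoding(stream, fallback)
    return text.encode(encoding, errors="replace")
```

    Passing fallback="ascii" gives the conservative behaviour mentioned in the question (non-ASCII characters replaced with question marks).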

  • How can I test if an input field contains foreign characters?

    - by zeckdude
    I have an input field in a form. Upon pushing submit, I want to validate to make sure the user entered non-Latin characters only, so any foreign language characters, like Chinese among many others. Or at the very least test to make sure it does not contain any Latin characters. Could I use a regular expression for this? What would be the best approach for this? I am validating in both JavaScript and in PHP. What solutions can I use to check for foreign characters in the input field in both programming languages?
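
    The check itself can be sketched in Python (not the JavaScript or PHP the question asks about) by looking at Unicode character names: a letter whose name contains "LATIN" belongs to the Latin script. The function name contains_latin_letter is an illustrative assumption, and the name test is a heuristic, not an official script lookup.

```python
import unicodedata


def contains_latin_letter(text):
    """Return True if any alphabetic character in text is a Latin letter."""
    for ch in text:
        # unicodedata.name returns e.g. "LATIN SMALL LETTER A" or
        # "CJK UNIFIED IDEOGRAPH-4E2D"; the default "" covers unnamed characters.
        if ch.isalpha() and "LATIN" in unicodedata.name(ch, ""):
            return True
    return False
```

    Digits, spaces, and punctuation are ignored here, so a form validator would still need to decide separately whether those are allowed.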

  • Parsing through Arabic / RTL text from left to right

    - by Dan W
    Let's say I have a string in an RTL language such as Arabic with some English chucked in: string s = "Test:?????;?????;?????;a;b" Notice there are semicolons in the string. When I use the Split command like string[] spl = s.Split(';');, then some of the strings are saved in reverse order. This is what happens: ??Test:????? ????? ????? a b The above is out of order compared to the original. Instead, I expect to get this: ?Test: ????? ????? ????? a b I'm prepared to write my own split function. However, the chars in the string also parse in reverse order, so I'm back to square one. I just want to go through each character as it's shown on the screen.

  • What is the difference between _tmain() and main() in C++?

    - by joshcomley
    If I run my C++ application with the following main() method everything is OK:

        int main(int argc, char *argv[])
        {
            cout << "There are " << argc << " arguments:" << endl;

            // Loop through each argument and print its number and value
            for (int i=0; i<argc; i++)
                cout << i << " " << argv[i] << endl;

            return 0;
        }

    I get what I expect and my arguments are printed out. However, if I use _tmain:

        int _tmain(int argc, char *argv[])
        {
            cout << "There are " << argc << " arguments:" << endl;

            // Loop through each argument and print its number and value
            for (int i=0; i<argc; i++)
                cout << i << " " << argv[i] << endl;

            return 0;
        }

    It just displays the first character of each argument. What is the difference causing this?

  • PHP: Convert web-page to utf8

    - by Paul Tarjan
    I would like to only work with UTF8. The problem is I don't know the charset of every webpage. How can I detect it and convert to UTF8?

        <?php
        $url = "http://vkontakte.ru";
        $ch = curl_init($url);
        $options = array(
            CURLOPT_RETURNTRANSFER => true,
        );
        curl_setopt_array($ch, $options);
        $data = curl_exec($ch);
        // $data = magic($data);
        print $data;

    See this at: http://paulisageek.com/tmp/curl-utf8 What is magic()?
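
    A rough sketch of what such a magic() step could look like, written in Python rather than PHP: look for a charset in the Content-Type header, then in a meta tag near the top of the body, then try UTF-8, and finally decode with replacement characters. The function name to_utf8 and the 4096-byte scan window are assumptions made for illustration; real pages can declare a wrong charset, so this is best effort only.

```python
import re


def to_utf8(body: bytes, content_type: str = "") -> bytes:
    """Re-encode an HTML page as UTF-8 using declared charsets as hints."""
    candidates = []
    # 1. charset parameter of the HTTP Content-Type header, if provided.
    header_match = re.search(r"charset=([\w.:-]+)", content_type, re.I)
    if header_match:
        candidates.append(header_match.group(1))
    # 2. charset declared in a <meta> tag near the start of the document.
    meta_match = re.search(
        rb"<meta[^>]+charset=[\x22\x27]?([\w.:-]+)", body[:4096], re.I
    )
    if meta_match:
        candidates.append(meta_match.group(1).decode("ascii"))
    # 3. UTF-8 as the default guess.
    candidates.append("utf-8")
    for name in candidates:
        try:
            return body.decode(name).encode("utf-8")
        except (LookupError, UnicodeDecodeError):
            continue  # unknown codec name or bytes that do not fit it
    # 4. Last resort: keep what decodes and mark the rest with U+FFFD.
    return body.decode("utf-8", errors="replace").encode("utf-8")
```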

  • convert &uuml; to u

    - by Remus Rigo
    Hi all, I'm using a database that contains contacts (fields like name, address, ...). If my database has a city that contains special characters (like ü) or HTML codes (like &uuml;), how can I convert them to u, so that a search for a city containing such a special character still shows it in the results?
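
    A minimal sketch of one way to fold such values for searching, in Python: decode HTML entities first, then decompose accented letters and drop the combining marks. The function name strip_accents is hypothetical, and characters without a decomposition (for example ß) pass through unchanged.

```python
import html
import unicodedata


def strip_accents(text):
    """Decode HTML entities, then remove diacritics from decomposable letters."""
    text = html.unescape(text)                      # "&uuml;" -> "ü"
    decomposed = unicodedata.normalize("NFKD", text)  # "ü" -> "u" + U+0308
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

    Applying the same folding to both the stored city names and the search term lets the two forms compare equal.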

  • F5 Networks iRule/Tcl - Escaping UNICODE 6-character escape sequences so they are processed as and r

    - by openid.malcolmgin.com
    We are trying to get an F5 BIG-IP LTM iRule working properly with SharePoint 2007 in an SSL termination role. This architecture offloads all of the SSL processing to the F5 and the F5 forwards interactive requests/responses to the SharePoint front end servers via HTTP only (over a secure network). For the purposes of this discussion, iRules are parsed by a Tcl interpretation engine on the F5 Networks BIG-IP device. As such, the F5 does two things to traffic passing through it:

    1. Redirects any request to port 80 (HTTP) to port 443 (HTTPS) through HTTP 302 redirects and URL rewriting.
    2. Rewrites any response to the browser to selectively rewrite URLs embedded within the HTML so that they go to port 443 (HTTPS). This prevents the 302 redirects from breaking DHTML generated by SharePoint.

    We've got part 1 working fine. The main problem with part 2 is that in the response rewrite, because of XML namespaces and other similar issues, not ALL matches for "http:" can be changed to "https:". Some have to remain "http:". Additionally, some of the "http:" URLs are difficult in that they live in SharePoint-generated JavaScript and their slashes (i.e. "/") are actually represented in the HTML by the UNICODE 6-character string, "\u002f". For example, in the case of these tricky ones, the literal string in the outgoing HTML is:

        http:\u002f\u002fservername.company.com\u002f

    And should be changed to:

        https:\u002f\u002fservername.company.com\u002f

    Currently we can't even figure out how to get a match in a search/replace expression on these UNICODE sequence string literals. It seems that no matter how we slice it, the Tcl interpreter is interpreting the "\u002f" string into the "/" translation before it does anything else. We've tried various combinations of Tcl escaping methods we know about (mainly double-quotes and using an extra "\" to escape the "\" in the UNICODE string) but are looking for more methods, preferably ones that work.

    Does anyone have any ideas or any pointers to where we can effectively self-educate about this? Thanks very much in advance.
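
    The underlying idea is that the search pattern has to contain a literal backslash followed by the characters u002f, not the slash the escape would normally expand to. The sketch below shows that idea in Python only; it is not Tcl or iRule code and says nothing about how the F5 parses its rules. The function name upgrade_escaped_urls is hypothetical.

```python
def upgrade_escaped_urls(body, host):
    """Rewrite http: to https: for URLs written with literal \\u002f escapes."""
    # "\\u002f" in a normal Python string is a backslash followed by "u002f",
    # i.e. the six literal characters that appear in the outgoing HTML.
    old = "http:\\u002f\\u002f" + host
    new = "https:\\u002f\\u002f" + host
    return body.replace(old, new)
```

    Because only the escaped form for the given host is matched, ordinary "http://" strings such as XML namespace URIs are left alone.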

  • What are ways to prevent files with the Right-to-Left Override Unicode character in their name (a malware spoofing method) from being written or read?

    - by galacticninja
    What are ways to avoid or prevent files with the RLO (Right-to-Left Override) Unicode character in their name (a malware method to spoof filenames) from being written or read in a Windows PC?

    More info on the RLO Unicode character here: http://www.fileformat.info/info/unicode/char/202e/index.htm http://en.wikipedia.org/wiki/Bi-directional_text

    Info on the RLO Unicode character when used by malware: http://www.ipa.jp/security/english/virus/press/201110/E_PR201110.html Mirror link: http://webcache.googleusercontent.com/search?q=cache:KasmfOvbVJ8J:www.ipa.jp/security/english/virus/press/201110/E_PR201110.html+&cd=1&hl=en&ct=clnk

    You can try this RLO character test webpage: http://www.fileformat.info/info/unicode/char/202e/browsertest.htm The RLO character is also already pasted in the 'Input Test' field in that webpage. Try typing there and notice that the characters you're typing are coming out in reverse order (right-to-left, instead of left-to-right).

    In filenames, the RLO character can be specifically positioned in the filename to spoof or masquerade as having a filename or file extension that is different than what it actually has. (It will still be hidden even if 'Hide extensions for known filetypes' is unchecked.)

    The only info I can find on how to prevent files with the RLO character from being run is from the Information Technology Promotion Agency, Japan website: http://www.ipa.jp/security/english/virus/press/201110/E_PR201110.html (Mirror link). They advised using the Local Security Policy settings manager to block files with the RLO character in their names from being run.

    Can anyone recommend any other good solutions to prevent files with the RLO character in their names from being written or being read in the computer, or a way to alert the user if a file with the RLO character is detected?

    My OS is Windows 7, but I'll be looking for solutions for Windows XP, Vista and 7, or a solution that will work for all those OSes, to help people using those OSes too.
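
    For the "alert the user" part, a detection pass is simple to sketch: scan file names for U+202E. The Python below is only an illustration of that check, not a prevention mechanism; the function names are hypothetical, and a real tool would also need to decide how to report or quarantine matches.

```python
import os

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE


def has_rlo(filename):
    """Return True if the file name contains the RLO control character."""
    return RLO in filename


def find_rlo_files(root):
    """Yield full paths under root whose file name contains RLO."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if has_rlo(name):
                yield os.path.join(dirpath, name)
```

    Printing repr(name) instead of the raw name when reporting a hit keeps the control character visible as an escape rather than letting it reorder the displayed text.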

  • Can I convert an ASCII MD5 hashed password into a Unicode MD5 hashed password?

    - by Jimmy Moo Moo
    Hello, I'm looking for help to convert an ASCII MD5 hashed password into a Unicode MD5 hashed password. For example, I'll use the string "password". When it's converted to an ASCII byte array, I get a base64 encoded hash of

        X03MO1qnZdYdgyfeuILPmQ==

    When it's converted into a Unicode byte array, I get a base64 encoded hash of

        sIHb6F4ew//D1OfQInQAzQ==

    All my passwords are stored in an MD5 hash that was applied to an ASCII byte array, but I'm trying to migrate my application's user data to a system that stores passwords in an MD5 hash that is applied to a Unicode byte array. In case it's not clear, with the following C# code:

        var passwordBytes = Encoding.ASCII.GetBytes("password");
        var hashAlgorithm = HashAlgorithm.Create("MD5");
        var hashBytes = hashAlgorithm.ComputeHash(passwordBytes);

    My current system uses this, but the system I'm moving to differs in the first line: it uses Encoding.Unicode.GetBytes. Does anybody know how I can convert my passwords from X03MO1qnZdYdgyfeuILPmQ== into sIHb6F4ew//D1OfQInQAzQ==? I'm guessing the answer is that I can't, since the encoding is being done before the hashing, but I thought I'd ask the bright minds of stackoverflow and see if anybody has a way.
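
    The difference between the two hashes can be reproduced in a few lines of Python, assuming (as is the case in .NET) that Encoding.Unicode means UTF-16 little-endian without a byte order mark. The helper md5_b64 is a hypothetical name. The two digests come from different byte sequences, so one cannot be computed from the other without the plaintext; a commonly used workaround, sketched in rehash_on_login, is to re-hash when the user next logs in and the plaintext is briefly available.

```python
import base64
import hashlib


def md5_b64(password, encoding):
    """Base64 of the MD5 digest of the password encoded with `encoding`."""
    digest = hashlib.md5(password.encode(encoding)).digest()
    return base64.b64encode(digest).decode("ascii")


def rehash_on_login(password, stored_ascii_hash):
    """If the password matches the old ASCII-based hash, return the new
    UTF-16LE-based hash to store; otherwise return None."""
    if md5_b64(password, "ascii") != stored_ascii_hash:
        return None
    return md5_b64(password, "utf-16-le")
```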

  • Stream/string/bytearray transformations in Python 3

    - by Craig McQueen
    Python 3 cleans up Python's handling of Unicode strings. I assume as part of this effort, the codecs in Python 3 have become more restrictive, according to the Python 3 documentation compared to the Python 2 documentation. For example, codecs that conceptually convert a bytestream to a different form of bytestream have been removed:

    - base64_codec
    - bz2_codec
    - hex_codec

    And codecs that conceptually convert Unicode to a different form of Unicode have also been removed (in Python 2 it actually went between Unicode and bytestream, but conceptually it's really Unicode to Unicode I reckon):

    - rot_13

    My main question is, what is the "right way" in Python 3 to do what these removed codecs used to do? They're not codecs in the strict sense, but "transformations". But the interface and implementation would be very similar to codecs. I don't care about rot_13, but I'm interested to know what would be the "best way" to implement a transformation of line ending styles (Unix line endings vs Windows line endings) which should really be a Unicode-to-Unicode transformation done before encoding to byte stream, especially when UTF-16 is being used, as discussed in this other SO question.
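
    In Python 3 the bytes-to-bytes transformations are available as ordinary functions in the standard library modules base64, binascii, and bz2, and a line-ending change can be done on the str before it is encoded. The sketch below illustrates both; to_windows_newlines is a hypothetical helper, and it does not treat a lone carriage return as a line ending.

```python
import base64
import binascii
import bz2


def to_windows_newlines(text: str) -> str:
    """Convert LF and CRLF line endings to CRLF, operating on text, not bytes."""
    # Collapse CRLF to LF first so existing CRLF pairs are not doubled.
    return text.replace("\r\n", "\n").replace("\n", "\r\n")


payload = b"hello"
as_base64 = base64.b64encode(payload)      # bytes -> bytes
as_hex = binascii.hexlify(payload)         # bytes -> bytes
as_bz2 = bz2.compress(payload)             # bytes -> bytes

# Line endings are fixed while the data is still text, then encoded once.
utf16_bytes = to_windows_newlines("a\nb").encode("utf-16-le")
```

    Doing the newline conversion before encoding avoids searching for single bytes inside UTF-16 data, where each character occupies at least two bytes.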

  • Social media and special characters

    - by John Paul Cook
    I’ve previously blogged about using Unicode with T-SQL to put superscripts, subscripts, and special characters into text strings. Unicode is also useful in formatting social media such as Facebook, Twitter, and that dinosaur otherwise known as email. When you can’t set properties of text such as italicizing the subject line of an email message or adding subscripts to a Facebook post, Unicode can make it possible. There are Unicode characters that are intrinsically italicized. Others are intrinsically...(read more)

  • Escaping In Expressions

    The expressions language is a C style syntax, so you may need to escape certain characters, for example:

        "C:\FolderPath\" + @VariableName

    Should be

        "C:\\FolderPath\\" + @VariableName

    Another use of the escape sequence allows you to specify character codes, like this \xNNNN, where NNNN is the Unicode character code that you want. For example the following expression will produce the same result as the previous example as the Unicode character code 005C equals a backslash character:

        "C:\x005CFolderPath\x005C" + @VariableName

    For more information about Unicode characters see http://www.unicode.org/charts/

    Literals are also supported within expressions, both string literals using the common escape sequence syntax as well as modifiers which influence the handling of numeric values. See the Literals (SSIS) topic: http://msdn2.microsoft.com/en-US/library/ms141001(SQL.90).aspx

    Using the Unicode escaped character sequence you can make up for the lack of a CHAR function or equivalent.

  • What is the real meaning of the "Select a language [for] non-Unicode programs..." dialog?

    - by Joshua Fox
    What is the real meaning of the "Select a language to match the language version of the non-Unicode programs you want to use" dialog under Control Panel-Regional Settings-Advanced in WinXP and Win2003? According to the dialog text, Windows will use this to display the resource strings such as menus. The treatment of text files is application-specific, so this setting will not affect that. But can I expect any other change in behavior from this setting? Any insights into what is really going wrong?

  • How to get a Unicode-supporting font for Windows 7 command-line?

    - by Tim
    I've pointed the command-line to the right codepage (chcp 65001), but there's a lot of Unicode characters that Consolas and Lucida Console can't show. Specifically, I want the printable IPA characters to show up. It's not important to fix multi-codepoint glyphs, although it would be nice. How can I get such a font and install it for the command-line? Below is an example of some characters that can't be rendered.

  • What is the best strategy for transforming unicode strings into filenames?

    - by David Cowden
    I have a bunch (thousands) of resources in an RDF/XML file. I am writing a certain subset of the resources to files -- one file for each, and I'm using the resource's title property as the file name. However, the titles are everyday article, website, and blog post titles, so they contain characters unsafe for a URI (the necessary step for constructing a valid file path). I know of the Jersey UriBuilder but I can't quite get it to work for my needs as I detailed in a different question on SO. Some possibilities I have considered are:

    - Since each resource should also have an associated URL, I could try to use the name of the file on the server. The downside of this is sometimes people don't name their content logically and I think the title of an article better reflects the content that will be in each text file.
    - Construct a white list of valid characters and parse the string myself, defining substitutions for unsafe characters. The downside of this is the result could be just as unreadable as the former solution because presumably the content creators went through a similar process when placing the files on their server.
    - Choose a more generic naming scheme, place the title in the text file along with the other attributes, and tell my boss to live with it.

    So my question here is, what methods work well for dealing with a scenario where you need to construct file names out of strings with potentially unsafe characters? Is there a solution that better fits my constraints?
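
    The second option (substituting unsafe characters) can be sketched as follows, in Python rather than Java. The set of replaced characters is the usual list of characters Windows rejects in file names plus ASCII control characters; the function name safe_filename, the underscore replacement, and the 100-character limit are all assumptions for illustration.

```python
import re
import unicodedata


def safe_filename(title, max_length=100):
    """Turn an arbitrary title into a readable, filesystem-friendly name."""
    text = unicodedata.normalize("NFKC", title)
    # Replace path separators, reserved punctuation, and control characters.
    text = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", text)
    # Collapse whitespace runs and trim leading/trailing spaces and dots.
    text = re.sub(r"\s+", " ", text).strip(" .")
    return text[:max_length] or "untitled"
```

    Two different titles can map to the same name after substitution, so a caller writing thousands of files would likely want to append a counter or a short hash of the original title.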

  • How to convert Beautiful Soup Unicode into a decimal value?

    - by MikeTheCoder
    I'm trying to use Python's Beautiful Soup library to grab a bunch of divs from an HTML file, and from there get the string - which is a money value - that's inside the div. Then remove the dollar sign and convert it to a decimal so that I can use a greater than and less than conditional statement to compare values. I have googled the heck out of it and can't seem to come up with a way to convert this unicode string into a decimal value. I really could use some help here. How do I convert unicode into a decimal value? This was my last attempt:

        import unicodedata
        from bs4 import BeautifulSoup

        soup = BeautifulSoup(open("/Users/sm/Documents/python/htmldemo.html"))
        for tag in soup.findAll("div", attrs={"itemprop": "price"}):
            val = tag.string
            new_val = val[8:]
            workable = int(new_val)
            if workable > 250:
                print(type(workable))
            else:
                print(type(workable))

    Edit: When I print the type of new_val I get: print(type(new_val))
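
    The conversion step on its own can be sketched without Beautiful Soup: strip everything that is not a digit, decimal point, or minus sign, then hand the rest to decimal.Decimal. The function name parse_price is hypothetical, and the sketch assumes prices written with a dot as the decimal separator (as in "$1,299.50").

```python
import re
from decimal import Decimal, InvalidOperation


def parse_price(text):
    """Return the price in text as a Decimal, or None if it cannot be parsed."""
    # str() also accepts str subclasses such as the strings bs4 returns.
    cleaned = re.sub(r"[^\d.\-]", "", str(text))
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None
```

    A Decimal compares directly against integers, so a check like parse_price(tag.string) > 250 works once the None case has been handled.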

  • How to map code points to unicode characters depending on the font used?

    - by Alex Schröder
    The client prints labels and has been using a set of symbolic (?) fonts to do this. The application uses a single byte database (Oracle with Latin-1). The old application I am replacing was not Unicode aware. It somehow did OK. The replacement application I am writing is supposed to handle the old data. The symbols picked from the charmap application often map to particular Unicode characters, but sometimes they don't. What looks like the Moon using the LAB3 font, for example, is in fact U+2014 (EM DASH). When users paste this character into a Swing text field, the character has the code point 8212. It was "moved" into the Private Use Area (by Windows? Java?). When saving this character to the database, Oracle decides that it cannot be safely encoded and replaces it with the dreaded ¿. Thus, I started shifting the characters by 8000: -= 8000 when saving, += 8000 when displaying the field. Unfortunately I discovered that other characters were not shifted by the same amount. In one particular font, for example, ž has the code point 382, so I shifted it by +/-256 to "fix" it. By now I'm dreading the discovery of more strange offsets and I wonder: Can I get at this mapping using Java? Perhaps the TTF font has a list of the 255 glyphs it encodes and what Unicode characters those correspond to and I can do it "right"? 
    Right now I'm using the following kludge:

        static String fromDatabase(String str, String fontFamily) {
            if (str != null && fontFamily != null) {
                Font font = new Font(fontFamily, Font.PLAIN, 1);
                boolean changed = false;
                char[] chars = str.toCharArray();
                for (int i = 0; i < chars.length; i++) {
                    if (font.canDisplay(chars[i] + 0xF000)) {
                        // WE8MSWIN1252 + WinXP
                        chars[i] += 0xF000;
                        changed = true;
                    } else if (chars[i] >= 128 && font.canDisplay(chars[i] + 8000)) {
                        // WE8ISO8859P1 + WinXP
                        chars[i] += 8000;
                        changed = true;
                    } else if (font.canDisplay(chars[i] + 256)) {
                        // ž in LAB1 Eastern = 382
                        chars[i] += 256;
                        changed = true;
                    }
                }
                if (changed) str = new String(chars);
            }
            return str;
        }

        static String toDatabase(String str, String fontFamily) {
            if (str != null && fontFamily != null) {
                boolean changed = false;
                char[] chars = str.toCharArray();
                for (int i = 0; i < chars.length; i++) {
                    int chr = chars[i];
                    if (chars[i] > 0xF000) {
                        // WE8MSWIN1252 + WinXP
                        chars[i] -= 0xF000;
                        changed = true;
                    } else if (chars[i] > 8000) {
                        // WE8ISO8859P1 + WinXP
                        chars[i] = (char) (chars[i] - 8000);
                        changed = true;
                    } else if (chars[i] > 256) {
                        // ž in LAB1 Eastern = 382
                        chars[i] = (char) (chars[i] - 256);
                        changed = true;
                    }
                }
                if (changed) return new String(chars);
            }
            return str;
        }

  • How to parse time stamps with Unicode characters in Java or Perl?

    - by ram
    I'm trying to make my code as generic as possible. I'm trying to parse the install time of a product installation. I will have two files in the product: one has the time stamp I need to parse, and the other tells the language of the installation. This is how I'm parsing the timestamp:

        public class ts {
            public static void main (String[] args){
                String installTime = "2009/11/26 \u4e0b\u5348 04:40:54"; //This timestamp I got from the first file. Those unicode characters are some Chinese characters...AM/PM I guess
                //Locale = new Locale();//don't set the language yet
                SimpleDateFormat df = (SimpleDateFormat)DateFormat.getDateTimeInstance(DateFormat.DEFAULT,DateFormat.DEFAULT);
                Date instTime = null;
                try {
                    instTime = df.parse(installTime);
                } catch (ParseException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
                System.out.println(instTime.toString());
            }
        }

    The output I get is:

        Parsing Failed
        java.text.ParseException: Unparseable date: "2009/11/26 \u4e0b\u5348 04:40:54"
            at java.text.DateFormat.parse(Unknown Source)
            at ts.main(ts.java:39)
        Exception in thread "main" java.lang.NullPointerException
            at ts.main(ts.java:45)

    It throws an exception and at the end when I print it, it shows some proper date... wrong though. I would really appreciate it if you could clarify these doubts:

    - How to parse timestamps that have unicode characters if this is not the proper way?
    - If parsing failed, how could instTime be able to hold some date, wrong though?
    - I know it's some Chinese or Korean time stamp, so I set the locale to zh and ko as follows; even then the same error comes again:

          Locale = new Locale("ko");
          Locale = new Locale("ja");
          Locale = new Locale("zh");

    - How can I do the same thing in Perl? I can't use the Date::Manip package; is there any other way?
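
    One locale-independent way to handle this particular format, sketched in Python rather than Java or Perl, is to treat the middle token as a day-period marker and map it to AM or PM by hand. The two characters in the question, U+4E0B U+5348, are the Chinese marker for PM; the table below covers only the Chinese AM/PM pair, and the function name parse_install_time and the fixed field layout are assumptions based on the single sample timestamp.

```python
import re
from datetime import datetime

# Day-period markers: U+4E0A U+5348 means AM, U+4E0B U+5348 means PM.
MERIDIEM = {"\u4e0a\u5348": "AM", "\u4e0b\u5348": "PM"}

STAMP = re.compile(r"(\d{4})/(\d{2})/(\d{2}) (\S+) (\d{2}):(\d{2}):(\d{2})")


def parse_install_time(stamp):
    """Parse 'YYYY/MM/DD <marker> hh:mm:ss' with a 12-hour clock."""
    match = STAMP.fullmatch(stamp)
    if match is None:
        raise ValueError("unexpected timestamp layout: %r" % stamp)
    year, month, day, marker, hour, minute, second = match.groups()
    if marker not in MERIDIEM:
        raise ValueError("unknown day-period marker: %r" % marker)
    hour24 = int(hour) % 12          # 12 o'clock becomes 0 before the PM shift
    if MERIDIEM[marker] == "PM":
        hour24 += 12
    return datetime(int(year), int(month), int(day),
                    hour24, int(minute), int(second))
```

    Markers for other installation languages could be added to the table once their exact strings are known from real files.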

  • How do I create self-relationships in polymorphic inheritance in Elixir and Pylons?

    - by Turukawa
    I am new to programming and am following the example in the Pylons documentation on creating a Wiki. The database I want to link to the wiki was created with Elixir so I rewrote the Wiki database schema and have continued from there. In the wiki there is a requirement for a Navigation table which is inherited by Pages and Sections. A section can have many pages, while a page can only have one section. In addition, each sibling node can be chain-referenced to each other. So:

    - Nav has "section" (OneToMany) and "before" (OneToOne - to reference preceding node)
    - Page has "section" (ManyToOne - many pages in one section) and inherits "before"
    - Section inherits all from Nav

    The code I've written looks like this:

        class Nav(Entity):
            using_options(inheritance='multi')
            name = Field(Unicode(30), default=u'Untitled Node')
            path = Field(Unicode(255), default=u'')
            section = OneToMany('Page', inverse='section')
            after = OneToOne('Nav', inverse='before')
            before = OneToMany('Nav', inverse='after')

        class Page(Nav):
            using_options(inheritance='multi')
            content = Field(UnicodeText, nullable=False)
            posted = Field(DateTime, default=now())
            title = Field(Unicode(255), default=u'Untitled Page')
            heading = Field(Unicode(255))
            tags = ManyToMany('Tag')
            comments = OneToMany('Comment')
            section = ManyToOne('Nav', inverse='section')

        class Section(Nav):
            using_options(inheritance='multi')

    Errors received on this:

        sqlalchemy.exc.OperationalError: (OperationalError) table nav has no column named aftr_id u'INSERT INTO nav (name, path, aftr_id, row_type) VALUES (?, ?, ?, ?)'

    I've also tried:

        before = ManyToMany('Nav', inverse='before')

    on Nav in the hopes this might break the problem, but also not.

    The original SQLAlchemy code from the tutorial for these declarations is as follows:

        nav_table = schema.Table('nav', meta.metadata,
            schema.Column('id', types.Integer(), schema.Sequence('nav_id_seq', optional=True), primary_key=True),
            schema.Column('name', types.Unicode(255), default=u'Untitled Node'),
            schema.Column('path', types.Unicode(255), default=u''),
            schema.Column('section', types.Integer(), schema.ForeignKey('nav.id')),
            schema.Column('before', types.Integer(), default=None),
            schema.Column('type', types.String(30), nullable=False)
        )

        page_table = schema.Table('page', meta.metadata,
            schema.Column('id', types.Integer, schema.ForeignKey('nav.id'), primary_key=True),
            schema.Column('content', types.Text(), nullable=False),
            schema.Column('posted', types.DateTime(), default=now),
            schema.Column('title', types.Unicode(255), default=u'Untitled Page'),
            schema.Column('heading', types.Unicode(255)),
        )

        section_table = sa.Table('section', meta.metadata,
            schema.Column('id', types.Integer, schema.ForeignKey('nav.id'), primary_key=True),
        )

        orm.mapper(Nav, nav_table, polymorphic_on=nav_table.c.type, polymorphic_identity='nav')
        orm.mapper(Section, section_table, inherits=Nav, polymorphic_identity='section')
        orm.mapper(Page, page_table, inherits=Nav, polymorphic_identity='page', properties={
            'comments': orm.relation(Comment, backref='page', cascade='all'),
            'tags': orm.relation(Tag, secondary=pagetag_table)
        })

    Any help is much appreciated.
