Search Results

Search found 4929 results on 198 pages for 'character'.

  • NSString sizeWithFont: returning inconsistent results? known bug?

    - by Olof Hedman
    I'm trying to create a simple custom UIView which contains a string drawn with a single font, but where the first character is slightly larger. I thought this would be easily implemented with two UILabels placed next to each other. I use NSString sizeWithFont: to measure my string so that I can lay it out correctly. But I noticed that the font baseline in the returned rectangle varies by +/- 1 pixel depending on the font size I set. Here is my code:

        NSString* ctxt = [text substringToIndex:1];
        NSString* ttxt = [text substringFromIndex:1];
        CGSize sz = [ctxt sizeWithFont: cfont ];
        clbl = [[UILabel alloc] initWithFrame:CGRectMake(0, 0, sz.width, sz.height)];
        clbl.text = ctxt;
        clbl.font = cfont;
        clbl.backgroundColor = [UIColor clearColor];
        [contentView addSubview:clbl];
        CGSize sz2 = [ttxt sizeWithFont: tfont];
        tlbl = [[UILabel alloc] initWithFrame:CGRectMake(sz.width, (sz.height - sz2.height), sz2.width, sz2.height)];
        tlbl.text = ttxt;
        tlbl.font = tfont;
        tlbl.backgroundColor = [UIColor clearColor];
        [contentView addSubview:tlbl];

    If I use 12.0 and 14.0 as sizes, it works fine. But if I instead use 13.0 and 15.0, then the first character is 1 pixel too high. Is this a known problem? Any suggestions for how to work around it? Creating a UIWebView with a CSS and HTML page seems like overkill for this, and more work to handle dynamic strings. Is that what I'm expected to do?

  • Regex: Use start of line/end of line signs (^ or $) in different context

    - by fgysin
    While doing a small regex task I came upon this problem. I have a string that is a list of tags and looks, for example, like this: foo,bar,qux,garp,wobble,thud What I needed to do was to check whether a certain tag, e.g. 'garp', was in this list. (What it finally matches is not really important, only whether there is a match or not.) My first and somewhat naive attempt was the following regex: [^,]garp[,$] My idea was that before 'garp' there should be either the start of the line/string or a comma, and after 'garp' there should be either a comma or the end of the line/string. Now, it is instantly obvious that this regex is wrong: both ^ and $ change their behaviour inside a character class [ ]. What I finally came up with is the following: ^garp$|^garp,|,garp,|,garp$ This regex just handles the 4 cases one by one (tag at the beginning of the list, in the middle, at the end, or as the only element of the list). The last regex is a bit ugly in my eyes, and just for fun's sake I'd like to make it more elegant. Is there a way to use the start-of-line/end-of-line anchors (^ and $) in the context of character classes?
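
The four-case alternation above can be collapsed by putting the anchors in groups instead of in a character class. The following is a minimal sketch in Python (the question does not name a regex flavour, so the language is an assumption); `has_tag` is a hypothetical helper name, not something from the original post.

```python
import re

def has_tag(tag_list, tag):
    """Return True if `tag` is one of the comma-separated items in `tag_list`."""
    # Anchors cannot live inside [...], but they can live inside a group:
    # (?:^|,) is "start of string or a comma", (?:,|$) is "a comma or end of string".
    pattern = r"(?:^|,)" + re.escape(tag) + r"(?:,|$)"
    return re.search(pattern, tag_list) is not None

tags = "foo,bar,qux,garp,wobble,thud"
print(has_tag(tags, "garp"))  # True
print(has_tag(tags, "gar"))   # False
```

If a regex is not strictly required, splitting on the comma and testing membership (`"garp" in tags.split(",")`) expresses the same check without any pattern at all.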

  • RFC regarding WAM

    - by Noctis Skytower
    Request For Comment regarding Whitespace's Assembly Mnemonics

    What follows is a first-generation attempt at creating mnemonics for a Whitespace assembly language.

    STACK
    =====
    push number
    copy
    copy number
    swap
    away
    away number

    MATH
    ====
    add
    sub
    mul
    div
    mod

    HEAP
    ====
    set
    get

    FLOW
    ====
    part label
    call label
    goto label
    zero label
    less label
    back
    exit

    I/O
    ===
    ochr
    oint
    ichr
    iint

    In the interest of making improvements to this small and simple instruction set, this is a second attempt.

    hold N       Push the number onto the stack
    copy         Duplicate the top item on the stack
    copy N       Copy the nth item on the stack (given by the argument) onto the top of the stack
    swap         Swap the top two items on the stack
    drop         Discard the top item on the stack
    drop N       Slide n items off the stack, keeping the top item
    add          Addition
    sub          Subtraction
    mul          Multiplication
    div          Integer Division
    mod          Modulo
    save         Store
    load         Retrieve
    L:           Mark a location in the program
    call L       Call a subroutine
    goto L       Jump unconditionally to a label
    if=0 L       Jump to a label if the top of the stack is zero
    if<0 L       Jump to a label if the top of the stack is negative
    return       End a subroutine and transfer control back to the caller
    exit         End the program
    print chr    Output the character at the top of the stack
    print int    Output the number at the top of the stack
    input chr    Read a character and place it in the location given by the top of the stack
    input int    Read a number and place it in the location given by the top of the stack

    What do you think of this revised list for Whitespace's assembly instructions? I'm still thinking outside of the box somewhat and trying to come up with a better mnemonic set than last time. When the previous interpreter was written, it was completed over two contiguous, rushed evenings. This rewrite deserves significantly more time now that it is the summer. Of course, the next version of Whitespace (0.4) may have its instructions revised even more, but this is just a redesign of what originally was done in a very short amount of time. Hopefully, the instructions make more sense when someone new to programming thinks about them.

  • C# Regex - Replace multiple characters at once without overwriting?

    - by Everaldo Aguiar
    Hello guys, I'm implementing a C# program that should automate a mono-alphabetic substitution cipher. The functionality I'm working on at the moment is the simplest one: the user provides a plain text and a cipher alphabet, for example: Plain text (input): THIS IS A TEST Cipher alphabet: A - Y, H - Z, I - K, S - L, E - J, T - Q Cipher text (output): QZKL KL Y QJLQ I thought of using regular expressions since I've been programming in Perl for a while, but I'm encountering some problems in C#. First, I would like to know if someone has a suggestion for a regular expression that would replace all occurrences of each letter by its corresponding cipher letter (provided by the user) at once and without overwriting anything. Example: in this case, the user provides the plaintext "TEST", and in his cipher alphabet he wishes to have all T's replaced with E, E's replaced with Y and S's replaced with J. My first thought was to substitute each occurrence of a letter with a placeholder character and then replace that placeholder by the cipher letter corresponding to the plaintext letter. Using the same example word "TEST", the steps taken by the program would be: 1 - replace T's with (let's say) @ 2 - replace E's with # 3 - replace S's with & 4 - replace @ with E, # with Y, & with J 5 - output = EYJE This solution doesn't seem to work for large texts. I would like to know if anyone can think of a single regular expression that would allow me to replace each letter in a given text by its corresponding letter in a 26-letter cipher alphabet without splitting the task into an intermediate step as described above. If it helps visualize the process, this is a screenshot of my GUI for the program: http://img43.imageshack.us/img43/2118/11618743.jpg
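
The "replace everything at once without overwriting" idea amounts to looking each character up in a mapping exactly once, so no placeholder pass is needed. Here is a minimal sketch in Python rather than C# (an assumption made for illustration; `encipher` and `encipher_re` are hypothetical names). The second function shows the same thing with a single regular expression and a replacement callback.

```python
import re

def encipher(plaintext, cipher_alphabet):
    """Substitute each character once, using a translation table."""
    return plaintext.translate(str.maketrans(cipher_alphabet))

def encipher_re(plaintext, cipher_alphabet):
    """Same result with one regex: match any mapped letter, look it up in the dict."""
    pattern = "[" + re.escape("".join(cipher_alphabet)) + "]"
    return re.sub(pattern, lambda m: cipher_alphabet[m.group(0)], plaintext)

cipher = {"A": "Y", "H": "Z", "I": "K", "S": "L", "E": "J", "T": "Q"}
print(encipher("THIS IS A TEST", cipher))                         # QZKL KL Y QJLQ
print(encipher_re("TEST", {"T": "E", "E": "Y", "S": "J"}))        # EYJE
```

Because every input character is matched against the original text only, a letter that has already been substituted is never substituted again, which is the property the placeholder steps were trying to achieve.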

  • doublechecking: no db-wide 'unicode switch' for sql server in the foreseeable future, i.e. like Orac

    - by user72150
    Hi all, I believe I know the answer to this question, but wanted to confirm. Question: Does SQL Server offer (or will it in the foreseeable future) a database-wide "unicode switch" which says "store all characters in unicode (UTF-16, UCS-2, etc.)", i.e. like Oracle? The Context: Our application has provided "CJK" (Chinese-Japanese-Korean) support for years, using Oracle as the db store. Recently folks have been asking for the same support in SQL Server. We store our db schema definition in XML and generate the vendor-specific definitions (Oracle, SQL Server) using vendor-specific XSL. We can make the change easily. The problem is upgrades. Generated scripts would need to change the column types for 100+ columns from varchar to nvarchar, varchar(max) to nvarchar(max), etc. These changes require dropping and recreating indexes and foreign keys if any indexes/FKs exist on the column. Non-trivial. Risky. A DB-wide character encoding would eliminate programming changes for us (i.e. we would not need to change the column types from varchar to nvarchar; SQL Server would correctly store unicode data in varchar columns). I had thought that eventually SQL Server would "see the light" and allow storing unicode in varchar/clob columns. Evidently not yet. Recap: So just to triple check: does MSSQL offer a database-wide switch for character encoding? Will it in SQL2008R3? Or 2010? Thanks, bill

  • UTF-8 BOM signature in PHP files

    - by skidding
    I was writing some commented PHP classes and I stumbled upon a problem. My name (for the @author tag) ends with a ș (which is a UTF-8 character, ...and a strange name, I know). Even though I save the file as UTF-8, some friends reported that they see that character totally messed up (È™). This problem goes away when I add the BOM signature. But that troubles me a bit, since I don't know much about it, except for what I saw on Wikipedia and in some similar questions here on SO. I know that it adds some bytes at the beginning of the file, and from what I understood it's not that bad, but I'm concerned because the only problematic scenarios I read about involved PHP files. And since I'm writing PHP classes to share them, being 100% compatible is more important than having my name in the comments. I'm trying to understand the implications: should I use it without worrying, or are there cases where it might cause damage? When? Thanks!
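
The garbled "È™" is what the two UTF-8 bytes of "ș" look like when a viewer decodes them as Windows-1252, and the BOM is simply three extra bytes at the start of the file. A short Python sketch (Python is used here only for illustration, and `strip_utf8_bom` is a hypothetical helper) makes both visible:

```python
import codecs

name_char = "\u0219"                  # ș, LATIN SMALL LETTER S WITH COMMA BELOW
utf8_bytes = name_char.encode("utf-8")
print(utf8_bytes)                     # b'\xc8\x99'
print(utf8_bytes.decode("cp1252"))    # È™  (what a Windows-1252 viewer shows)

def strip_utf8_bom(data):
    """Return `data` without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

with_bom = codecs.BOM_UTF8 + b"<?php echo 'hi';"
print(with_bom[:3])                   # b'\xef\xbb\xbf'
print(strip_utf8_bom(with_bom))       # b"<?php echo 'hi';"
```

The three BOM bytes sit before the opening `<?php` tag, so a file saved with a BOM starts with content outside the PHP tag; that is the usual reason PHP files are mentioned in the problematic scenarios.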

  • Can I write this regex in one step?

    - by Marin Doric
    This is the input string: "23x +y-34 x + y+21x - 3y2-3x-y+2". I want to surround every '+' and '-' character with whitespace, but only where it is not already surrounded on the left or right side. So my input string should end up looking like this: "23x + y - 34 x + y + 21x - 3y2 - 3x - y + 2". I wrote this code that does the job:

        Regex reg1 = new Regex(@"\+(?! )|\-(?! )");
        input = reg1.Replace(input, delegate(Match m) { return m.Value + " "; });
        Regex reg2 = new Regex(@"(?<! )\+|(?<! )\-");
        input = reg2.Replace(input, delegate(Match m) { return " " + m.Value; });

    Explanation: reg1 matches a '+' that is not followed by ' ' (a space), and the same for '-'. reg2 does the same thing, except that it matches a '+' or '-' that is not preceded by ' '. Delegates 1 and 2 just insert " " after and before m.Value (the match value), respectively. The question is: is there a way to create just one regex and just one delegate, i.e. to do this job in one step? I am new to regex and I want to learn an efficient way.
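
One way to do this in a single pass is to let the pattern swallow any whitespace already around the operator and always write the operator back with exactly one space on each side. A minimal sketch in Python rather than C# (an assumption for illustration; `space_operators` is a hypothetical name). Note that, unlike the two-regex version, this also collapses runs of several spaces around an operator into one.

```python
import re

def space_operators(expr):
    """Put exactly one space on each side of every '+' and '-'."""
    # \s* on both sides consumes existing whitespace, so nothing is doubled.
    return re.sub(r"\s*([+-])\s*", r" \1 ", expr)

source = "23x +y-34 x + y+21x - 3y2-3x-y+2"
print(space_operators(source))
# 23x + y - 34 x + y + 21x - 3y2 - 3x - y + 2
```

The same pattern and replacement string should carry over to other regex engines that support capture-group references in the replacement, though a leading sign at the very start of the string would gain a leading space and may need trimming.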

  • Help with C puzzle

    - by Javier Badia
    I found a site with some complicated C puzzles. Right now I'm dealing with this one:

    The following is a piece of C code whose intention was to print a minus sign 20 times. But you will notice that it doesn't work.

        #include <stdio.h>
        int main()
        {
            int i;
            int n = 20;
            for( i = 0; i < n; i-- )
                printf("-");
            return 0;
        }

    Well, fixing the above code is straightforward. To make the problem interesting, you have to fix the above code by changing exactly one character. There are three known solutions. See if you can get all three.

    I cannot figure out how to solve it. I know that it can be fixed by changing -- to ++, but that changes two characters, and I can't figure out which single character to change to make it work.

  • Normalizing (webdav) unicode paths

    - by Evert
    Hi guys, I'm working on a WebDAV implementation for PHP. In order to make it easier for Windows and other operating systems to work together, I need to jump through some character encoding hoops. Windows uses ISO-8859-1 in its HTTP requests, while most other clients encode anything beyond ASCII as UTF-8. My first approach was to ignore this altogether, but I quickly ran into issues when returning URLs. I then figured it's probably best to normalize all URLs. Using ü as an example: this gets sent over the wire by OS X as u%CC%88 (a plain u followed by codepoint U+0308). Windows sends this as %FC (Latin-1). But doing a utf8_encode on %FC, I get %C3%BC (this is codepoint U+00FC). Should I treat %C3%BC and u%CC%88 as the same thing? If so, how? Not touching it seems to work OK for Windows. It somehow understands that it's a unicode character, but updating the same file throws an error (for no particular reason). I'd be happy to provide more information.
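
The two UTF-8 spellings differ only in Unicode normalization form: u%CC%88 is the decomposed form (u + U+0308) and %C3%BC is the composed form (U+00FC). The sketch below, in Python rather than PHP (an assumption for illustration), maps all three inputs from the question to one canonical path by decoding, normalizing to NFC and re-encoding. `normalize_path` and its Latin-1 fallback are hypothetical choices, not part of any WebDAV library.

```python
import unicodedata
from urllib.parse import quote, unquote_to_bytes

def normalize_path(percent_encoded, fallback_encoding="latin-1"):
    """Return the path percent-encoded as NFC-normalized UTF-8."""
    raw = unquote_to_bytes(percent_encoded)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: bytes that are not valid UTF-8 came from a Latin-1 client.
        text = raw.decode(fallback_encoding)
    return quote(unicodedata.normalize("NFC", text))

print(normalize_path("u%CC%88"))  # %C3%BC
print(normalize_path("%FC"))      # %C3%BC
print(normalize_path("%C3%BC"))   # %C3%BC
```

The fallback is a heuristic: some Latin-1 byte sequences also happen to be valid UTF-8, so a real implementation would probably want to decide the encoding per client rather than per request.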

  • postgresql error - ERROR: input is out of range

    - by CaffeineIV
    The function below keeps returning this error message. I thought that maybe the double_precision field type was what was causing this, and I tried to use CAST, but either that's not it, or I didn't do it right... Help? Here's the error: ERROR: input is out of range CONTEXT: PL/pgSQL function "calculate_distance" line 7 at RETURN ********** Error ********** ERROR: input is out of range SQL state: 22003 Context: PL/pgSQL function "calculate_distance" line 7 at RETURN And here's the function: CREATE OR REPLACE FUNCTION calculate_distance(character varying, double precision, double precision, double precision, double precision) RETURNS double precision AS $BODY$ DECLARE earth_radius double precision; BEGIN earth_radius := 3959.0; RETURN earth_radius * acos(sin($2 / 57.2958) * sin($4 / 57.2958) + cos($2/ 57.2958) * cos($4 / 57.2958) * cos(($5 / 57.2958) - ($3 / 57.2958))); END; $BODY$ LANGUAGE 'plpgsql' VOLATILE COST 100; ALTER FUNCTION calculate_distance(character varying, double precision, double precision, double precision, double precision) OWNER TO postgres; //I tried changing (unsuccessfully) that RETURN line to: RETURN CAST( (earth_radius * acos(sin($2 / 57.2958) * sin($4 / 57.2958) + cos($2/ 57.2958) * cos($4 / 57.2958) * cos(($5 / 57.2958) - ($3 / 57.2958))) ) AS text);
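
One plausible cause, not confirmed by the post, is floating-point rounding: for two identical or nearly identical points, the expression passed to acos() can come out a hair above 1.0, which is outside acos's domain, and a cast on the result cannot help because the error is raised before there is a result. The Python sketch below mirrors the formula from the question ($2..$5 become `lat1, lon1, lat2, lon2`) and clamps the argument first; `safe_acos` is a hypothetical helper name.

```python
import math

def safe_acos(x):
    """acos() with the argument clamped to [-1, 1] to absorb rounding error."""
    return math.acos(max(-1.0, min(1.0, x)))

def calculate_distance(lat1, lon1, lat2, lon2, earth_radius=3959.0):
    """Great-circle distance in miles, same formula as the PL/pgSQL function."""
    deg = 57.2958  # degrees per radian, as in the original
    cos_angle = (math.sin(lat1 / deg) * math.sin(lat2 / deg)
                 + math.cos(lat1 / deg) * math.cos(lat2 / deg)
                 * math.cos(lon2 / deg - lon1 / deg))
    return earth_radius * safe_acos(cos_angle)

# An argument one rounding step above 1.0 is rejected by plain acos():
try:
    math.acos(1.0000000000000002)
except ValueError as exc:
    print("acos failed:", exc)

print(safe_acos(1.0000000000000002))  # 0.0
```

In SQL the same clamp could presumably be written with LEAST and GREATEST around the acos() argument.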

  • Processing a log to fix a malformed IP address ?.?.?.x

    - by skymook
    I would like to replace the first character 'x' with the number '7' on every line of a log file using a shell script. Example of the log file: 216.129.119.x [01/Mar/2010:00:25:20 +0100] "GET /etc/.... 74.131.77.x [01/Mar/2010:00:25:37 +0100] "GET /etc/.... 222.168.17.x [01/Mar/2010:00:27:10 +0100] "GET /etc/.... My humble beginnings... #!/bin/bash echo Starting script... cd /Users/me/logs/ gzip -d /Users/me/logs/access.log.gz echo Files unzipped... echo I'm totally lost here to process the log file and save it back to hd... exit 0 Why is the log file IP malformed like this? My web provider (1and1) has decided not to store IP addresses, so they have replaced the last number with the character 'x'. They told me it was a new requirement by 'law'. I personally think that is bs, but that would take us off topic. I want to process these log files with AWStats, so I need an IP address that is not malformed. I want to replace the x with a 7, like so: 216.129.119.7 [01/Mar/2010:00:25:20 +0100] "GET /etc/.... 74.131.77.7 [01/Mar/2010:00:25:37 +0100] "GET /etc/.... 222.168.17.7 [01/Mar/2010:00:27:10 +0100] "GET /etc/.... Not perfect, I know, but at least I can process the files, and I can still gain a lot of useful information like country, number of visitors, etc. The log files are 200MB each, so I thought that a shell script was the way to go because I can run it quickly on my MacBook Pro locally. Unfortunately, I know very little about shell scripting, and my JavaScript skills are not going to cut it this time. I appreciate your help.
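
The substitution itself is a one-line regex anchored at the start of each line, so that only the masked last octet is touched and never an 'x' later in the request. The sketch below uses Python instead of a shell script (an assumption made for illustration); `fix_line` and `fix_log` are hypothetical names and the file paths are left to the caller.

```python
import gzip
import re

# Three numeric octets at the start of the line, then the masked 'x'.
_MASKED_IP = re.compile(r"^(\d{1,3}\.\d{1,3}\.\d{1,3}\.)x\b")

def fix_line(line, replacement="7"):
    """Replace the masked last octet of the leading IP address."""
    return _MASKED_IP.sub(r"\g<1>" + replacement, line, count=1)

def fix_log(src_gz, dst_path):
    """Stream a gzipped log line by line so a 200MB file never sits in memory."""
    with gzip.open(src_gz, "rt", encoding="latin-1") as src, \
         open(dst_path, "w", encoding="latin-1") as dst:
        for line in src:
            dst.write(fix_line(line))

sample = '216.129.119.x [01/Mar/2010:00:25:20 +0100] "GET /etc/....'
print(fix_line(sample))
# 216.129.119.7 [01/Mar/2010:00:25:20 +0100] "GET /etc/....
```

Latin-1 is chosen here only because it round-trips every byte unchanged; the actual encoding of the log is not stated in the question.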

  • IDN aware tools to encode/decode human readable IRI to/from valid URI

    - by Denis Otkidach
    Let's assume a user enters the address of some resource and we need to translate it to: <a href="valid URI here">human readable form</a> The HTML4 specification refers to RFC 3986, which allows only ASCII alphanumeric characters and the dash in the host part, and all non-ASCII characters in other parts should be percent-encoded. That's what I want to put in the href attribute to make the link work properly in all browsers. An IDN should be encoded with Punycode. The HTML5 draft refers to RFC 3987, which also allows percent-encoded unicode characters in the host part and a large subset of unicode in both the host and other parts without encoding them. The user may enter the address in any of these forms. To provide a human readable form of it I need to decode all printable characters. Note that some parts of the address might not correspond to valid UTF-8 sequences, usually when the target site uses some other character encoding. An example of what I'd like to get: <a href="http://xn--80aswg.xn--p1ai/%D0%BF%D1%83%D1%82%D1%8C?%D0%B7%D0%B0%D0%BF%D1%80%D0%BE%D1%81"> http://сайт.рф/путь?запрос</a> Are there any tools to solve these tasks? I'm especially interested in libraries for Python and JavaScript.
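
For the Python side, the standard library already covers the example in the question: the built-in `idna` codec (which implements the older IDNA 2003 rules) handles the host, and `urllib.parse` handles percent-encoding. This is a minimal sketch under simplifying assumptions: no port or userinfo in the address, and `iri_to_uri` / `uri_to_iri` are hypothetical helper names.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

def iri_to_uri(iri):
    """Punycode the host and percent-encode the rest (existing %XX is kept)."""
    parts = urlsplit(iri)
    host = parts.hostname.encode("idna").decode("ascii")
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, host, path, query, parts.fragment))

def _unquote_if_utf8(text):
    # Leave the part untouched when its bytes are not valid UTF-8.
    try:
        return unquote(text, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return text

def uri_to_iri(uri):
    """Human-readable form: decode Punycode and valid UTF-8 escapes."""
    parts = urlsplit(uri)
    host = parts.hostname.encode("ascii").decode("idna")
    return urlunsplit((parts.scheme, host, _unquote_if_utf8(parts.path),
                       _unquote_if_utf8(parts.query), parts.fragment))

uri = iri_to_uri("http://сайт.рф/путь?запрос")
print(uri)
print(uri_to_iri(uri))
```

Decoding every escape is too eager for general use (for example, %2F would turn into a path separator), so a production version would need to keep reserved characters encoded.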

  • Sybase: how can I remove non-printable characters from CHAR or VARCHAR fields with SQL?

    - by Kenny Drobnack
    I'm working with a Sybase database that seems to have non-printable characters in some of the string fields and this is throwing off some of our processing code. At first glance, it seemed to only be newlines and carriage returns, but we also have an ASCII code 27 in there - an ESC character, some accented characters, and some other oddities in there. I have no direct access to change the database, so changing the bad data isn't an option, yet. For now I have to make do with just filtering it out. We're trying to export the table data from one database and load it into a database used by another application in a nightly batch process. Ideally, I'd like to have a function that I can pass a list of characters and just have Sybase return the data with those characters removed. I'd like to keep it something we could do in plain SQL if possible. Something like this to remove characters that are ASCII 0 - 31. select str_replace(FIELD1, (0-31), NULL) as FIELD1, str_replace(FIELD2, (0-31), NULL) as FIELD2 from TABLE So far, str_replace is the nearest I can find, but it only allows replacing one string with another. No support for character ranges and won't let me do the above. We're running on Sybase ASE 12.5 on Unix servers.

  • How does real-time collaboration with multiple clients work in a system using operation transformati

    - by Saikat Chakrabarti
    I just finished reading High-Latency, Low-Bandwidth Windowing in the Jupiter Collaboration System and I mostly followed everything until part 6: global consistency. This part describes how the system described in the paper can be extended to accommodate multiple clients connected to the server. However, the explanation is very short and essentially says the system will work if the central server merely forwards client messages to all the other clients. I don't really understand how this works, though. What state vector would be sent in the message that is sent to all the other clients? Does the server maintain separate state vectors for each client? Does it maintain a separate copy of the widgets locally for each client? The simplest example I can think of is this setup: imagine client A, the server, and client B, with client A and client B both connected to the server. To start, all three have the state object "ABCD". Then client A sends the message "insert character F at position 0" to the server at the same time as client B sends the message "insert character G at position 0". It seems like simply relaying client A's message to client B and vice versa doesn't actually handle this case. So what exactly does the server do?
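
The "ABCD" example can be made concrete with a toy transformation function. This Python sketch is a deliberate simplification and not the Jupiter protocol itself: it handles only single-character inserts, ignores state vectors entirely, and breaks position ties by a site id, which is an assumption introduced here. All names (`Insert`, `apply_op`, `transform`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insert:
    pos: int
    char: str
    site: str  # originating client; used only to break ties

def apply_op(doc, op):
    return doc[:op.pos] + op.char + doc[op.pos:]

def transform(op, against):
    """Rewrite `op` so it can be applied after `against` has been applied."""
    if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
        return Insert(op.pos + 1, op.char, op.site)
    return op

doc = "ABCD"
op_a = Insert(0, "F", "A")  # client A: insert F at position 0
op_b = Insert(0, "G", "B")  # client B: insert G at position 0

# Each side applies its own operation first, then the other one transformed.
client_a = apply_op(apply_op(doc, op_a), transform(op_b, op_a))
client_b = apply_op(apply_op(doc, op_b), transform(op_a, op_b))
print(client_a, client_b)  # FGABCD FGABCD
```

Relaying the raw messages untransformed would give "GFABCD" on one client and "FGABCD" on the other, which is exactly the divergence the question is worried about; the transformation step is what has to happen somewhere between receipt and application.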

  • How to stop tcpdump remotely using expect from a new telnet session

    - by The CodeWriter
    I am trying to stop the tcpdump command from running on a remote terminal. If I telnet to the terminal, start tcpdump, and then send a ^c, tcpdump stops with no issues. However if I telnet to the same terminal, start tcpdump, and then exit the telnet session, when I reconnect to the same telnet session I am unable to stop tcpdump via a ^c. When I do this instead of stopping tcpdump it seems that it just quits the telnet session and tcpdump continues to run on the remote terminal. I provided my script below. Any help is greatly appreciated. #!/usr/local/bin/expect -f exp_internal 1 set timeout 30 spawn /bin/bash expect "] " send "telnet 192.168.62.133 10006\r" expect "Escape character is '^]'." send "\r" expect "# " set now [clock format [clock seconds] -format {%d_%b_%Y_%H%M%S}] set command "tcpdump -vv -i trf400 ip proto 89 -s 65535 -w /tmp/test_term420_${now}.pcp " send "$command\r" expect "tcpdump: listening on" # This works correctly. tcpdump quits and I am returned to the expected prompt send "\x03" expect "# " send "$command\r" expect "tcpdump: listening on" # Exit telnet session send -- "\x1d" expect "telnet> " send -- "q\r" expect "] " # Reconnect to telnet session send "telnet 192.168.62.133 10006\r" expect "Escape character is '^]'." send "\r" # This does not work as intended. The ^c quits the telnet session instead of stopping tcpdump send "\x03" expect "] " send "ls\r" expect "] "

  • How to debug anomalous C memory/stack problems

    - by EBM
    Hello, Sorry I can't be specific with code, but the problems I am seeing are anomalous. Character string values seem to be getting changed depending on other, unrelated code. For example, the value of the argument that is passed around below will change merely depending on if I comment out one or two of the fprintf() calls! By the last fprintf() the value is typically completely empty (and no, I have checked to make sure I am not modifying the argument directly... all I have to do is comment out a fprintf() or add another fprintf() and the value of the string will change at certain points!): static process_args(char *arg) { /* debug */ fprintf(stderr, "Function arg is %s\n", arg); ...do a bunch of stuff including call another function that uses alloc()... /* debug */ fprintf(stderr, "Function arg is now %s\n", arg); } int main(int argc, char *argv[]) { char *my_arg; ... do a bunch of stuff ... /* just to show you it's nothing to do with the argv array */ my_string = strdup(argv[1]); /* debug */ fprintf(stderr, "Argument 1 is %s\n", my_string); process_args(my_string); } There's more code all around, so I can't ask for someone to debug my program -- what I want to know is HOW can I debug why character strings like this are getting their memory changed or overwritten based on unrelated code. Is my memory limited? My stack too small? How do I tell? What else can I do to track down the issue? My program isn't huge, it's like a thousand lines of code give or take and a couple dynamically linked external libs, but nothing out of the ordinary. HELP! TIA!

  • Why does my DataTemplate break the WPF designer?

    - by PRINCESS FLUFF
    Why does the DataTemplate line break the WPF designer in Visual Studio 2008? The program compiles and runs properly. The DataTemplate is applied as it should. However the entire DataTemplate block of code is underlined in red, and when I simply "build" the program without running, I get the error "Type reference cannot find public type named 'Character'" How come it can't find it in the designer yet the program applies the template properly? <UserControl x:Class="WPF_Tests.Tests.TwoCollecViews.TwoViews" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml" xmlns:DetailsPane="clr-namespace:WPF_Tests.Tests.DetailsPane" > <UserControl.Resources> <DataTemplate DataType="{x:Type DetailsPane:Character}"> <StackPanel Orientation="Horizontal"> <TextBlock Text="{Binding Path=Name}"></TextBlock> </StackPanel> </DataTemplate> </UserControl.Resources> <Grid> <ListBox ItemsSource="{Binding Path=Characters}" /> </Grid> </UserControl> EDIT: I am being told that this may be a bug in Visual Studio 2008, as it worked correctly in 2010. You can download the code here: http://www.mediafire.com/?z1myytvwm4n - The Test/TwoCollec xaml file's designer will break with this code.

  • Sharepoint designer is replacing french characters with &#65533;

    - by chris
    First of all, I'm not a web designer, I'm a programmer, so I'm working a bit outside my knowledge area. However, as the person in my office who has some working knowledge of French, I'm stuck with this issue. The Problem: SharePoint Designer is replacing all French accented characters with the &#65533; (square box or diamond-? �) character. It doesn't appear to matter whether I enter the 'é' character as alt-130 (in either design or source view) or as &eacute;. Everything works fine when editing, but when the file is saved and loaded into a browser, the characters have been replaced. When reloading into Designer, the file shows the 65533 symbol. EDIT: More info. I use &#233; and save, then close SP Designer. Reloading SP Designer will show the é (instead of the code) in source. The next reload will have replaced it with &#65533; Question 1: (more important) HOW DO I STOP THIS!? Question 2: (more interesting) Why does this happen? Charset is iso-8859-1

  • Jquery Autocomplete plugin with Django (Trey Piepmeier solution)

    - by Sally
    So, I'm basing my code on Trey's solution at: http://solutions.treypiepmeier.com/2009/12/10/using-jquery-autocomplete-with-django/

    The script is:

        <script>
        $(function() {
            $('#id_members').autocomplete('{{ object.get_absolute_url }}members/lookup', {
                dataType: 'json',
                width: 200,
                parse: function(data) {
                    return $.map(data, function(row) {
                        return { data:row, value:row[1], result:row[0] };
                    });
                }
            }).result(
                function(e, data, value) {
                    $("#id_members_pk").val(value);
                }
            );
        });
        </script>

    The views.py:

        def members_lookup(request, pid):
            results = []
            if request.method == "GET":
                if request.GET.has_key(u'q'):
                    value = request.GET[u'q']
                    # Ignore queries shorter than length 3
                    if len(value) > 2:
                        model_results = Member.objects.filter(
                            Q(user__first_name__icontains=value) |
                            Q(user__last_name__icontains=value)
                        )
                        results = [ (x.user.get_full_name(), x.id) for x in model_results ]
            json = simplejson.dumps(results)
            print json
            return HttpResponse(json, mimetype='application/json')

    The problem is: it stops refining the search results after the initial lookup. For example, if I set len(value) > 2, after I type the 3rd character it will give me a list of suggestions. But if I keep on typing the 4th or 5th character, the list of suggestions doesn't change. Any suggestions on why this is?

  • How can I read a DBF file with incorrectly defined column data types using ADO.NET?

    - by Jason
    I have several DBF files generated by a third party that I need to be able to query. I am having trouble because all of the column types have been defined as character, but some of these fields actually contain binary data. If I try to read these fields using an OleDbDataReader as anything other than a string or character array, I get an InvalidCastException, but I need to be able to read them as binary values or at least cast/convert them after they are read. The columns that actually DO contain text are being returned as expected. For example, the very first column is defined as a character field with a length of 2 bytes, but the field contains a 16-bit integer. I have written the following test code to read the first column and convert it to the appropriate data type, but the value is not coming out right. The first row of the database has a value of 17365 (0x43D5) in the first column. Running the following code, what I end up getting is 17215 (0x433F). I'm pretty sure it has to do with using the ASCII encoding to get the bytes from the string returned by the data reader, but I'm not sure of another way to get the value into the format that I need, other than to write my own DBF reader and bypass ADO.NET altogether, which I don't want to do unless I absolutely have to. Any help would be greatly appreciated. byte[] c0; int i0; string con = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\ASTM;Extended Properties=dBASE III;User ID=Admin;Password=;"; using (OleDbConnection c = new OleDbConnection(con)) { c.Open(); OleDbCommand cmd = c.CreateCommand(); cmd.CommandText = "SELECT * FROM astm2007"; OleDbDataReader dr = cmd.ExecuteReader(); while (dr.Read()) { c0 = Encoding.ASCII.GetBytes(dr.GetValue(0).ToString()); i0 = BitConverter.ToInt16(c0, 0); } dr.Dispose(); }
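
The two numbers in the question are consistent with the suspicion about ASCII: 0x43D5 is stored little-endian as the bytes D5 43, and an ASCII encoder replaces the non-ASCII byte D5 with '?' (0x3F), giving 3F 43, which reads back as 0x433F. The Python sketch below (used instead of C# for illustration) reproduces that arithmetic. It assumes the provider handed back one character per byte in a Latin-1-like mapping, which the question does not state.

```python
import struct

raw = struct.pack("<h", 17365)            # the two bytes stored in the field
print(raw)                                # b'\xd5C'

as_text = raw.decode("latin-1")           # assumed: one character per byte

# Encoding as ASCII turns the non-ASCII character into '?' (0x3F).
lossy = as_text.encode("ascii", errors="replace")
print(lossy)                              # b'?C'
print(struct.unpack("<h", lossy)[0])      # 17215

# A byte-preserving encoding round-trips the original value.
exact = as_text.encode("latin-1")
print(struct.unpack("<h", exact)[0])      # 17365
```

If the provider decodes the field with some other code page, the matching encoding would have to be used on the way back instead of Latin-1; bytes with no mapping in that code page would still be lost.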

  • Reading UTF-8 XML and writing it to a file with Python

    - by Harri
    I'm trying to parse a UTF-8 XML file and save some parts of it to another file. The problem is that this is my first Python script ever and I'm totally confused by the character encoding problems I'm running into. My script fails immediately when it tries to write a non-ASCII character to a file, but it can print it to the command prompt (at least to some degree). Here's the XML (the parts that matter, at least; it's a *.resx file which contains UI strings):

        <?xml version="1.0" encoding="utf-8"?>
        <root>
          <resheader name="foo">
            <value>bar</value>
          </resheader>
          <data name="lorem" xml:space="preserve">
            <value>ipsum öä</value>
          </data>
        </root>

    And here's my Python script:

        from xml.dom.minidom import parse

        names = []
        values = []

        def getStrings(path):
            dom = parse(path)
            data = dom.getElementsByTagName("data")
            for i in range(len(data)):
                name = data[i].getAttribute("name")
                names.append(name)
                value = data[i].getElementsByTagName("value")
                values.append(value[0].firstChild.nodeValue.encode("utf-8"))

        def writeToFile():
            with open("uiStrings-fi.py", "w") as f:
                for i in range(len(names)):
                    line = names[i] + '="'+ values[i] + '"' #varName='varValue'
                    f.write(line)
                    f.write("\n")

        getStrings("ResourceFile.fi-FI.resx")
        writeToFile()

    And here's the traceback:

        Traceback (most recent call last):
          File "GenerateLanguageFiles.py", line 24, in
            writeToFile()
          File "GenerateLanguageFiles.py", line 19, in writeToFile
            line = names[i] + '="'+ values[i] + '"' #varName='varValue'
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)

    How should I fix my script so that it reads and writes UTF-8 characters properly? The files I'm trying to generate will be used in test automation with Robot Framework.
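
The traceback comes from mixing two string types under Python 2: `names[i]` is a unicode string from minidom while `values[i]` has already been encoded to UTF-8 bytes, so the concatenation triggers an implicit ASCII decode of the bytes. A sketch of the same script for Python 3 (an assumption; the original is Python 2) keeps everything as text and names the encoding only when the file is opened. `get_strings` and `write_to_file` are hypothetical renamings of the original functions, and the XML is inlined so the example is self-contained.

```python
import os
import tempfile
from xml.dom.minidom import parseString

RESX = """<?xml version="1.0" encoding="utf-8"?>
<root>
  <resheader name="foo"><value>bar</value></resheader>
  <data name="lorem" xml:space="preserve"><value>ipsum öä</value></data>
</root>
"""

def get_strings(xml_bytes):
    """Return (name, value) pairs for every <data> element, as text."""
    dom = parseString(xml_bytes)
    pairs = []
    for data in dom.getElementsByTagName("data"):
        value = data.getElementsByTagName("value")[0].firstChild.nodeValue
        pairs.append((data.getAttribute("name"), value))
    return pairs

def write_to_file(pairs, path):
    # Encode once, at the file boundary, instead of per value.
    with open(path, "w", encoding="utf-8") as f:
        for name, value in pairs:
            f.write(f'{name}="{value}"\n')

pairs = get_strings(RESX.encode("utf-8"))
out_path = os.path.join(tempfile.mkdtemp(), "uiStrings-fi.py")
write_to_file(pairs, out_path)
with open(out_path, encoding="utf-8") as f:
    print(f.read())  # lorem="ipsum öä"
```

Under Python 2 the equivalent change would be to drop the `.encode("utf-8")` on each value and open the output with `codecs.open(path, "w", encoding="utf-8")`.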

  • al32utf8 in oracle and SQL Server and DB2 pulling data

    - by Bob
    I have a non-UTF8 Oracle database running on 11.1.0.7. We need to support Greek characters. So we have two options: (1) use nvarchar and nclob fields for those fields that need Greek (it is not all fields); we have tested this and gotten it to work with Java code. (2) Convert the Oracle database to AL32UTF8. I am not asking how to do this. I got that from the Oracle site/Oracle Support. I know what is involved: lossy data, an increase in the size of the database, etc. My question is this: we have users of our system who connect to our database through database links but work on SQL Server and IBM DB2 databases. I do not have access to those databases and I do not have experience with them. If they are not UTF-8 databases, what happens when they pull UTF-8 data? I would assume that English/ASCII characters are fine and the Greek will end up as junk data. I also ran the Oracle Character Set Scanner (the Oracle command line utility you use to get information about the effects of a character set conversion). It says that my database will increase in size by about 20%. Does this have an effect on users with 3rd-party databases? These are customers of our data and there is a limit to how much access I have to them to run tests. Any information you have would be welcome.

    Read the article

  • Full Text Search in SQL Server 2008 shows wrong display_item for Thai language

    - by ensecoz
    I am working with SQL Server 2008. My task is to investigate an issue where full-text search (FTS) cannot find the right result for Thai. I have a table with FTS enabled on the column 'ItemName', which is nvarchar. The catalog is created with the Thai language. Note that Thai is one of the languages that doesn't separate words with spaces, so '????' '???' '????' are written like this in a sentence: '???????????' In the table, there are many rows that include the word (????); for example row#1 (ItemName: '???????????'). On the webpage, I try to search for '????' but SQL Server cannot find it. So I investigated by running the following query in SQL Server: select * from sys.dm_fts_parser(N'"???????????"', 1054, 0, 0) ...to see how the words are broken. The first parameter is the text to be broken. The second parameter (LCID 1054) specifies that we're using Thai for the word breaker and so on. Here is the result: row#1 (display_item: '????', source_item: '???????????') row#2 (display_item: '????', source_item: '???????????') row#3 (display_item: '??', source_item: '???????????') Notice that the first and second rows show the wrong display_item: the '?' in '????' isn't even a Thai character, and the '?' in '????' is not a Thai character either. So the question is: where did those alien characters come from? I guess this is why I cannot search for '????': the word breaker is broken and is keeping the wrong characters in the index. Please help!

    Read the article

  • Python performance improvement request for winkler

    - by Martlark
    I'm a Python n00b and I'd like some suggestions on how to improve the performance of this method, which computes the Jaro-Winkler distance of two names.

        def winklerCompareP(str1, str2):
            """Return approximate string comparator measure (between 0.0 and 1.0)

            USAGE:
              score = winkler(str1, str2)

            ARGUMENTS:
              str1  The first string
              str2  The second string

            DESCRIPTION:
              As described in 'An Application of the Fellegi-Sunter Model of
              Record Linkage to the 1990 U.S. Decennial Census' by William E. Winkler
              and Yves Thibaudeau.

              Based on the 'jaro' string comparator, but modifies it according to
              whether the first few characters are the same or not.
            """

            # Quick check if the strings are the same - - - - - - - - - - - - - - - - - -
            #
            jaro_winkler_marker_char = chr(1)
            if (str1 == str2):
                return 1.0

            len1 = len(str1)
            len2 = len(str2)
            # Integer division, so halflen can be used as an index on Python 2 and 3
            halflen = max(len1,len2) // 2 - 1

            ass1 = ''  # Characters assigned in str1
            ass2 = ''  # Characters assigned in str2
            #ass1 = ''
            #ass2 = ''
            workstr1 = str1
            workstr2 = str2

            common1 = 0  # Number of common characters
            common2 = 0

            #print "'len1', str1[i], start, end, index, ass1, workstr2, common1"
            # Analyse the first string - - - - - - - - - - - - - - - - - - - - - - - - -
            #
            for i in range(len1):
                start = max(0,i-halflen)
                end = min(i+halflen+1,len2)
                index = workstr2.find(str1[i],start,end)
                #print 'len1', str1[i], start, end, index, ass1, workstr2, common1
                if (index > -1):  # Found common character
                    common1 += 1
                    #ass1 += str1[i]
                    ass1 = ass1 + str1[i]
                    workstr2 = workstr2[:index]+jaro_winkler_marker_char+workstr2[index+1:]
            #print "str1 analyse result", ass1, common1

            # Analyse the second string - - - - - - - - - - - - - - - - - - - - - - - - -
            #
            for i in range(len2):
                start = max(0,i-halflen)
                end = min(i+halflen+1,len1)
                index = workstr1.find(str2[i],start,end)
                #print 'len2', str2[i], start, end, index, ass1, workstr1, common2
                if (index > -1):  # Found common character
                    common2 += 1
                    #ass2 += str2[i]
                    ass2 = ass2 + str2[i]
                    workstr1 = workstr1[:index]+jaro_winkler_marker_char+workstr1[index+1:]

            if (common1 != common2):
                print('Winkler: Wrong common values for strings "%s" and "%s"' % \
                      (str1, str2) + ', common1: %i, common2: %i' % (common1, common2) + \
                      ', common should be the same.')
                common1 = float(common1+common2) / 2.0  ##### This is just a fix #####

            if (common1 == 0):
                return 0.0

            # Compute number of transpositions - - - - - - - - - - - - - - - - - - - - -
            #
            transposition = 0
            for i in range(len(ass1)):
                if (ass1[i] != ass2[i]):
                    transposition += 1
            transposition = transposition / 2.0

            # Now compute how many characters are common at beginning - - - - - - - - - -
            #
            minlen = min(len1,len2)
            for same in range(minlen+1):
                if (str1[:same] != str2[:same]):
                    break
            same -= 1
            if (same > 4):
                same = 4

            common1 = float(common1)
            w = 1./3.*(common1 / float(len1) + common1 / float(len2) + (common1-transposition) / common1)

            wn = w + same*0.1 * (1.0 - w)
            return wn
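
One common way to speed up code like this is to stop rebuilding strings inside the loops (each `workstr[:index] + marker + workstr[index+1:]` copies the whole string) and to track matched positions with boolean flags instead. The sketch below is the textbook Jaro-Winkler formulation written that way, not a line-for-line rewrite of the function above: it finds matches in a single pass rather than scanning from both sides, so scores may differ in edge cases. The name `jaro_winkler` and its parameters are illustrative, and no speed-up has been measured here.

```python
def jaro_winkler(s1, s2, prefix_scale=0.1, max_prefix=4):
    """Jaro-Winkler similarity between 0.0 and 1.0 (standard formulation)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters only count as a match within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)

    # Boolean flags replace the marker-character string surgery.
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(i + window + 1, len2)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2.0

    m = float(matches)
    jaro = (m / len1 + m / len2 + (m - transpositions) / m) / 3.0

    # Winkler bonus for a shared prefix of up to max_prefix characters.
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return jaro + prefix * prefix_scale * (1.0 - jaro)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
print(round(jaro_winkler("DWAYNE", "DUANE"), 4))
```

Timing both versions on real name data (for example with the standard `timeit` module) would show whether the change actually pays off for the inputs in question.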

    Read the article
