Search Results

Search found 7399 results on 296 pages for 'character entities'.

Page 137/296 | < Previous Page | 133 134 135 136 137 138 139 140 141 142 143 144  | Next Page >

  • RFC regarding WAM

    - by Noctis Skytower
    Request For Comment regarding Whitespace's Assembly Mnemonics What follows in a first generation attempt at creating mnemonics for a whitespace assembly language. STACK ===== push number copy copy number swap away away number MATH ==== add sub mul div mod HEAP ==== set get FLOW ==== part label call label goto label zero label less label back exit I/O === ochr oint ichr iint In the interest of making improvements to this small and simple instruction set, this is a second attempt. hold N Push the number onto the stack copy Duplicate the top item on the stack copy N Copy the nth item on the stack (given by the argument) onto the top of the stack swap Swap the top two items on the stack drop Discard the top item on the stack drop N Slide n items off the stack, keeping the top item add Addition sub Subtraction mul Multiplication div Integer Division mod Modulo save Store load Retrieve L: Mark a location in the program call L Call a subroutine goto L Jump unconditionally to a label if=0 L Jump to a label if the top of the stack is zero if<0 L Jump to a label if the top of the stack is negative return End a subroutine and transfer control back to the caller exit End the program print chr Output the character at the top of the stack print int Output the number at the top of the stack input chr Read a character and place it in the location given by the top of the stack input int Read a number and place it in the location given by the top of the stack What do you think of the following revised list for Whitespace's assembly instructions? I'm still thinking outside of the box somewhat and trying to come up with a better mnemonic set than last time. When the previous interpreter was written, it was completed over two contiguous, rushed evenings. This rewrite deserves significantly more time now that it is the summer. Of course, the next version of Whitespace (0.4) may have its instructions revised even more, but this is just a redesign of what originally was done in a very short amount of time. Hopefully, the instructions make more sense once someone new to programmings thinks about them.

    Read the article

  • UTF-8 BOM signature in PHP files

    - by skidding
    I was writing some commented PHP classes and I stumbled upon a problem. My name (for the @author tag) ends up with a ? (which is a UTF-8 character, ...and a strange name, I know). Even though I save the file as UTF-8, some friends reported that they see that character totally messed up (È™). This problem goes away by adding the BOM signature. But that thing troubles me a bit, since I don't know that much about it, except from what I saw on Wikipedia and on some other similar questions here on SO. I know that it adds some things at the beginning of the file, and from what I understood it's not that bad, but I'm concerned because the only problematic scenarios I read about involved PHP files. And since I'm writing PHP classes to share them, being 100% compatible is more important than having my name in the comments. But I'm trying to understand the implications, should I use it without worrying? or are there cases when it might cause damage? When? Thanks!

    Read the article

  • Can I write this regex in one step?

    - by Marin Doric
    This is the input string "23x +y-34 x + y+21x - 3y2-3x-y+2". I want to surround every '+' and '-' character with whitespaces but only if they are not allready sourrounded from left or right side. So my input string would look like this "23x + y - 34 x + y + 21x - 3y2 - 3x - y + 2". I wrote this code that does the job: Regex reg1 = new Regex(@"\+(?! )|\-(?! )"); input = reg1.Replace(input, delegate(Match m) { return m.Value + " "; }); Regex reg2 = new Regex(@"(?<! )\+|(?<! )\-"); input = reg2.Replace(input, delegate(Match m) { return " " + m.Value; }); explanation: reg1 // Match '+' followed by any character not ' ' (whitespace) or same thing for '-' reg2 // Same thing only that I match '+' or '-' not preceding by ' '(whitespace) delegate 1 and 2 just insert " " before and after m.Value ( match value ) Question is, is there a way to create just one regex and just one delegate? i.e. do this job in one step? I am a new to regex and I want to learn efficient way.

    Read the article

  • Help with C puzzle

    - by Javier Badia
    I found a site with some complicated C puzzles. Right now I'm dealing with this: The following is a piece of C code, whose intention was to print a minus sign 20 times. But you can notice that, it doesn't work. #include <stdio.h> int main() { int i; int n = 20; for( i = 0; i < n; i-- ) printf("-"); return 0; } Well fixing the above code is straight-forward. To make the problem interesting, you have to fix the above code, by changing exactly one character. There are three known solutions. See if you can get all those three. I cannot figure out how to solve. I know that it can be fixed by changing -- to ++, but I can't figure out what single character to change to make it work.

    Read the article

  • Normalizing (webdav) unicode paths

    - by Evert
    Hi guys, I'm working on a WebDAV implementation for PHP. In order to make it easier for Windows and other operating systems to work together, I need jump through some character encoding hoops. Windows uses ISO-8859-1 in it's HTTP request, while most other clients encode anything beyond ascii as UTF-8. My first approach was to ignore this altogether, but I quickly ran into issues when returning urls. I then figured it's probably best to normalize all urls. Using u¨ as an example. This will get sent over the wire by OS/X as u%CC%88 (this is codepoint U+0308) Windows sents this as: %FC (latin1) But, doing a utf8_encode on %FC, I get : %C3%BC (this is codepoint U+00FC) Should I treat %C3%BC and u%CC%88 as the same thing? If so.. how? Not touching it seems to work OK for windows. It somehow understands that it's a unicode character, but updating the same file throws an error (for no particular reason). I'd be happy to provide more information.

    Read the article

  • postgresql error - ERROR: input is out of range

    - by CaffeineIV
    The function below keeps returning this error message. I thought that maybe the double_precision field type was what was causing this, and I tried to use CAST, but either that's not it, or I didn't do it right... Help? Here's the error: ERROR: input is out of range CONTEXT: PL/pgSQL function "calculate_distance" line 7 at RETURN ********** Error ********** ERROR: input is out of range SQL state: 22003 Context: PL/pgSQL function "calculate_distance" line 7 at RETURN And here's the function: CREATE OR REPLACE FUNCTION calculate_distance(character varying, double precision, double precision, double precision, double precision) RETURNS double precision AS $BODY$ DECLARE earth_radius double precision; BEGIN earth_radius := 3959.0; RETURN earth_radius * acos(sin($2 / 57.2958) * sin($4 / 57.2958) + cos($2/ 57.2958) * cos($4 / 57.2958) * cos(($5 / 57.2958) - ($3 / 57.2958))); END; $BODY$ LANGUAGE 'plpgsql' VOLATILE COST 100; ALTER FUNCTION calculate_distance(character varying, double precision, double precision, double precision, double precision) OWNER TO postgres; //I tried changing (unsuccessfully) that RETURN line to: RETURN CAST( (earth_radius * acos(sin($2 / 57.2958) * sin($4 / 57.2958) + cos($2/ 57.2958) * cos($4 / 57.2958) * cos(($5 / 57.2958) - ($3 / 57.2958))) ) AS text);

    Read the article

  • Processing a log to fix a malformed IP address ?.?.?.x

    - by skymook
    I would like to replace the first character 'x' with the number '7' on every line of a log file using a shell script. Example of the log file: 216.129.119.x [01/Mar/2010:00:25:20 +0100] "GET /etc/.... 74.131.77.x [01/Mar/2010:00:25:37 +0100] "GET /etc/.... 222.168.17.x [01/Mar/2010:00:27:10 +0100] "GET /etc/.... My humble beginnings... #!/bin/bash echo Starting script... cd /Users/me/logs/ gzip -d /Users/me/logs/access.log.gz echo Files unzipped... echo I'm totally lost here to process the log file and save it back to hd... exit 0 Why is the log file IP malformed like this? My web provider (1and1) has decide not to store IP address, so they have replaced the last number with the character 'x'. They told me it was a new requirement by 'law'. I personally think that is bs, but that would take us off topic. I want to process these log files with AWstats, so I need an IP address that is not malformed. I want to replace the x with a 7, like so: 216.129.119.7 [01/Mar/2010:00:25:20 +0100] "GET /etc/.... 74.131.77.7 [01/Mar/2010:00:25:37 +0100] "GET /etc/.... 222.168.17.7 [01/Mar/2010:00:27:10 +0100] "GET /etc/.... Not perfect I know, but least I can process the files, and I can still gain a lot of useful information like country, number of visitors, etc. The log files are 200MB each, so I thought that a shell script is the way to go because I can do that rapidly on my Macbook Pro locally. Unfortunately, I know very little about shell scripting, and my javascript skills are not going to cut it this time. I appreciate your help.

    Read the article

  • IDN aware tools to encode/decode human readable IRI to/from valid URI

    - by Denis Otkidach
    Let's assume a user enter address of some resource and we need to translate it to: <a href="valid URI here">human readable form</a> HTML4 specification refers to RFC 3986 which allows only ASCII alphanumeric characters and dash in host part and all non-ASCII character in other parts should be percent-encoded. That's what I want to put in href attribute to make link working properly in all browsers. IDN should be encoded with Punycode. HTML5 draft refers to RFC 3987 which also allows percent-encoded unicode characters in host part and a large subset of unicode in both host and other parts without encoding them. User may enter address in any of these forms. To provide human readable form of it I need to decode all printable characters. Note that some parts of address might not correspond to valid UTF-8 sequences, usually when target site uses some other character encoding. An example of what I'd like to get: <a href="http://xn--80aswg.xn--p1ai/%D0%BF%D1%83%D1%82%D1%8C?%D0%B7%D0%B0%D0%BF%D1%80%D0%BE%D1%81"> http://????.??/???????????</a> Are there any tools to solve these tasks? I'm especially interested in libraries for Python and JavaScript.

    Read the article

  • Sybase: how can I remove non-printable characters from CHAR or VARCHAR fields with SQL?

    - by Kenny Drobnack
    I'm working with a Sybase database that seems to have non-printable characters in some of the string fields and this is throwing off some of our processing code. At first glance, it seemed to only be newlines and carriage returns, but we also have an ASCII code 27 in there - an ESC character, some accented characters, and some other oddities in there. I have no direct access to change the database, so changing the bad data isn't an option, yet. For now I have to make do with just filtering it out. We're trying to export the table data from one database and load it into a database used by another application in a nightly batch process. Ideally, I'd like to have a function that I can pass a list of characters and just have Sybase return the data with those characters removed. I'd like to keep it something we could do in plain SQL if possible. Something like this to remove characters that are ASCII 0 - 31. select str_replace(FIELD1, (0-31), NULL) as FIELD1, str_replace(FIELD2, (0-31), NULL) as FIELD2 from TABLE So far, str_replace is the nearest I can find, but it only allows replacing one string with another. No support for character ranges and won't let me do the above. We're running on Sybase ASE 12.5 on Unix servers.

    Read the article

  • How to debug anomalous C memory/stack problems

    - by EBM
    Hello, Sorry I can't be specific with code, but the problems I am seeing are anomalous. Character string values seem to be getting changed depending on other, unrelated code. For example, the value of the argument that is passed around below will change merely depending on if I comment out one or two of the fprintf() calls! By the last fprintf() the value is typically completely empty (and no, I have checked to make sure I am not modifying the argument directly... all I have to do is comment out a fprintf() or add another fprintf() and the value of the string will change at certain points!): static process_args(char *arg) { /* debug */ fprintf(stderr, "Function arg is %s\n", arg); ...do a bunch of stuff including call another function that uses alloc()... /* debug */ fprintf(stderr, "Function arg is now %s\n", arg); } int main(int argc, char *argv[]) { char *my_arg; ... do a bunch of stuff ... /* just to show you it's nothing to do with the argv array */ my_string = strdup(argv[1]); /* debug */ fprintf(stderr, "Argument 1 is %s\n", my_string); process_args(my_string); } There's more code all around, so I can't ask for someone to debug my program -- what I want to know is HOW can I debug why character strings like this are getting their memory changed or overwritten based on unrelated code. Is my memory limited? My stack too small? How do I tell? What else can I do to track down the issue? My program isn't huge, it's like a thousand lines of code give or take and a couple dynamically linked external libs, but nothing out of the ordinary. HELP! TIA!

    Read the article

  • How does real-time collaboration with multiple clients work in a system using operation transformati

    - by Saikat Chakrabarti
    I just finished reading High-Latency, Low-Bandwidth Windowing in the Jupiter Collaboration System and I mostly followed everything until part 6: global consistency. This part describes how the system described in the paper can be extended to accomodate for multiple clients connected to the server. However, the explanation is very short and essentially says the system will work if the central server merely forwards client messages to all the other clients. I don't really understand how this works though. What state vector would be sent in the message that is sent to all the other clients? Does the server maintain separate state vectors for each client? Does it maintain a separate copy of the widgets locally for each client? The simple example I can think of is this setup: imagine client A, server, and client B with client A and client B both connected to the server. To start, all three have the state object "ABCD". Then, client A sends the message "insert character F at position 0" at the same time client B sends the message "insert character G at position 0" to the server. It seems like simply relaying client A's message to client B and vice versa doesn't actually handle this case. So what exactly does the server do?

    Read the article

  • How to stop tcpdump remotely using expect from a new telnet session

    - by The CodeWriter
    I am trying to stop the tcpdump command from running on a remote terminal. If I telnet to the terminal, start tcpdump, and then send a ^c, tcpdump stops with no issues. However if I telnet to the same terminal, start tcpdump, and then exit the telnet session, when I reconnect to the same telnet session I am unable to stop tcpdump via a ^c. When I do this instead of stopping tcpdump it seems that it just quits the telnet session and tcpdump continues to run on the remote terminal. I provided my script below. Any help is greatly appreciated. #!/usr/local/bin/expect -f exp_internal 1 set timeout 30 spawn /bin/bash expect "] " send "telnet 192.168.62.133 10006\r" expect "Escape character is '^]'." send "\r" expect "# " set now [clock format [clock seconds] -format {%d_%b_%Y_%H%M%S}] set command "tcpdump -vv -i trf400 ip proto 89 -s 65535 -w /tmp/test_term420_${now}.pcp " send "$command\r" expect "tcpdump: listening on" # This works correctly. tcpdump quits and I am returned to the expected prompt send "\x03" expect "# " send "$command\r" expect "tcpdump: listening on" # Exit telnet session send -- "\x1d" expect "telnet> " send -- "q\r" expect "] " # Reconnect to telnet session send "telnet 192.168.62.133 10006\r" expect "Escape character is '^]'." send "\r" # This does not work as intended. The ^c quits the telnet session instead of stopping tcpdump send "\x03" expect "] " send "ls\r" expect "] "

    Read the article

  • Why does my DataTemplate break the WPF designer?

    - by PRINCESS FLUFF
    Why does the DataTemplate line break the WPF designer in Visual Studio 2008? The program compiles and runs properly. The DataTemplate is applied as it should. However the entire DataTemplate block of code is underlined in red, and when I simply "build" the program without running, I get the error "Type reference cannot find public type named 'Character'" How come it can't find it in the designer yet the program applies the template properly? <UserControl x:Class="WPF_Tests.Tests.TwoCollecViews.TwoViews" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml" xmlns:DetailsPane="clr-namespace:WPF_Tests.Tests.DetailsPane" > <UserControl.Resources> <DataTemplate DataType="{x:Type DetailsPane:Character}"> <StackPanel Orientation="Horizontal"> <TextBlock Text="{Binding Path=Name}"></TextBlock> </StackPanel> </DataTemplate> </UserControl.Resources> <Grid> <ListBox ItemsSource="{Binding Path=Characters}" /> </Grid> </UserControl> EDIT: I am being told that this may be a bug in Visual Studio 2008, as it worked correctly in 2010. You can download the code here: http://www.mediafire.com/?z1myytvwm4n - The Test/TwoCollec xaml file's designer will break with this code.

    Read the article

  • Sharepoint designer is replacing french characters with &#65533;

    - by chris
    First of all, I'm not a web designer, I'm a programmer, so I'm working a bit out of my knowledge area. However, as the person in my office who has some working knowledge of French, I'm stuck with this issue. The Problem: Sharepoint Designer is replacing all French accented characters with the &#65533; (square box or diamond-? �) character. It doesn't appear to matter if I enter the 'é' character as alt-130 (in either design or source or as &eacute; Everything works fine when editing, but when the file is saved and loaded into a browser, it replaces the characters. When reloading into designer, the file shows the 65533 symbol. EDIT: More info. I use &#233; and save, close SP designer, Reloading SP designer will show the é (instead of the code) in source. Next reload will have replaced it with &#65533; Question 1: (more important) HOW DO I STOP THIS!? Question 2: (more interesting) Why does this happen? Charset is iso-8859-1

    Read the article

  • Jquery Autocomplete plugin with Django (Trey Piepmeier solution)

    - by Sally
    So, I'm basing my code on Trey's solution on: http://solutions.treypiepmeier.com/2009/12/10/using-jquery-autocomplete-with-django/ The script is: <script> $(function() { $('#id_members').autocomplete('{{ object.get_absolute_url }}members/lookup', { dataType: 'json', width: 200, parse: function(data) { return $.map(data, function(row) { return { data:row, value:row[1], result:row[0] }; }); } }).result( function(e, data, value) { $("#id_members_pk").val(value); } ); } ); </script> The views.py: def members_lookup(request, pid): results = [] if request.method == "GET": if request.GET.has_key(u'q'): value = request.GET[u'q'] # Ignore queries shorter than length 1 if len(value) > 2: model_results = Member.objects.filter( Q(user__first_name__icontains=value) | Q(user__last_name__icontains=value) ) results = [ (x.user.get_full_name(), x.id) for x in model_results ] json = simplejson.dumps(results) print json return HttpResponse(json, mimetype='application/json') The problem is: It stops refining the search results after the initial lookup. For example: If I set len(value) 2, after I type the 3rd character it will give me a list of suggestions. But if I keep on typing the 4th or 5th character, the list of suggestions doesn't change. Any suggestions on why this is?

    Read the article

  • How can I read a DBF file with incorrectly defined column data types using ADO.NET?

    - by Jason
    I have a several DBF files generated by a third party that I need to be able to query. I am having trouble because all of the column types have been defined as characters, but the data within some of these fields actually contain binary data. If I try to read these fields using an OleDbDataReader as anything other than a string or character array, I get an InvalidCastException thrown, but I need to be able to read them as a binary value or at least cast/convert them after they are read. The columns that actually DO contain text are being returned as expected. For example, the very first column is defined as a character field with a length of 2 bytes, but the field contains a 16-bit integer. I have written the following test code to read the first column and convert it to the appropriate data type, but the value is not coming out right. The first row of the database has a value of 17365 (0x43D5) in the first column. Running the following code, what I end up getting is 17215 (0x433F). I'm pretty sure it has to do with using the ASCII encoding to get the bytes from the string returned by the data reader, but I'm not sure of another way to get the value into the format that I need, other that to write my own DBF reader and bypass ADO.NET altogether which I don't want to do unless I absolutely have to. Any help would be greatly appreciated. byte[] c0; int i0; string con = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\ASTM;Extended Properties=dBASE III;User ID=Admin;Password=;"; using (OleDbConnection c = new OleDbConnection(con)) { c.Open(); OleDbCommand cmd = c.CreateCommand(); cmd.CommandText = "SELECT * FROM astm2007"; OleDbDataReader dr = cmd.ExecuteReader(); while (dr.Read()) { c0 = Encoding.ASCII.GetBytes(dr.GetValue(0).ToString()); i0 = BitConverter.ToInt16(c0, 0); } dr.Dispose(); }

    Read the article

  • Reading UTF-8 XML and writing it to a file with Python

    - by Harri
    I'm trying to parse UTF-8 XML file and save some parts of it to another file. Problem is, that this is my first Python script ever and I'm totally confused about the character encoding problems I'm finding. My script fails immediately when it tries to write non-ascii character to a file, but it can print it to command prompt (at least in some level) Here's the XML (from the parts that matter at least, it's a *.resx file which contains UI strings) <?xml version="1.0" encoding="utf-8"?> <root> <resheader name="foo"> <value>bar</value> </resheader> <data name="lorem" xml:space="preserve"> <value>ipsum öä</value> </data> </root> And here's my python script from xml.dom.minidom import parse names = [] values = [] def getStrings(path): dom = parse(path) data = dom.getElementsByTagName("data") for i in range(len(data)): name = data[i].getAttribute("name") names.append(name) value = data[i].getElementsByTagName("value") values.append(value[0].firstChild.nodeValue.encode("utf-8")) def writeToFile(): with open("uiStrings-fi.py", "w") as f: for i in range(len(names)): line = names[i] + '="'+ values[i] + '"' #varName='varValue' f.write(line) f.write("\n") getStrings("ResourceFile.fi-FI.resx") writeToFile() And here's the traceback: Traceback (most recent call last): File "GenerateLanguageFiles.py", line 24, in writeToFile() File "GenerateLanguageFiles.py", line 19, in writeToFile line = names[i] + '="'+ values[i] + '"' #varName='varValue' UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in ran ge(128) How should I fix my script so it would read and write UTF-8 characters properly? The files I'm trying to generate would be used in test automation with Robots Framework.

    Read the article

  • Custom DataType in DataTemplate breaks WPF designer

    - by PRINCESS FLUFF
    Why does the DataTemplate line break the WPF designer in Visual Studio 2008? The program compiles and runs properly. The DataTemplate is applied as it should. However the entire DataTemplate block of code is underlined in red, and when I simply "build" the program without running, I get the error "Type reference cannot find public type named 'Character'" How come it can't find it in the designer yet the program applies the template properly? <UserControl x:Class="WPF_Tests.Tests.TwoCollecViews.TwoViews" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml" xmlns:DetailsPane="clr-namespace:WPF_Tests.Tests.DetailsPane" > <UserControl.Resources> <DataTemplate DataType="{x:Type DetailsPane:Character}"> <StackPanel Orientation="Horizontal"> <TextBlock Text="{Binding Path=Name}"></TextBlock> </StackPanel> </DataTemplate> </UserControl.Resources> <Grid> <ListBox ItemsSource="{Binding Path=Characters}" /> </Grid> </UserControl> EDIT: I am being told that this may be a bug in Visual Studio 2008, as it worked correctly in 2010. You can download the code here: http://www.mediafire.com/?z1myytvwm4n - The Test/TwoCollec xaml file's designer will break with this code.

    Read the article

  • al32utf8 in oracle and SQL Server and DB2 pulling data

    - by Bob
    I have a non-utf8 oracle database running on 11.1.0.7. We need to support greek characters. So we have two options: use nvarchar, nclob fields for those fields that need greek (it is not all fields). We have tested this and gotten it to work with java coding. convert Oracle to AL32UTF8 database. I am not asking how to do this. I got this from the Oracle Site/Oracle Support. I know what is involved, lossy data, etc, increasing the size of the database. My question is we have users to our system that connect to our database with database links but work on SQL Server and IBM DB2 databases. I do not have access to those databases and I do not have experience with them. If they are not in UTF-8 databases what happens when they pull UTF8 data? I would assume that English/Ascii characters are fine and the greek will end up as junk data. I also ran Oracle Character set scanner (oracle command line utility you use to get info about the affects of a character set conversion). It says that my database will crease in sizez by about 20%. Does this have an affect on users with 3rd party databases? These are customers of our data and there is a limit to how much access I can have to them to run tests. Any information you have would be welcome.

    Read the article

  • Full Text Search in SQL Server 2008 shows wrong display_item for Thai language

    - by ensecoz
    I am working with SQL Server 2008. My task is to investigate the issue where FTS cannot find the right result for Thai. First, I have the table which enables the FTS on the column 'ItemName' which is nvarchar. The Catalog is created with the Thai Language. Note that the Thai language is one of the languages that doesn't separate the word by spaces, so '????' '???' '????' are written like this in a sentence: '???????????' In the table, there are many rows that include the word (????); for example row#1 (ItemName: '???????????') On the webpage, I try to search for '????' but SQL Server cannot find it. So I try to investigate it by trying the following query in SQL Server: select * from sys.dm_fts_parser(N'"???????????"', 1054, 0, 0) ...to see how the words are broken. The first one is the text to be broken. The second parameter is to specify that we're using Thai (WorkBreaker, so on). Here is the result: row#1 (display_item: '????', source_item: '???????????') row#2 (display_item: '????', source_item: '???????????') row#3 (display_item: '??', source_item: '???????????') Notice that the first and second row display the wrong display_item '?' in the '????' isn't even Thai characters. '?' in '????' is not a Thai character either. So the question is where did those alien characters come from? I guess this why I cannot search for '????' because the word breaker is broken and keeping the wrong character in the indexes. Please help!

    Read the article

  • Python performance improvement request for winkler

    - by Martlark
    I'm a python n00b and I'd like some suggestions on how to improve the algorithm to improve the performance of this method to compute the Jaro-Winkler distance of two names. def winklerCompareP(str1, str2): """Return approximate string comparator measure (between 0.0 and 1.0) USAGE: score = winkler(str1, str2) ARGUMENTS: str1 The first string str2 The second string DESCRIPTION: As described in 'An Application of the Fellegi-Sunter Model of Record Linkage to the 1990 U.S. Decennial Census' by William E. Winkler and Yves Thibaudeau. Based on the 'jaro' string comparator, but modifies it according to whether the first few characters are the same or not. """ # Quick check if the strings are the same - - - - - - - - - - - - - - - - - - # jaro_winkler_marker_char = chr(1) if (str1 == str2): return 1.0 len1 = len(str1) len2 = len(str2) halflen = max(len1,len2) / 2 - 1 ass1 = '' # Characters assigned in str1 ass2 = '' # Characters assigned in str2 #ass1 = '' #ass2 = '' workstr1 = str1 workstr2 = str2 common1 = 0 # Number of common characters common2 = 0 #print "'len1', str1[i], start, end, index, ass1, workstr2, common1" # Analyse the first string - - - - - - - - - - - - - - - - - - - - - - - - - # for i in range(len1): start = max(0,i-halflen) end = min(i+halflen+1,len2) index = workstr2.find(str1[i],start,end) #print 'len1', str1[i], start, end, index, ass1, workstr2, common1 if (index > -1): # Found common character common1 += 1 #ass1 += str1[i] ass1 = ass1 + str1[i] workstr2 = workstr2[:index]+jaro_winkler_marker_char+workstr2[index+1:] #print "str1 analyse result", ass1, common1 #print "str1 analyse result", ass1, common1 # Analyse the second string - - - - - - - - - - - - - - - - - - - - - - - - - # for i in range(len2): start = max(0,i-halflen) end = min(i+halflen+1,len1) index = workstr1.find(str2[i],start,end) #print 'len2', str2[i], start, end, index, ass1, workstr1, common2 if (index > -1): # Found common character common2 += 1 #ass2 += str2[i] ass2 = ass2 + str2[i] workstr1 = workstr1[:index]+jaro_winkler_marker_char+workstr1[index+1:] if (common1 != common2): print('Winkler: Wrong common values for strings "%s" and "%s"' % \ (str1, str2) + ', common1: %i, common2: %i' % (common1, common2) + \ ', common should be the same.') common1 = float(common1+common2) / 2.0 ##### This is just a fix ##### if (common1 == 0): return 0.0 # Compute number of transpositions - - - - - - - - - - - - - - - - - - - - - # transposition = 0 for i in range(len(ass1)): if (ass1[i] != ass2[i]): transposition += 1 transposition = transposition / 2.0 # Now compute how many characters are common at beginning - - - - - - - - - - # minlen = min(len1,len2) for same in range(minlen+1): if (str1[:same] != str2[:same]): break same -= 1 if (same > 4): same = 4 common1 = float(common1) w = 1./3.*(common1 / float(len1) + common1 / float(len2) + (common1-transposition) / common1) wn = w + same*0.1 * (1.0 - w) return wn

    Read the article

  • Munging non-printable characters to dots using string.translate()

    - by Jim Dennis
    So I've done this before and it's a surprising ugly bit of code for such a seemingly simple task. The goal is to translate any non-printable character into a . (dot). For my purposes "printable" does exclude the last few characters from string.printable (new-lines, tabs, and so on). This is for printing things like the old MS-DOS debug "hex dump" format ... or anything similar to that (where additional whitespace will mangle the intended dump layout). I know I can use string.translate() and, to use that, I need a translation table. So I use string.maketrans() for that. Here's the best I could come up with: filter = string.maketrans( string.translate(string.maketrans('',''), string.maketrans('',''),string.printable[:-5]), '.'*len(string.translate(string.maketrans('',''), string.maketrans('',''),string.printable[:-5]))) ... which is an unreadable mess (though it does work). From there you can call use something like: for each_line in sometext: print string.translate(each_line, filter) ... and be happy. (So long as you don't look under the hood). Now it is more readable if I break that horrid expression into separate statements: ascii = string.maketrans('','') # The whole ASCII character set nonprintable = string.translate(ascii, ascii, string.printable[:-5]) # Optional delchars argument filter = string.maketrans(nonprintable, '.' * len(nonprintable)) And it's tempting to do that just for legibility. However, I keep thinking there has to be a more elegant way to express this!

    Read the article

  • Full Text Search in MSSQL2008 shows wrong display_item for Thai language

    - by ensecoz
    I am working with MSSQL2008. My task is to investigate the issue where FTS cannot find the right result for Thai. First, I have the table which enables the FTS on the column 'ItemName' which is nvarchar. The Catalog is created with the Thai Language. Note that the Thai language is one of the languages that doesn't separate the word by spaces, so '????' '???' '????' are written like this in a sentence: '???????????' In the table, there are many rows that include the word (????); for example row#1 (ItemName: '???????????') On the webpage, I try to search for '????' but SQLServer cannot find it. So I try to investigate it by trying the following query in SQLServer: select * from sys.dm_fts_parser(N'"???????????"', 1054, 0, 0) ...to see how the words are broken. The first one is the text to be broken. The second parameter is to specify that we're using Thai (WorkBreaker, so on). Here is the result: row#1 (display_item: '????', source_item: '???????????') row#2 (display_item: '????', source_item: '???????????') row#3 (display_item: '??', source_item: '???????????') Notice that the first and second row display the wrong display_item '?' in the '????' isn't even Thai characters. '?' in '????' is not a Thai character either. So the question is where did those alien characters come from? I guess this why I cannot search for '????' because the word breaker is broken and keeping the wrong character in the indexes. Please help!

    Read the article

  • Issues with reversing bit shifts that roll over the maximum byte size?

    - by Terri
    I have a string of binary numbers that was originally a regular string and will return to a regular string after some bit manipulation. I'm trying to do a simple caesarian shift on the binary string, and it needs to be reversable. I've done this with this method.. public static String cShift(String ptxt, int addFactor) { String ascii = ""; for (int i = 0; i < ptxt.length(); i+=8) { int character = Integer.parseInt(ptxt.substring(i, i+8), 2); byte sum = (byte) (character + addFactor); ascii += (char)sum; } String returnToBinary = convertToBinary(ascii); return returnToBinary; } This works fine in some cases. However, I think when it rolls over being representable by one byte it's irreversable. On the test string "test!22*F ", with an addFactor of 12, the string becomes irreversible. Why is that and how can I stop it?

    Read the article

  • Python print statement prints nothing with a carriage return

    - by Jonathan Sternberg
    I'm trying to write a simple tool that reads files from disc, does some image processing, and returns the result of the algorithm. Since the program can sometimes take awhile, I like to have a progress bar so I know where it is in the program. And since I don't like to clutter up my command line and I'm on a Unix platform, I wanted to use the '\r' character to print the progress bar on only one line. But when I have this code here, it prints nothing. # Files is a list with the filenames for i, f in enumerate(files): print '\r%d / %d' % (i, len(files)), # Code that takes a long time I have also tried: print '\r', i, '/', len(files), Now just to make sure this worked in python, I tried this: heartbeat = 1 while True: print '\rHello, world', heartbeat, heartbeat += 1 This code works perfectly. What's going on? My understanding of carriage returns on Linux was that it would just move the line feed character to the beginning and then I could overwrite old text that was written previously, as long as I don't print a newline anywhere. This doesn't seem to be happening though. Also, is there a better way to display a progress bar in a command line than what I'm current trying to do?

    Read the article

< Previous Page | 133 134 135 136 137 138 139 140 141 142 143 144  | Next Page >