Search Results

Search found 7 results on 1 pages for 'winkler'.

Page 1/1 | 1 

  • Python performance improvement request for winkler

    - by Martlark
    I'm a python n00b and I'd like some suggestions on how to improve the algorithm to improve the performance of this method to compute the Jaro-Winkler distance of two names. def winklerCompareP(str1, str2): """Return approximate string comparator measure (between 0.0 and 1.0) USAGE: score = winkler(str1, str2) ARGUMENTS: str1 The first string str2 The second string DESCRIPTION: As described in 'An Application of the Fellegi-Sunter Model of Record Linkage to the 1990 U.S. Decennial Census' by William E. Winkler and Yves Thibaudeau. Based on the 'jaro' string comparator, but modifies it according to whether the first few characters are the same or not. """ # Quick check if the strings are the same - - - - - - - - - - - - - - - - - - # jaro_winkler_marker_char = chr(1) if (str1 == str2): return 1.0 len1 = len(str1) len2 = len(str2) halflen = max(len1,len2) / 2 - 1 ass1 = '' # Characters assigned in str1 ass2 = '' # Characters assigned in str2 #ass1 = '' #ass2 = '' workstr1 = str1 workstr2 = str2 common1 = 0 # Number of common characters common2 = 0 #print "'len1', str1[i], start, end, index, ass1, workstr2, common1" # Analyse the first string - - - - - - - - - - - - - - - - - - - - - - - - - # for i in range(len1): start = max(0,i-halflen) end = min(i+halflen+1,len2) index = workstr2.find(str1[i],start,end) #print 'len1', str1[i], start, end, index, ass1, workstr2, common1 if (index > -1): # Found common character common1 += 1 #ass1 += str1[i] ass1 = ass1 + str1[i] workstr2 = workstr2[:index]+jaro_winkler_marker_char+workstr2[index+1:] #print "str1 analyse result", ass1, common1 #print "str1 analyse result", ass1, common1 # Analyse the second string - - - - - - - - - - - - - - - - - - - - - - - - - # for i in range(len2): start = max(0,i-halflen) end = min(i+halflen+1,len1) index = workstr1.find(str2[i],start,end) #print 'len2', str2[i], start, end, index, ass1, workstr1, common2 if (index > -1): # Found common character common2 += 1 #ass2 += str2[i] ass2 = ass2 + str2[i] workstr1 = workstr1[:index]+jaro_winkler_marker_char+workstr1[index+1:] if (common1 != common2): print('Winkler: Wrong common values for strings "%s" and "%s"' % \ (str1, str2) + ', common1: %i, common2: %i' % (common1, common2) + \ ', common should be the same.') common1 = float(common1+common2) / 2.0 ##### This is just a fix ##### if (common1 == 0): return 0.0 # Compute number of transpositions - - - - - - - - - - - - - - - - - - - - - # transposition = 0 for i in range(len(ass1)): if (ass1[i] != ass2[i]): transposition += 1 transposition = transposition / 2.0 # Now compute how many characters are common at beginning - - - - - - - - - - # minlen = min(len1,len2) for same in range(minlen+1): if (str1[:same] != str2[:same]): break same -= 1 if (same > 4): same = 4 common1 = float(common1) w = 1./3.*(common1 / float(len1) + common1 / float(len2) + (common1-transposition) / common1) wn = w + same*0.1 * (1.0 - w) return wn

    Read the article

  • Optimizing Jaro-Winkler algorithm

    - by Pentium10
    I have this code for Jaro-Winkler algorithm taken from this website. I need to run 150,000 times to get distance between differences. It takes a long time, as I run on an Android mobile device. Can it be optimized more? public class Jaro { /** * gets the similarity of the two strings using Jaro distance. * * @param string1 the first input string * @param string2 the second input string * @return a value between 0-1 of the similarity */ public float getSimilarity(final String string1, final String string2) { //get half the length of the string rounded up - (this is the distance used for acceptable transpositions) final int halflen = ((Math.min(string1.length(), string2.length())) / 2) + ((Math.min(string1.length(), string2.length())) % 2); //get common characters final StringBuffer common1 = getCommonCharacters(string1, string2, halflen); final StringBuffer common2 = getCommonCharacters(string2, string1, halflen); //check for zero in common if (common1.length() == 0 || common2.length() == 0) { return 0.0f; } //check for same length common strings returning 0.0f is not the same if (common1.length() != common2.length()) { return 0.0f; } //get the number of transpositions int transpositions = 0; int n=common1.length(); for (int i = 0; i < n; i++) { if (common1.charAt(i) != common2.charAt(i)) transpositions++; } transpositions /= 2.0f; //calculate jaro metric return (common1.length() / ((float) string1.length()) + common2.length() / ((float) string2.length()) + (common1.length() - transpositions) / ((float) common1.length())) / 3.0f; } /** * returns a string buffer of characters from string1 within string2 if they are of a given * distance seperation from the position in string1. * * @param string1 * @param string2 * @param distanceSep * @return a string buffer of characters from string1 within string2 if they are of a given * distance seperation from the position in string1 */ private static StringBuffer getCommonCharacters(final String string1, final String string2, final int distanceSep) { //create a return buffer of characters final StringBuffer returnCommons = new StringBuffer(); //create a copy of string2 for processing final StringBuffer copy = new StringBuffer(string2); //iterate over string1 int n=string1.length(); int m=string2.length(); for (int i = 0; i < n; i++) { final char ch = string1.charAt(i); //set boolean for quick loop exit if found boolean foundIt = false; //compare char with range of characters to either side for (int j = Math.max(0, i - distanceSep); !foundIt && j < Math.min(i + distanceSep, m - 1); j++) { //check if found if (copy.charAt(j) == ch) { foundIt = true; //append character found returnCommons.append(ch); //alter copied string2 for processing copy.setCharAt(j, (char)0); } } } return returnCommons; } } I mention that in the whole process I make just instance of the script, so only once jaro= new Jaro(); If you are going to test and need examples so not break the script, you will find it here, in another thread for python optimization.

    Read the article

  • New Cloud Security Book: Securing the Cloud by Vic Winkler

    - by user12608550
    It's rare that I read a technical book straight through; I usually read key chapters and save the rest for later reference. But Winkler's book, written by an accomplished and highly experienced security professional, was worth a complete read, cover to cover. Of the recently published cloud security books, such as... Cloud Security and Privacy: An Enterprise Perspective on Risks and Compliance, by Tim Mather, Subra Kumaraswamy, and Shahed Latif; O'Reilly Media Inc, 2009; Cloud Computing: Implementation, Management, and Security, by John Rittenhouse and James Ransome; CRC Press 2010; Cloud Security: A Comprehensive Guide to Secure Cloud Computing, by Ronald Krutz and Russell Vines; Wiley Publishing Inc, 2010 ...Securing the Cloud is the most useful and informative about all aspects of cloud security. Clearly, through his experience, the author has thought through many practical issues of securing large, virtualized IT installations. His Chapter 6 on Best Practices and Chapter 9 with its valuable checklists are worth the price of the book. If you are among the many new cloud computing professionals, Securing the Cloud is an essential reference for your work.

    Read the article

  • MVC: Why put the business logic in the model? What happens when I've multiple types of storage?

    - by Steffen Winkler
    I always thought that the business logic has to be in the controller and that the controller, since it is the 'middle' part, stays static and that the model/view have to be capsuled via interfaces, that way you could change the business logic without affecting anything else, program multiple Models (one for each database/type of storage) and a dozens of views (for different platforms for example). Now I read in this question that you should always put the business logic into the model and that the controller is deeply connected with the view. To me, that doesn't really make sense and implies that each time I want to have the means of supporting another database/type of storage I've to rewrite my whole model including the business logic. And if I want another view, I've to rewrite both the view and the controller. May someone explain why that is or if I went wrong somewhere? Currently, that whole thing doesn't really make sense to me.

    Read the article

  • Directory directive: AuthType None but still need an AuthProvider?

    - by Steffen Winkler
    For now I just need the server to let me download files from one specific folder (in my case I chose /opt/myFolder for that task) Distribution is Debian 6.0 *edit_start* Apache version is 2.4, according to their official documentation, the Order/Allow clauses are deprecated and should not be used anymore I'm an idiot: Apache version is 2.2. *edit_end* My directory directives in apache2.conf look like this: <IfModule dir_module> DirectoryIndex index.html index.htm index.php </IfModule> ServerRoot "/etc/apache2" DocumentRoot "/opt/myFolder" <Directory /> Options FollowSymLinks AuthType None AllowOverride None Require all denie </Directory> <Directory "/opt/myFolder/*"> Options FollowSymLinks MultiViews AllowOverride None AuthType None Require all allow </Directory> When I try to access a file inside that folder (http://myserver.de/aTestFile.zip) I get an Internal Server Error. Also Apache writes the following error into it's log: configuration error: couldn't check user. Check your authn provider!: /aTestFile.zip Why would I need an authn provider if I don't want any authentication? Also I hope someone can explain to me what kind of AuthenticationProvider I'd need for that. Everytime I search for those things I get pointed at people asking how to protect files/directories with passwords or restrict access to some IP addresses, which doesn't really help me. ok, since I've Apache version 2.2, here is the error I get when using the Order/Deny/Allow commands instead of AuthType/Require: Invalid command 'Order', perhaps misspelled or defined by a module not included in the server configuration.

    Read the article

  • Today's Links (6/23/2011)

    - by Bob Rhubart
    Lydia Smyers interviews Justin "Mr. OTN" Kestelyn on the Oracle ACE Program Justin Kestelyn describes the Oracle ACE program, what it means to the developer community, and how to get involved. Incremental Essbase Metadata Imports Now Possible with OBIEE 11g | Mark Rittman "So, how does this work, and how easy is it to implement?" asks Oracle ACE Director Mark Rittman, and then he dives in to find out. ORACLENERD: The Podcast Oracle ACE Chet "ORACLENERD" Justice recounts his brush with stardom on Christian Screen's The Art of Business Intelligence podcast. Bay Area Coherence Special Interest Group Next Meeting July 21, 2011 | Cristóbal Soto Soto shares information on next month's Bay Area Coherence SIG shindig. New Cloud Security Book: Securing the Cloud by Vic Winkler | Dr Cloud's Flying Software Circus "Securing the Cloud is the most useful and informative about all aspects of cloud security," says Harry "Dr. Cloud" Foxwell. Oracle MDM Maturity Model | David Butler "The model covers maturity levels around five key areas: Profiling data sources; Defining a data strategy; Defining a data consolidation plan; Data maintenance; and Data utilization," says Butler. Integrating Strategic Planning for Cloud and SOA | David Sprott "Full blown Cloud adoption implies mature and sophisticated SOA implementation and impacts many business processes," says Sprott.

    Read the article

1