fuzzy comparison - Developer IT

Fuzzy Regular Expressions

- by Thomas Ahle

In my work I have with great results used approximate string matching algorithms such as Damerau–Levenshtein distance to make my code less vulnerable to spelling mistakes. Now I have a need to match strings against simple regular expressions such TV Schedule for \d\d (Jan|Feb|Mar|...). This means that the string TV Schedule for 10 Jan should return 0 while T Schedule for 10. Jan should return 2. This could be done by generating all strings in the regex (in this case 100x12) and find the best match, but that doesn't seam practical. Do you have any ideas how to do this effectively?

Read the article

Comparing (similar) images with Python/PIL

- by Attila Oláh

I'm trying to calculate the similarity (read: Levenshtein distance) of two images, using Python 2.6 and PIL. I plan to use the python-levenshtein library for fast comparison. Main question: What is a good strategy for comparing images? My idea is something like: Convert to RGB (transparent - white) (or maybe convert to monochrome?) Scale up the smaller one to the larger one's size Convert each channel (= the only channel, if converted to monochrome) to a sequence (item value = color value of the pixel) Calculate the Levenshtein distance between the two sequences Of course, this will not handle cases like mirrored images, cropped images, etc. But for basic comparison, this should be useful. Is there a better strategy documented somewhere?

Read the article

Price comparison sites and its effect on Google ranking

- by Jivago

I am the webmaster of a website that contains roughly 10,000 products. I would be possibly interested to index those products in a price comparison site like PriceGrabber, Nextag, Shopbot, etc. The principle of price comparison sites is great for an actual user that want to compare prices but my main concern is the effect it could have on my actual ranking on Google... Since a site like Shopbot uses a CPC model (Cost-per-click), all the links on the website are builted to track clicks (IE: http://www.shopbot.ca/r.html?i=3&catc=2&refshop=5706&refshopcodeid=42587349), it uses redirection, no direct links (So no direct backlinking). In your opinion and/or experience, is this a smart, business wise, seo wise move or not? THANKS!

Read the article

Advice On Price Comparison Affiliate Programs

- by pixelcook

I want a price comparison feature on my site similar to Consumer Reports' "Price & Shop" section. They use PriceGrabber.com, but as far as I can tell they have a special deal with CR, so I can't get a similar service for my site. I've gathered that I need to use an affiliate network, but the whole thing seems so shady, I don't really know what sites are legit, and I don't know what sites offer the price comparison feature. Datafeedfile.com comes up a lot during my searches, but the ugly site makes me wary. Does anyone have any experience with this? What affiliate networks do you recommend? Or should I be looking at something else altogether?

Read the article

What do you call "X <= $foo <= Y" comparison?

- by Jakob

While writing a Perl statement like if ( $foo >= X && $foo <= Y ) yet again, I wondered why many programming languages do not support the more comfortable form if ( X <= $foo <= Y ) and what this is called. I came up with "3-legged comparison" but no results when searching for it. By the way there is also the "element-of-set" form if ( $foo in X..Y ) which I only consider more readable when provided via a short keyword. Is there a term for X <= $foo <= Y comparison? Which languages support it?

Read the article

When to use identity comparison instead of equals?

- by maaartinus

I wonder why would anybody want to use identity comparison for fields in equals, like here (Java syntax): class C { private A a; public boolean equals(Object other) { // standard boring prelude if (other==this) return true; if (other==null) return false; if (other.getClass() != this.getClass()) return false; C c = (C) other; // the relevant part if (c.a != this.a) return false; // more tests... and then return true; } // getter, setters, hashCode, ... } Using == is a bit faster than equals and a bit shorter (due to no need for null tests), too, but in what cases (if any) you'd say it's really better to use == for fields inside equals?

Read the article

comparison of an unsigned variable to 0

- by user2651062

When I execute the following loop : unsigned m; for( m = 10; m >= 0; --m ){ printf("%d\n",m); } the loop doesn't stop at m==0, it keeps executing interminably, so I thought that reason was that an unsigned cannot be compared to 0. But when I did the following test unsigned m=9; if(m >= 0) printf("m is positive\n"); else printf("m is negative\n"); I got this result: m is positive which means that the unsigned variable m was successfully compared to 0. Why doesn't the comparison of m to 0 work in the for loop and works fine elsewhere?

Read the article

Custom graph comparison?

- by user57828

I'm trying to compare two graphs using hash value ( i.e, at the time of comparison, try to avoid traversing the graph ) Is there a way to make a function such that the hash values compared can also lead to determining at which height the graphs differ? The comparisons between two graphs are to be made by comparing children at a certain level. One way to compare the graphs is have a final hash value for the root node and compare them, but that wouldn't directly reflect at which level the graphs differ, since their immediate children might be the same ( or any other case ).

Read the article

Comparison of music data

- by Christian P.

Hey I am looking for theory, algorithms and similar for how to compare music. More specifically, I am looking into how to dupecheck music tracks that have different bitrates or perhaps slightly different variations (radio vs album version), but otherwise sound the same. Use cases for this include services such as Grooveshark, Youtube, etc. where they get a lot of duplicate tracks. I am also interested in text comparisons (Britney Spers vs Britney Spears, how far they deviate, etc.) although this is secondary and I already have some sources to go on in this area. I am mostly interested in codec-agnostic comparison techniques and algoritms (assuming a "raw" stream), but codec-specific resources are appreciated. I am aware of projects such as musicbrainz.org, but have not investigated it further, and would be interested if such projects could be of help in this endeavor.

Read the article

varchar comparison in SQL Server

- by Ram

I am looking for some SQL varchar comparison function like C# string.compare (we can ignore case for now, should return zero when the character expression are same and a non zero expression when they are different) Basically I have some alphanumeric column in one table which needs to be verified in another table. I cannot do select A.col1 - B.col1 from (query) as "-" operator cannot be applied on character expressions I cannot cast my expression as int (and then do a difference/subtraction) as it fails select cast ('ASXT000R' as int) Conversion failed when converting varchar 'ASXT000R' to int Soundex would not do it as soundex is same for 2 similar strings Difference would not do it as select difference('abc','ABC') = 4 (as per msdn, difference is the difference in the soundex of 2 character expressions and difference =4 implies least different) Is there any other way of doing it ?

Read the article

Numeric comparison difficulty in R

- by Matt Parker

I'm trying to compare two numbers in R as a part of a if-statement condition: (a-b) >= 0.5 In this particular instance, a = 0.58 and b = 0.08... and yet (a-b) >= 0.5 is false. I'm aware of the dangers of using == for exact number comparisons, and this seems related: (a - b) == 0.5) is false, while all.equal((a - b), 0.5) is true. The only solution I can think of is to have two conditions: (a-b) > 0.5 | all.equal((a-b), 0.5). This works, but is that really the only solution? Should I just swear off of the = family of comparison operators forever?

Read the article

Lucene Fuzzy Match on Phrase instead of Single Word

- by Koobz

I'm trying to do a fuzzy match on the Phrase "Grand Prarie" (deliberately misspelled) using Apache Lucene. Part of my issue is that the ~ operator only does fuzzy matches on single word terms and behaves as a proximity match for phrases. Is there a way to do a fuzzy match on a phrase with lucene?

Read the article

How does Comparison Sites work?

- by Vijay

Need your thinking on how does these Comparision Sites actually work. Sites like Junglee.com policybazaar.com and there are many like these which provides comaprision of products , fares etc. grabbed from different websites. I had read a little about it and what i found is-: These sites uses Feeds of the sites data. These sites uses APIs of the sites which are actually provided by those sites. And for some sites which do not have any of these two posibility then the Comparision sites uses web-crawler to crawl their data. This is what i have found out. If you think there is more things to it please do give your own views. But i want to know these for my learning purpose and a little for curiosity- how does they actually matches the crawled data , feeds, and other so that there is no duplicacy. What is the process or algorithms for it. And where should i go to learn these concepts. References for books , articles or anything else.

Read the article

Just another web startup - platform comparison

- by Holland

I'm looking to do a web startup which involves something along the lines of an ecommerce site, yet a little more in depth than that. While it's something that I would rather not go into detail with in terms of the initial idea, I can specify (on a basic level) what would be required of the website. If you have any observations or opinions derived from personal experience, which relate to what you see here, I'd appreciate it if you could share these. Paypal's API interaction (definitely). From what I've read about their API, integration with it into their website is VERY expensive, so I'd probably hold off on that until I've (hopefully) generated money and write my own simple credit-card interaction system. SQL Backend (obviously) PostgreSQL seems like a pretty good choice, as from what I've read, it's structure is a bit more "object-oriented" than, say, MySQL. Then again, I've used MySQL before and haven't had much problem with it whatsoever. Would it be worth learning PostgreSQL for this purpose? Java or .Net implementation (Preferably Mono, so I can use .Net while hosting the website using Apache). The reason for this is because, frankly, while I know PHP is a great platform to develop websites with, I hate developing with it. Before someone chimes in and flames me for saying that, note that I have nothing against the language, I just don't like it for my purposes. While Mono may be good to go with, I'm aware that ASP.Net MVC 3 hasn't been released for Mono yet, which may be a pain to work with, without their Razor syntax. Ontop of that, it seems Java is completely FULL of class libraries which deal with web development, that can be downloaded from the web. If anyone has any experience with these, I'd appreciate if that were posted. From what I've read about Spring and Struts2, they seem to be the best out there - especially since they're (AFAIK) MVC. I've considered Python and Django, which do seem REALLY nice, but I don't know much Python, and I'd rather start with something that I already know (language-wise; not framework-wise) than dive into learning a language AND a new framework. I'd REALLY like to be able to host my website via Apache, rather than using Windows Server or anything like that, as, frankly, I hate their setup. I'm not dissing it in any way, shape, or form, I'm just saying I dislike it. <3 terminal config. If there is a good reason to with Windows Server, however, I'd be willing to learn it. C# has a lot of things that Java appears not to have, including Delegates, unsigned types, and LINQ. Is there anything that Java has which can counter these?

Read the article

Comparison of languages by usage type?

- by Tom

Does anyone know of a good place to go find comparisons of programming languages by the intended platform/usage? Basically, what I want to know, is of the more popular languages, which ones are meant for high level application development, low level system development, mobile development, web, etc. If there's a good listing out there already, I'm not finding it so far. Does anyone know of a place that would have this? Thanks.

Read the article

Comparison of languages by usage type? [closed]

- by Tom

Does anyone know of a good place to go find comparisons of programming languages by the intended platform/usage? Basically, what I want to know, is of the more popular languages, which ones are meant for high level application development, low level system development, mobile development, web, etc. If there's a good listing out there already, I'm not finding it so far. Does anyone know of a place that would have this? Thanks.

Read the article

Django's makemessages creates a lot of fuzzy entries

- by jack

Each time I added some strings to a Django string, I run "django-admin.py makemessages -all" to generate .PO files for all locales. The problem is even I only added 5 news strings, the makemessages command will make 50 strings as fuzzy in .PO files which brings a lot of extra work for our locale maintainers. This also makes the entire i18n unusable before they manually revise those fuzzy strings.

Read the article

Fuzzy Search on Material Descriptions including numerical sizes & general descriptions of material t

- by Kyle

We're looking to provide a fuzzy search on an electrical materials database (i.e. conduit, cable, etc.). The problem is that, because of a lack of consistency across all material types, we could not split sizes into separate fields from the text description because some materials are rated by things other than size. I've attempted a combination of a full text search & a SQL CLR implementation of the Levenshtein search algorithm (for assistance in ranking), but my results are a little funky (i.e. they are not sorting correctly due to improper ranking). For example, if the search term is "3/4" ABCD Conduit", I'll might get back several irrelevant results in the following order: 1/2" Conduit 1/4" X 3/4" Cable 1/4" Cable Ties 3/4" DFC Conduit Tees 3/4" ABCD Conduit 3/4" Conduit I believe I've nailed the problem down to the fact that these two search algorithms do not factor in the relevance of punctuation & numeric. That is, in such a search, I'd expect the size to take precedence over any fuzzy match on the rest of the description, but my results don't reflect that. My question is: Can anyone recommend better search algorithms or different approaches that may be better suited for searching a combination of alphanumerics & punctuation characters?

Read the article

Algoritms for "fuzzy matching" strings

- by Alexey Romanov

By fuzzy matching I don't mean similar strings by Levenshtein distance or something similar, but the way it's used in TextMate/Ido/Icicles: given a list of strings, find those which include all characters in the search string, but possibly with other characters between, preferring the best fit.

Read the article

Fuzzy match two hash tables?

- by alex

Hi, I'm looking for ideas on how to best match two hash tables containing string key/value pairs. Here's the actual problem I'm facing: I have structured data coming in which is imported into the database. I need to UPDATE records which are already in the DB, however, it's possible that ANY value in the source can change, therefore I don't have a reliable ID. I'm thinking of fuzzy matching two rows, source and DB and make an "educated" guess if it should be updated and inserted. Any ideas would be greatly appreciated.

Read the article

C#/.NET Little Wonders: The Predicate, Comparison, and Converter Generic Delegates

- by James Michael Hare

Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here. In the last three weeks, we examined the Action family of delegates (and delegates in general), the Func family of delegates, and the EventHandler family of delegates and how they can be used to support generic, reusable algorithms and classes. This week I will be completing my series on the generic delegates in the .NET Framework with a discussion of three more, somewhat less used, generic delegates: Predicate<T>, Comparison<T>, and Converter<TInput, TOutput>. These are older generic delegates that were introduced in .NET 2.0, mostly for use in the Array and List<T> classes. Though older, it’s good to have an understanding of them and their intended purpose. In addition, you can feel free to use them yourself, though obviously you can also use the equivalents from the Func family of delegates instead. Predicate<T> – delegate for determining matches The Predicate<T> delegate was a very early delegate developed in the .NET 2.0 Framework to determine if an item was a match for some condition in a List<T> or T[]. The methods that tend to use the Predicate<T> include: Find(), FindAll(), FindLast() Uses the Predicate<T> delegate to finds items, in a list/array of type T, that matches the given predicate. FindIndex(), FindLastIndex() Uses the Predicate<T> delegate to find the index of an item, of in a list/array of type T, that matches the given predicate. The signature of the Predicate<T> delegate (ignoring variance for the moment) is: 1: public delegate bool Predicate<T>(T obj); So, this is a delegate type that supports any method taking an item of type T and returning bool. In addition, there is a semantic understanding that this predicate is supposed to be examining the item supplied to see if it matches a given criteria. 1: // finds first even number (2) 2: var firstEven = Array.Find(numbers, n => (n % 2) == 0); 3: 4: // finds all odd numbers (1, 3, 5, 7, 9) 5: var allEvens = Array.FindAll(numbers, n => (n % 2) == 1); 6: 7: // find index of first multiple of 5 (4) 8: var firstFiveMultiplePos = Array.FindIndex(numbers, n => (n % 5) == 0); This delegate has typically been succeeded in LINQ by the more general Func family, so that Predicate<T> and Func<T, bool> are logically identical. Strictly speaking, though, they are different types, so a delegate reference of type Predicate<T> cannot be directly assigned to a delegate reference of type Func<T, bool>, though the same method can be assigned to both. 1: // SUCCESS: the same lambda can be assigned to either 2: Predicate<DateTime> isSameDayPred = dt => dt.Date == DateTime.Today; 3: Func<DateTime, bool> isSameDayFunc = dt => dt.Date == DateTime.Today; 4: 5: // ERROR: once they are assigned to a delegate type, they are strongly 6: // typed and cannot be directly assigned to other delegate types. 7: isSameDayPred = isSameDayFunc; When you assign a method to a delegate, all that is required is that the signature matches. This is why the same method can be assigned to either delegate type since their signatures are the same. However, once the method has been assigned to a delegate type, it is now a strongly-typed reference to that delegate type, and it cannot be assigned to a different delegate type (beyond the bounds of variance depending on Framework version, of course). Comparison<T> – delegate for determining order Just as the Predicate<T> generic delegate was birthed to give Array and List<T> the ability to perform type-safe matching, the Comparison<T> was birthed to give them the ability to perform type-safe ordering. The Comparison<T> is used in Array and List<T> for: Sort() A form of the Sort() method that takes a comparison delegate; this is an alternate way to custom sort a list/array from having to define custom IComparer<T> classes. The signature for the Comparison<T> delegate looks like (without variance): 1: public delegate int Comparison<T>(T lhs, T rhs); The goal of this delegate is to compare the left-hand-side to the right-hand-side and return a negative number if the lhs < rhs, zero if they are equal, and a positive number if the lhs > rhs. Generally speaking, null is considered to be the smallest value of any reference type, so null should always be less than non-null, and two null values should be considered equal. In most sort/ordering methods, you must specify an IComparer<T> if you want to do custom sorting/ordering. The Array and List<T> types, however, also allow for an alternative Comparison<T> delegate to be used instead, essentially, this lets you perform the custom sort without having to have the custom IComparer<T> class defined. It should be noted, however, that the LINQ OrderBy(), and ThenBy() family of methods do not support the Comparison<T> delegate (though one could easily add their own extension methods to create one, or create an IComparer() factory class that generates one from a Comparison<T>). So, given this delegate, we could use it to perform easy sorts on an Array or List<T> based on custom fields. Say for example we have a data class called Employee with some basic employee information: 1: public sealed class Employee 2: { 3: public string Name { get; set; } 4: public int Id { get; set; } 5: public double Salary { get; set; } 6: } And say we had a List<Employee> that contained data, such as: 1: var employees = new List<Employee> 2: { 3: new Employee { Name = "John Smith", Id = 2, Salary = 37000.0 }, 4: new Employee { Name = "Jane Doe", Id = 1, Salary = 57000.0 }, 5: new Employee { Name = "John Doe", Id = 5, Salary = 60000.0 }, 6: new Employee { Name = "Jane Smith", Id = 3, Salary = 59000.0 } 7: }; Now, using the Comparison<T> delegate form of Sort() on the List<Employee>, we can sort our list many ways: 1: // sort based on employee ID 2: employees.Sort((lhs, rhs) => Comparer<int>.Default.Compare(lhs.Id, rhs.Id)); 3: 4: // sort based on employee name 5: employees.Sort((lhs, rhs) => string.Compare(lhs.Name, rhs.Name)); 6: 7: // sort based on salary, descending (note switched lhs/rhs order for descending) 8: employees.Sort((lhs, rhs) => Comparer<double>.Default.Compare(rhs.Salary, lhs.Salary)); So again, you could use this older delegate, which has a lot of logical meaning to it’s name, or use a generic delegate such as Func<T, T, int> to implement the same sort of behavior. All this said, one of the reasons, in my opinion, that Comparison<T> isn’t used too often is that it tends to need complex lambdas, and the LINQ ability to order based on projections is much easier to use, though the Array and List<T> sorts tend to be more efficient if you want to perform in-place ordering. Converter<TInput, TOutput> – delegate to convert elements The Converter<TInput, TOutput> delegate is used by the Array and List<T> delegate to specify how to convert elements from an array/list of one type (TInput) to another type (TOutput). It is used in an array/list for: ConvertAll() Converts all elements from a List<TInput> / TInput[] to a new List<TOutput> / TOutput[]. The delegate signature for Converter<TInput, TOutput> is very straightforward (ignoring variance): 1: public delegate TOutput Converter<TInput, TOutput>(TInput input); So, this delegate’s job is to taken an input item (of type TInput) and convert it to a return result (of type TOutput). Again, this is logically equivalent to a newer Func delegate with a signature of Func<TInput, TOutput>. In fact, the latter is how the LINQ conversion methods are defined. So, we could use the ConvertAll() syntax to convert a List<T> or T[] to different types, such as: 1: // get a list of just employee IDs 2: var empIds = employees.ConvertAll(emp => emp.Id); 3: 4: // get a list of all emp salaries, as int instead of double: 5: var empSalaries = employees.ConvertAll(emp => (int)emp.Salary); Note that the expressions above are logically equivalent to using LINQ’s Select() method, which gives you a lot more power: 1: // get a list of just employee IDs 2: var empIds = employees.Select(emp => emp.Id).ToList(); 3: 4: // get a list of all emp salaries, as int instead of double: 5: var empSalaries = employees.Select(emp => (int)emp.Salary).ToList(); The only difference with using LINQ is that many of the methods (including Select()) are deferred execution, which means that often times they will not perform the conversion for an item until it is requested. This has both pros and cons in that you gain the benefit of not performing work until it is actually needed, but on the flip side if you want the results now, there is overhead in the behind-the-scenes work that support deferred execution (it’s supported by the yield return / yield break keywords in C# which define iterators that maintain current state information). In general, the new LINQ syntax is preferred, but the older Array and List<T> ConvertAll() methods are still around, as is the Converter<TInput, TOutput> delegate. Sidebar: Variance support update in .NET 4.0 Just like our descriptions of Func and Action, these three early generic delegates also support more variance in assignment as of .NET 4.0. Their new signatures are: 1: // comparison is contravariant on type being compared 2: public delegate int Comparison<in T>(T lhs, T rhs); 3: 4: // converter is contravariant on input and covariant on output 5: public delegate TOutput Contravariant<in TInput, out TOutput>(TInput input); 6: 7: // predicate is contravariant on input 8: public delegate bool Predicate<in T>(T obj); Thus these delegates can now be assigned to delegates allowing for contravariance (going to a more derived type) or covariance (going to a less derived type) based on whether the parameters are input or output, respectively. Summary Today, we wrapped up our generic delegates discussion by looking at three lesser-used delegates: Predicate<T>, Comparison<T>, and Converter<TInput, TOutput>. All three of these tend to be replaced by their more generic Func equivalents in LINQ, but that doesn’t mean you shouldn’t understand what they do or can’t use them for your own code, as they do contain semantic meanings in their names that sometimes get lost in the more generic Func name. Tweet Technorati Tags: C#,CSharp,.NET,Little Wonders,delegates,generics,Predicate,Converter,Comparison

Read the article

Blender has weird fuzzy misbehaving icon in launcher

- by TreefrogInc

I got Blender 2.58 (stable) from blender.org (downloaded the package, extracted it). Everything works fine, but the only thing that bugs me is that the icon in the launcher (when I open it) is really fuzzy. I guess that's no big deal, but when I select "Keep in Launcher" and close Blender, the icon doesn't work when I try to open it again. Then, I created a Launcher on my desktop with the path to the blender executable, slapped on a shiny new non-fuzzy .svg icon that came with the package, and dragged the (desktop) Launcher to my (Unity) Launcher. Now, when I click on the new launcher, it keeps showing the flashing thing that says it's opening the program, and then the old fuzzy icon comes back but doesn't replace the new icon. Even after Blender opens, the new icon is still flashing. What is wrong with my launcher?

Read the article

eReaderLookup Catalogs and Compares Over 100 eBook Readers

- by Jason Fitzpatrick

Although the Kindle and Nook get the most press time, there’s a world of ebook readers out there; eReaderLookup helps you search by price, size, screen type, storage, and other parameters to find the perfect ebook reader for your needs. Whether you’re trying to find a reader with an SD card slot, a large screen, or native support for an less-than-popular file format, you can plug it into eReaderLookup and see if a reader exists that fits your needs. If there is more than one reader that matches your search parameters you can easily compare them in a side-by-side setup to quickly compare the stats. Hit up the link below to take it for a spin. eReaderLookup [via MakeUseOf] How to Stress Test the Hard Drives in Your PC or Server How To Customize Your Android Lock Screen with WidgetLocker The Best Free Portable Apps for Your Flash Drive Toolkit

Read the article

Store comparison in variable (or execute comparison when it's given as an string)

- by BorrajaX

Hello everyone. I'd like to know if the super-powerful python allows to store a comparison in a variable or, if not, if it's possible calling/executing a comparison when given as an string ("==" or "!=") I want to allow the users of my program the chance of giving a comparison in an string. For instance, let's say I have a list of... "products" and the user wants to select the products whose manufacturer is "foo". He could would input something like: Product.manufacturer == "foo" and if the user wants the products whose manufacturer is not "bar" he would input Product.manufacturer != "bar" If the user inputs that line as an string, I create a tree with an structure like: != / \ manufacturer bar I'd like to allow that comparison to run properly, but I don't know how to make it happen if != is an string. The "manufacturer" field is a property, so I can properly get it from the Product class and store it (as a property) in the leaf, and well... "bar" is just an string. I'd like to know if I can something similar to what I do with "manufacturer": storing it with a 'callable" (kind of) thing: the property with the comparator: != I have tried with "eval" and it may work, but the comparisons are going to be actually used to query a MySQL database (using sqlalchemy) and I'm a bit concerned about the security of that... Any idea will be deeply appreciated. Thank you! PS: The idea of all this is being able to generate a sqlalchemy query, so if the user inputs the string: Product.manufacturer != "foo" || Product.manufacturer != "bar" ... my tree thing can generate the following: sqlalchemy.or_(Product.manufacturer !="foo", Product.manufacturer !="bar") Since sqlalchemy.or_ is callable, I can also store it in one of the leaves... I only see a problem with the "!="

Read the article

Create a unique ID by fuzzy matching of names (via agrep using R)

- by tbrambor

Using R, I am trying match on people's names in a dataset structured by year and city. Due to some spelling mistakes, exact matching is not possible, so I am trying to use agrep() to fuzzy match names. A sample chunk of the dataset is structured as follows: df <- data.frame(matrix( c("1200013","1200013","1200013","1200013","1200013","1200013","1200013","1200013", "1996","1996","1996","1996","2000","2000","2004","2004","AGUSTINHO FORTUNATO FILHO","ANTONIO PEREIRA NETO","FERNANDO JOSE DA COSTA","PAULO CEZAR FERREIRA DE ARAUJO","PAULO CESAR FERREIRA DE ARAUJO","SEBASTIAO BOCALOM RODRIGUES","JOAO DE ALMEIDA","PAULO CESAR FERREIRA DE ARAUJO"), ncol=3,dimnames=list(seq(1:8),c("citycode","year","candidate")) )) The neat version: citycode year candidate 1 1200013 1996 AGUSTINHO FORTUNATO FILHO 2 1200013 1996 ANTONIO PEREIRA NETO 3 1200013 1996 FERNANDO JOSE DA COSTA 4 1200013 1996 PAULO CEZAR FERREIRA DE ARAUJO 5 1200013 2000 PAULO CESAR FERREIRA DE ARAUJO 6 1200013 2000 SEBASTIAO BOCALOM RODRIGUES 7 1200013 2004 JOAO DE ALMEIDA 8 1200013 2004 PAULO CESAR FERREIRA DE ARAUJO I'd like to check in each city separately, whether there are candidates appearing in several years. E.g. in the example, PAULO CEZAR FERREIRA DE ARAUJO PAULO CESAR FERREIRA DE ARAUJO appears twice (with a spelling mistake). Each candidate across the entire data set should be assigned a unique numeric candidate ID. The dataset is fairly large (5500 cities, approx. 100K entries) so a somewhat efficient coding would be helpful. Any suggestions as to how to implement this?

Search Results

Search found 2242 results on 90 pages for 'fuzzy comparison'.

Page 1/90 | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by Thomas Ahle

- by Attila Oláh

- by Jivago

- by pixelcook

- by Jakob

- by maaartinus

- by user2651062

- by user57828

- by Christian P.

- by Ram

- by Matt Parker

- by Koobz

- by Vijay

- by Holland

- by Tom

- by Tom

- by jack

- by Kyle

- by Alexey Romanov

- by alex

- by James Michael Hare

- by TreefrogInc

- by Jason Fitzpatrick

- by BorrajaX

- by tbrambor

1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >