Search Results

Search found 3 results on 1 page for 'phonetics'.


  • Please help fix and optimize this query

    - by user607217
    I am working on a system to find potential duplicates in our customers table (SQL 2005). I am using the built-in SOUNDEX value that our software computes when customers are added/updated, but I also implemented the double metaphone algorithm for better matching. This is the most-nested query I have created, and I can't help but think there is a better way to do it, and I'd like to learn.

    In the inner-most query I join the customer table to the metaphone table I created, then find customers that have an identical pKey (primary phonetic key). I union that with customers that have matching soundex values, and then score those matches with various text similarity functions.

    This is currently working, but I would also like to add a union of customers whose aKey (alternate phonetic key) values match. This would be identical to "QUERY A" except for substituting on (c1Akey = c2Akey) as the join condition. However, when I attempt to include that, I get errors when I try to execute my query.

    Here is the code:

        --Create aggregate ranking
        select c1Name, c2Name, nDiff, c1Addr, c2Addr, aDiff, c1SSN, c2SSN, sDiff,
               c1DOB, c2DOB, dDiff,
               nDiff+aDiff+dDiff+sDiff as Score
               ,(sDiff+dDiff)*1.5 + (nDiff+dDiff)*1.5 + (nDiff+sDiff)*1.5 + aDiff *.5 + nDiff *.5 as [Rank]
        FROM (
            --Create match scores for different fields
            SELECT c1Name, c2Name, c1Addr, c2Addr, c1SSN, c2SSN, c1LTD, c2LTD, c1DOB, c2DOB,
                   dbo.Jaro(c1name, c2name) AS nDiff,
                   dbo.JaroWinkler(c1addr, c2addr) AS aDiff,
                   CASE WHEN c1dob = '1901-01-01' OR c2dob = '1901-01-01'
                          OR c1dob = '1800-01-01' OR c2dob = '1800-01-01'
                        THEN .5
                        ELSE dbo.SmithWaterman(c1dob, c2dob)
                   END AS dDiff,
                   CASE WHEN c1ssn = '000-00-0000' OR c2ssn = '000-00-0000'
                        THEN .5
                        ELSE dbo.Jaro(c1ssn, c2ssn)
                   END AS sDiff
            FROM
            -- Generate list of possible matches based on multiple phonetic matching fields
            (
                select * from
                -- List of similar names from pKey field of ##Metaphone table
                --QUERY A BEGIN
                (select customers.custno as c1Custno, name as c1Name, haddr as c1Addr,
                        ssn as c1SSN, lasttripdate as c1LTD, dob as c1DOB,
                        soundex as c1Soundex, pkey as c1Pkey, akey as c1Akey
                 from Customers WITH (nolock)
                 join ##Metaphone on customers.custno = ##Metaphone.custno) as c1
                JOIN
                (select customers.custno as c2Custno, name as c2Name, haddr as c2Addr,
                        ssn as c2SSN, lasttripdate as c2LTD, dob as c2DOB,
                        soundex as c2Soundex, pkey as c2Pkey, akey as c2Akey
                 from Customers with (nolock)
                 join ##Metaphone on customers.custno = ##Metaphone.custno) as c2
                on (c1Pkey = c2Pkey) and (c1Custno < c2Custno)
                WHERE (c1Name <> 'PARENT, GUARDIAN') and c1soundex != c2soundex
                --QUERY A END
                union
                --List of similar names from pregenerated SOUNDEX field
                (select t1.custno, t1.name, t1.haddr, t1.ssn, t1.lasttripdate, t1.dob, t1.[soundex], 0, 0,
                        t2.custno, t2.name, t2.haddr, t2.ssn, t2.lasttripdate, t2.dob, t2.[soundex], 0, 0
                 from Customers t1 WITH (nolock)
                 join customers t2 with (nolock)
                   on t1.[soundex] = t2.[soundex] and t1.custno < t2.custno
                 where (t1.name <> 'PARENT, GUARDIAN'))
            ) as a
        ) as b
        where (sDiff+dDiff)*1.5 + (nDiff+dDiff)*1.5 + (nDiff+sDiff)*1.5 + aDiff *.5 + nDiff *.5 >= 7.5
        order by [rank] desc, score desc

    Previously, I was using joins such as

        on c1.pkey = c2.pkey or c1.akey = c2.akey or c1.soundex = c2.soundex

    but the performance was horrendous, and using unions seems to be working a lot better. Out of 103K customers, it is currently generating a list of 8.5M potential matches (based on the phonetic codes) in 2.25 minutes, and then taking another 2 minutes to score, rank and filter those down to about 3000. So I am happy with the performance; I just can't help but think there is a better way to structure this, and I need help adding the extra union condition. Thanks!
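
    One way the candidate-generation step might be restructured is to have every branch (pKey, aKey, soundex) return the same column list and let UNION drop pairs that match on more than one key. The sketch below is only an illustration, not the poster's query: it uses Python's sqlite3 module as a stand-in for SQL Server 2005, a hypothetical `keyed` table in place of Customers joined to ##Metaphone, and made-up placeholder keys such as 'P1' and 'A1'. The WITH clause it relies on is also available in SQL Server 2005.

```python
import sqlite3

# Hypothetical stand-in for Customers joined to ##Metaphone. The key values
# are placeholders, not real SOUNDEX or Double Metaphone output.
SETUP = """
CREATE TABLE keyed (
    custno INTEGER PRIMARY KEY,
    sdx    TEXT,   -- soundex value
    pkey   TEXT,   -- primary phonetic key
    akey   TEXT    -- alternate phonetic key
);
INSERT INTO keyed VALUES
    (1, 'S1', 'P1', 'A1'),
    (2, 'S2', 'P1', 'A2'),
    (3, 'S3', 'P3', 'A1'),
    (4, 'S1', 'P4', 'A4'),
    (5, 'S1', 'P1', 'A1'),
    (6, 'S6', 'P6', 'A6');
"""

# Each branch returns the same two columns, so UNION can merge the branches
# and remove pairs that were found through more than one key.
CANDIDATES = """
WITH pairs AS (
    SELECT c1.custno AS c1Custno, c2.custno AS c2Custno
    FROM keyed c1 JOIN keyed c2
      ON c1.pkey = c2.pkey AND c1.custno < c2.custno
    UNION
    SELECT c1.custno, c2.custno
    FROM keyed c1 JOIN keyed c2
      ON c1.akey = c2.akey AND c1.custno < c2.custno
    UNION
    SELECT c1.custno, c2.custno
    FROM keyed c1 JOIN keyed c2
      ON c1.sdx = c2.sdx AND c1.custno < c2.custno
)
SELECT c1Custno, c2Custno FROM pairs ORDER BY c1Custno, c2Custno
"""


def candidate_pairs(conn):
    """Return the deduplicated (c1Custno, c2Custno) candidate pairs."""
    return conn.execute(CANDIDATES).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
pairs = candidate_pairs(conn)
print(pairs)
```

    Because the pairs carry only customer numbers, the name, address, SSN and DOB columns could be joined back in once, just before scoring, rather than being repeated (and padded with 0, 0) in every branch. Whether that helps on the real 103K-row table would need to be measured.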


  • How do you pronounce Linux?

    - by Xerxes
    I'm tired of the old fart at work who keeps coming up to my desk and telling me all about his "years of experience in working with Unix and Lye-nix". I couldn't vent it out at him because that would be wrong, so I'm going to vent it out here - because obviously (that's the right thing to do...). Anyway, for all the people that practice this disgusting behaviour - the pronunciation is.... (Hmmm - anyone know phonetics?) - "Li-nix". Note: Despite hating him for this - he is otherwise a very nice (but sometimes rather annoying) person. Now... to formally make this a "question" - could someone write the phonetics for pronouncing "Linux", and also the notorious "Lye-nix", so I can make a note of it for future ventings? I think this is right... L?n?x, NOT L?n?x. ...or perhaps... L?n?x, NOT L?n?x* Can someone confirm the correct phonetics? (Listen to Linus on the matter.)


  • What are the common patterns in web programming?

    - by lankerisms
    I have been trying to write my first big web app (more than one cgi file), and as I kept moving forward with the rough prototype while trying to predict more tasks in parallel, this is the todo list that accumulated (in no particular order):

    * Validations and input sanitizations
    * Object versioning (to avoid edit conflicts; I don't want hard locks)
    * Exception handling
    * memcache
    * xss and injection protections
    * javascript
    * html
    * ACLs
    * phonetics in search, match and find duplicates (for form validation)
    * Ajaxify!!!

    (I have snipped off the project-specific items.) I know that each todo will be quite tied to its project and the technologies used. What I am wondering, though, is whether there is a pattern in your todo items, as well as in the sequence in which you experienced guys have come across them.

