Could someone give me their two cents on this optimization strategy
- by jimstandard
Background: I am writing a matching script in python that will match records of a transaction in one database to names of customers in another database. The complexity is that names are not unique and can be represented multiple different ways from transaction to transaction.
Rather than doing multiple queries on the database (which is pretty slow) would it be faster to get all of the records where the last name (which in this case we will say never changes) is "Smith" and then have all of those records loaded into memory as you go though each looking for matches for a specific "John Smith" using various data points.
Would this be faster, is it feasible in python, and if so does anyone have any recommendations for how to do it?