How to do this Python / MySQL manipulation (match) more efficiently?
- by NJTechie
Following is my data :
Company Table :
ID Company Address City State Zip Phone
1 ABC 123 Oak St Philly PA 17542 7329878901
2 CDE 111 Joe St Newark NJ 08654
3 GHI 211 Foe St Brick NJ 07740 7321178901
4 JAK 777 Wall Ocean NJ 07764 7322278901
5 KLE 87 Ilk St Plains NY 07654 7376578901
6 AB 1 W.House SField PA 87656 7329878901
Branch Office Table :
ID Address City State Zip Phone
1 323 Alk St Philly PA 17542 7329832221
1 171 Joe St Newark NJ 08654
3 287 Foe St Brick NJ 07740 7321178901
3 700 Wall Ocean NJ 07764 7322278901
1 89 Blk St Surrey NY 07154 7376222901
File to be Matched (In MySQL):
ID Company Address City State Zip Phone
1 ABC 123 Oak St Philly PA 17542 7329878901
2 AB 171 Joe St Newark NJ 08654
3 GHI 211 Foe St Brick NJ 07740 7321178901
4 JAK 777 Wall Ocean NJ 07764 7322278901
5 K 87 Ilk St Plains NY 07654 7376578901
Resulting File :
ID Company Address City State Zip Phone appendedID
1 ABC 123 Oak St Philly PA 17542 7329878901 [Original record, field always empty]
1 ABC 171 Joe St Newark NJ 08654 1 [Company Table]
1 ABC 323 Alk St Philly PA 17542 7329832221 1 [Branch Office Table]
1 AB 1 W.House SField PA 87656 7329878901 6 [Partial firm and State, Zip match]
2 CDE 111 Joe St Newark NJ 08654
3 GHI 211 Foe St Brick NJ 07740 7321178901
3 GHI 700 Wall Ocean NJ 07764 7322278901 3
3 GHI 287 Foe St Brick NJ 07740 7321178901 3
4 JAK 777 Wall Ocean NJ 07764 7322278901
5 KLE 87 Ilk St Surrey NY 07654 7376578901
5 KLE 89 Blk St Surrey NY 07154 7376222901 5
Requirement :
1) I have to match each firm on the 'File to be Matched' to that of Company and Branch Office tables (MySQL).
2) If there are multiple exact/partial matches, then the ID from Company, Branch Office table is inserted as a new row in the resulting file.
3) Not all the firms will be matched perfectly, in that case I have to match on partial Company names (like 5/8th of the company name) and any of the address fields and insert them in the resulting file.
Please help me out in the most efficient solution for this problem.