Complex string matching with fuzzywuzzy
- by That1Guy
I'm attempting to write a process that matches obscure strings to a single 'master string' for further processing. I have a lot of data that looks something like this:
Basketball
Basket Ball
Football
BasketBallR
BBall
BBall - r
FootB
...and so on. These need to be mapped to a master record like so:
Basketball = Basket Ball, BBall
Basketball - R = BasketBallR, BBall - r
I also have instances of data resembling this format:
Football -r
FootBall - r-g/H,Q,HH
These situations need to be separated into different categories before being mapped. For example FootBall - r-g/H,Q,HH should be:
Football - r
Football - g
Football - H
Football - Q
Football - HH
At this point, it still needs to be mapped to a master record...
I've tried several different combinations of fuzzywuzzy matching methods, Levenshtein Distance measurements, regex, etc. and can't seem to find a reliable method to logically associate different naming styles of a single item with a master name.
I'm throwing my hands up in desperation. Are there any existing python resources than can help sort out my problem? Are there other options? Can anybody point out an obvious option that I might have overlooked?
Basically, any suggestion, solution, resource or alternative method is greatly appreciated.