Disambiguating Named Entities in Java
- by Alterscape
I have a list of strings (company names, in this case), and a Java program that extracts a list of things that look like company names out of mostly-unstructured text. I need to match each element of extracted text to a string in the list. Caveat: the unstructured text has typos, things like "Blah, Inc." referred to as "Blah," etc. I've tried Levenshtein Edit Distance, but that fails for predictable reasons. Are there known best-practices ways of tackling this problem? Or am I back to manual data-entry?