Advice on String Similarity Metrics (Java): distance, sounds-like, or a combination?
- by andreas
Hello,
Part of a process I am building requires applying string similarity algorithms.
The results of this process will be stored in a dataset; let's call it SS_Dataset.
Further decisions will then be made based on this dataset.
My questions are:
Should I apply one string similarity algorithm or several to produce SS_Dataset?
Are there any comparisons between algorithms that calculate a 'distance' and algorithms that measure 'sounds like' (phonetic) similarity?
Does one family of algorithms produce more accurate results than the other? Does a combination of the two give more accurate similarity results?
Can you recommend implementations that you have worked with?
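To make the distance vs. sounds-like question concrete, here is a minimal, self-contained sketch in plain Java. It deliberately does not call the SimMetrics or JTMT APIs (so no assumptions about their class names or signatures). It computes a normalized Levenshtein similarity (a "distance" family metric), an American Soundex code (a "sounds like" metric, English-oriented), and a hypothetical weighted combination of the two. The class name `SimilaritySketch`, the `combined` method, and the 0.7 / 0.3 weights are illustrative assumptions, not recommended values.

```java
import java.util.Locale;

public class SimilaritySketch {

    // Levenshtein edit distance via dynamic programming, keeping two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Normalize the distance to a similarity in [0, 1], where 1.0 means identical.
    static double levenshteinSimilarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) return 1.0;
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    // American Soundex: first letter followed by three digits (e.g. "Robert" -> "R163").
    static String soundex(String s) {
        String upper = s.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (upper.isEmpty()) return "";
        // Digit code for each letter A..Z; '0' marks vowels, Y, H and W (never emitted).
        String codes = "01230120022455012623010202";
        StringBuilder out = new StringBuilder().append(upper.charAt(0));
        char last = codes.charAt(upper.charAt(0) - 'A');
        for (int i = 1; i < upper.length() && out.length() < 4; i++) {
            char c = upper.charAt(i);
            if (c == 'H' || c == 'W') continue; // H and W do not separate equal codes
            char code = codes.charAt(c - 'A');
            if (code != '0' && code != last) out.append(code);
            last = code; // a vowel resets 'last', so equal codes around it are both kept
        }
        while (out.length() < 4) out.append('0');
        return out.toString();
    }

    // Hypothetical combination: weighted blend of the edit-distance score and a
    // phonetic match flag. The weights are arbitrary placeholders, not tuned values.
    static double combined(String a, String b) {
        double phonetic = soundex(a).equals(soundex(b)) ? 1.0 : 0.0;
        return 0.7 * levenshteinSimilarity(a, b) + 0.3 * phonetic;
    }

    public static void main(String[] args) {
        String[][] pairs = {{"Robert", "Rupert"}, {"Smith", "Smyth"}, {"Smith", "Schmidt"}};
        for (String[] p : pairs) {
            System.out.printf("%s / %s: levenshtein=%.2f soundex=%s/%s combined=%.2f%n",
                p[0], p[1], levenshteinSimilarity(p[0], p[1]),
                soundex(p[0]), soundex(p[1]), combined(p[0], p[1]));
        }
    }
}
```

Running `main` on pairs like "Smith" / "Schmidt" illustrates the trade-off behind the question: Soundex maps both names to the same code, while the edit-distance score rates them as fairly different. Phonetic codes are coarse (many unrelated names collide) and language-specific, whereas edit distance penalizes spelling variants that sound alike, which is one reason people store several scores per pair rather than a single one.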
My implementation will use packages from the following libraries:
http://www.dcs.shef.ac.uk/~sam/simmetrics.html
http://jtmt.sourceforge.net/
Regards,