Advice on String Similarity Metrics (Java). Distance, sounds like or combo?

Posted by andreas on Stack Overflow
Published on 2010-04-21T13:01:48Z Indexed on 2010/04/21 23:03 UTC

Hello,

Part of a process requires applying string similarity algorithms.

The results of this process will be stored to produce a dataset, let's call it SS_Dataset.

Based on this dataset, further decisions will have to be made.

My questions are:

  • Should I apply one or more string similarity algorithms to produce SS_Dataset?

  • Are there any comparisons between algorithms that calculate 'distance' and those that calculate 'sounds like' similarity?

Does one family of algorithms produce more accurate results than the other? Does a combination of the two give more accurate similarity results?

  • Can you recommend implementations that you have worked with?

My implementation will include packages from the following libraries:

http://www.dcs.shef.ac.uk/~sam/simmetrics.html

http://jtmt.sourceforge.net/
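To make the question concrete, here is a minimal sketch of the kind of combination I have in mind. It does not use the SimMetrics or jtmt APIs; it is a hand-written Levenshtein distance and a plain American Soundex, and the class name SimilaritySketch, the method names and the weight parameter are all hypothetical, chosen only for illustration. The binary phonetic score and the linear weighting are assumptions, not a recommended design.

```java
import java.util.Locale;

// Hypothetical illustration only: not the SimMetrics or jtmt API.
final class SimilaritySketch {

    // Soundex digit for each letter A..Z ('0' means "not coded").
    private static final String CODES = "01230120022455012623010202";

    // Levenshtein edit distance, computed with two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // 'Distance' family: edit distance scaled to [0, 1], 1.0 meaning identical.
    // Case-sensitive as written.
    static double editSimilarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }

    // 'Sounds like' family: four-character American Soundex code.
    static String soundex(String s) {
        StringBuilder out = new StringBuilder();
        char last = '0';
        for (char raw : s.toUpperCase(Locale.ROOT).toCharArray()) {
            if (raw < 'A' || raw > 'Z') {
                continue; // ignore anything that is not a basic Latin letter
            }
            char code = CODES.charAt(raw - 'A');
            if (out.length() == 0) {
                out.append(raw); // the first letter is kept as-is
                last = code;
            } else if (raw == 'H' || raw == 'W') {
                continue; // H and W do not separate equal codes
            } else {
                if (code != '0' && code != last) {
                    out.append(code);
                }
                last = code; // vowels reset the run of equal codes
            }
            if (out.length() == 4) {
                break;
            }
        }
        while (out.length() > 0 && out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    // Assumption: phonetic agreement is scored as all-or-nothing.
    static double phoneticSimilarity(String a, String b) {
        String codeA = soundex(a);
        return !codeA.isEmpty() && codeA.equals(soundex(b)) ? 1.0 : 0.0;
    }

    // Assumption: a simple weighted average, with weight in [0, 1]
    // giving the share of the score taken from the edit similarity.
    static double combined(String a, String b, double weight) {
        return weight * editSimilarity(a, b)
                + (1.0 - weight) * phoneticSimilarity(a, b);
    }
}
```

With something like this, each row of SS_Dataset could hold the edit similarity, the phonetic score and the combined score for a pair such as "Robert" and "Rupert", which differ by two edits but share the same Soundex code. Whether the combined value is actually more useful than either score on its own is exactly what I am asking.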

Regards,

© Stack Overflow or respective owner