String similarity detection
I was looking for a simple string similarity algorithm and found the one on CatalySoft. I liked it and found it useful, so I wrote a Groovy equivalent of it.
Here is the code:
public class LetterPairSimilarity {

    /** @return a list of adjacent letter pairs contained in the input string */
    private static def letterPairs(String str) {
        // Note: throws StringIndexOutOfBoundsException for one-letter words; see the comments for a guard.
        (0..str.size()-2).collect { str[it, it+1] }
    }

    /** @return an ArrayList of 2-character Strings, collected from every whitespace-separated word */
    private static def wordLetterPairs(String str) {
        (str.split("\\s").collect { it } - [""]).collect { letterPairs(it) }.flatten()
    }

    /** @return lexical similarity value in the range [0,1] */
    public static double compareStrings(String str1, String str2) {
        if (!str1 || !str2 || str1 == "" || str2 == "") { return 0.0 }
        def p1 = wordLetterPairs(str1.toUpperCase())
        def p2 = wordLetterPairs(str2.toUpperCase())
        // size() must be called as a method: property access on a list is spread over its elements
        2 * p1.intersect(p2).size() / (p1 + p2).size()
    }
}
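To make the scoring in compareStrings concrete, here is a small standalone sketch that walks through one comparison by hand. The two words are just an illustration I picked, and the letterPairs closure is a local copy of the pair-extraction expression for this example, not the class method itself.

```groovy
// Standalone copy of the pair extraction, only for this walkthrough.
def letterPairs = { String word -> (0..word.size() - 2).collect { word[it, it + 1] } }

def p1 = letterPairs('FRANCE')
def p2 = letterPairs('FRENCH')
def shared = p1.intersect(p2)

println p1       // [FR, RA, AN, NC, CE]
println p2       // [FR, RE, EN, NC, CH]
println shared   // [FR, NC]

// Twice the number of shared pairs, divided by the total number of pairs.
println 2 * shared.size() / (p1 + p2).size()   // 0.4
```

Two of the five pairs in each word match, so the score is 2 * 2 / 10 = 0.4. Identical words share all of their pairs and score 1.0.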
Published on September 15th, 2008. Posted by Roman Mackovcak.

March 21st, 2010 at 22:59
Thanks for sharing!
Note: I tested it and noticed that ‘letterPairs’ sometimes raised a StringIndexOutOfBoundsException. I fixed it by adding the following at the beginning of the method:
if (str.size() < 2) return [str]
Hope this helps… bye
March 22nd, 2010 at 11:08
Correct! I fixed it in my code but forgot to update the blog… thank you.
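For reference, here is a minimal sketch of letterPairs with the guard from the comment above applied. It is written as a standalone closure so it can be run on its own. As far as I can tell, the exception comes from one-letter words: str.size()-2 is then -1, the range 0..-1 counts downwards instead of being empty, and str[0, 1] reads past the end of the string.

```groovy
// letterPairs with the suggested guard, as a standalone closure for illustration.
def letterPairs = { String str ->
    // A word shorter than two letters has no pairs; return the word itself.
    if (str.size() < 2) return [str]
    (0..str.size() - 2).collect { str[it, it + 1] }
}

assert letterPairs('A') == ['A']
assert letterPairs('AB') == ['AB']
assert letterPairs('ABC') == ['AB', 'BC']
```

With this guard a one-letter word contributes a single one-letter entry, so it can still match the same one-letter word in the other string.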