String similarity detection
I was looking for a simple string similarity detection algorithm and found the one on CatalySoft. Since I liked it and found it useful, I created a Groovy equivalent of it.
Here is the code:
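    public class LetterPairSimilarity {

        /** @return a list of the adjacent letter pairs contained in the input string */
        private static List<String> letterPairs(String str) {
            // A word shorter than two letters has no pairs. Without this guard the
            // range 0..-1 would count downwards and index past the end of the string.
            if (str.size() < 2) return []
            (0..<str.size() - 1).collect { str[it..it + 1] }
        }

        /** @return a list of 2-character Strings taken from every word of the input */
        private static List<String> wordLetterPairs(String str) {
            // Split on whitespace, drop empty tokens, then gather each word's pairs
            str.split(/\s+/).findAll { it }.collectMany { letterPairs(it) }
        }

        /** @return lexical similarity value in the range [0,1] */
        public static double compareStrings(String str1, String str2) {
            // Groovy truth: null and empty strings both evaluate to false
            if (!str1 || !str2) return 0.0
            List<String> pairs1 = wordLetterPairs(str1.toUpperCase())
            List<String> pairs2 = wordLetterPairs(str2.toUpperCase())
            int union = pairs1.size() + pairs2.size()
            if (union == 0) return 0.0  // e.g. both inputs consist of one-letter words
            // Count shared pairs with multiplicity: every pair of str2 can be matched
            // at most once, so the score is symmetric and never exceeds 1
            List<String> unmatched = new ArrayList<String>(pairs2)
            int intersection = pairs1.count { unmatched.remove(it) }
            2.0 * intersection / union
        }
    }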
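To make the scoring concrete, here is a condensed, self-contained Groovy script that restates the letter-pair idea and runs it on a sample pair of words. It is a sketch that assumes the score is twice the number of shared letter pairs divided by the total number of pairs in both strings (Dice's coefficient over letter pairs), with each shared pair counted at most once. The names `pairsOf` and `similarity` are hypothetical and are not part of the class above.

```groovy
// Hypothetical helper: upper-cased adjacent letter pairs of every word in s
List<String> pairsOf(String s) {
    s.toUpperCase().split(/\s+/).findAll { it }.collectMany { String word ->
        // one-letter words contribute no pairs
        word.size() < 2 ? [] : (0..<word.size() - 1).collect { word[it..it + 1] }
    }
}

// Hypothetical helper: 2 * shared pairs / total pairs, each pair matched at most once
double similarity(String a, String b) {
    List<String> p1 = pairsOf(a)
    List<String> p2 = pairsOf(b)
    if (p1.isEmpty() && p2.isEmpty()) return 0.0
    List<String> unmatched = new ArrayList<String>(p2)
    int shared = p1.count { unmatched.remove(it) }  // remove() returns true on a match
    2.0 * shared / (p1.size() + p2.size())
}

println pairsOf('France')              // [FR, RA, AN, NC, CE]
println pairsOf('French')              // [FR, RE, EN, NC, CH]
println similarity('France', 'French') // 0.4
```

For FRANCE and FRENCH the shared pairs are FR and NC, so the score is 2 * 2 / (5 + 5) = 0.4. Removing each matched pair from a copy of the second list is what keeps repeated pairs (as in "AAA") from being counted more often than they actually occur in both strings.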
Published on September 15th, 2008 by Roman Mackovcak