String similarity detection

I was looking for a simple string similarity detection algorithm and found one on CatalySoft. I liked it and found it useful, so I wrote a Groovy equivalent.

Here is the code:

public class LetterPairSimilarity {
 
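  /** @return a list of adjacent letter pairs contained in the input string */
  private static def letterPairs(String str) {
    // Guard for words shorter than two characters: return the word itself as its
    // only token. Without it, the range below runs backwards (0..-1) and the lookup throws.
    if (str.size() < 2) return [str]
    (0..str.size()-2).collect{str[it,it+1]}
  }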
 
  /** @return an ArrayList of 2-character Strings. */
  private static def wordLetterPairs(String str) {
    (str.split("\\s").collect{it} - [""]).collect{ letterPairs(it) }.flatten()
  }
 
  /** @return lexical similarity value in the range [0,1] */
  public static double compareStrings(String str1, String str2) {
    if (!str1 || !str2 || str1=="" || str2=="") {return 0.0}
    def p1=wordLetterPairs(str1.toUpperCase())
    def p2=wordLetterPairs(str2.toUpperCase())
 
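    def total = (p1+p2).size()
    // Whitespace-only input produces no pairs; return 0 rather than divide by zero.
    if (total == 0) {return 0.0}
    2*p1.intersect(p2).size() / total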
  }
}
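
To see what the score means, here is a minimal standalone sketch that walks through the calculation for two single words. It does not call the class above; the closure name pairsOf and the sample words are just my choices for illustration, and it simply skips words shorter than two characters. The score is twice the number of shared pairs divided by the total number of pairs in both strings.

```groovy
// Standalone sketch of the pair-based score; pairsOf is an illustrative helper,
// not part of the LetterPairSimilarity class.

// Adjacent letter pairs of a single word, e.g. "FRANCE" -> [FR, RA, AN, NC, CE].
// Words shorter than two characters yield no pairs in this sketch.
def pairsOf = { String word ->
    word.size() < 2 ? [] : (0..word.size() - 2).collect { word[it..it + 1] }
}

def p1 = pairsOf("FRANCE")
def p2 = pairsOf("FRENCH")

// Pairs that occur in both words (no pair repeats in this example).
def shared = p1.intersect(p2)

// score = 2 * |shared pairs| / (|pairs of word 1| + |pairs of word 2|)
double score = 2 * shared.size() / (p1.size() + p2.size())

println p1      // [FR, RA, AN, NC, CE]
println p2      // [FR, RE, EN, NC, CH]
println shared  // [FR, NC]
println score   // 0.4
```

Two of the ten pairs overall are shared by both words, so the result is 2 * 2 / 10 = 0.4. Identical words give 1.0 and words with no pair in common give 0.0.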

2 Responses to “String similarity detection”

  1. Marco T. Says:

    Thanks for sharing! :-D
    Note: I tested it and noticed that ‘letterPairs’ sometimes raised a StringIndexOutOfBoundsException. I fixed it by adding the following at the beginning of the method:

    if (str.size() < 2) return [str]

    Hope this helps… bye

  2. Roman Mackovcak Says:

    Correct! I fixed it in my code but forgot to update the blog… thank you.