String similarity detection

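I was looking for a simple string similarity algorithm and found one on CatalySoft. I liked it and found it useful, so I wrote a Groovy equivalent. It splits each word of a string into adjacent letter pairs and scores two strings as twice the number of pairs they share, divided by the total number of pairs in both.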

Here is the code:

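public class LetterPairSimilarity {

  /** @return a list of adjacent letter pairs contained in the input string */
  private static def letterPairs(String str) {
    // A one-character word has no pairs; keep the word itself (fix suggested in the comments below)
    if (str.size() < 2) return [str]
    (0..str.size()-2).collect { str[it..it+1] }
  }

  /** @return a flat list of the letter pairs of every whitespace-separated word */
  private static def wordLetterPairs(String str) {
    str.split("\\s+").findAll { it }.collect { letterPairs(it) }.flatten()
  }

  /** @return lexical similarity value in the range [0,1] */
  public static double compareStrings(String str1, String str2) {
    // null and empty strings both evaluate to false in Groovy
    if (!str1 || !str2) { return 0.0 }
    def p1 = wordLetterPairs(str1.toUpperCase())
    def p2 = wordLetterPairs(str2.toUpperCase())
    // whitespace-only input yields no pairs; avoid dividing by zero
    if (!p1 || !p2) { return 0.0 }

    // Match each pair of p1 against at most one occurrence in p2,
    // so repeated pairs cannot push the result above 1
    def unmatched = new ArrayList(p2)
    int common = p1.count { unmatched.remove(it) }
    2 * common / (p1.size() + p2.size())
  }
}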
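
To make the scoring concrete, here is a small worked example. It is only a sketch and not part of the class above: the `pairs` closure is a stand-in for `letterPairs`, and the two sample words are chosen purely for illustration.

```groovy
// Stand-in for letterPairs: adjacent two-letter slices of a single word
def pairs = { String word -> (0..word.size() - 2).collect { word[it..it + 1] } }

def p1 = pairs("FRANCE")   // [FR, RA, AN, NC, CE]
def p2 = pairs("FRENCH")   // [FR, RE, EN, NC, CH]

// Neither list contains a repeated pair, so a plain intersect is enough here
def shared = p1.intersect(p2)

// Twice the shared pairs over the total number of pairs: 2 * 2 / (5 + 5)
def score = 2 * shared.size() / (p1.size() + p2.size())

println "shared=${shared} score=${score}"
```

The two words share the pairs FR and NC, so the score is 2 * 2 / 10 = 0.4.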

2 comments

  1. Thanks for sharing! 😀
    Note: I tested it and noticed that ‘letterPairs’ sometimes raised a StringIndexOutOfBoundsException. I fixed it by adding the following at the beginning of the method:

    if (str.size() < 2) return [str]

    Hope this helps… bye
