I was looking for a simple string similarity algorithm and found one on CatalySoft. I liked it and found it useful, so I wrote a Groovy equivalent of it.
Here is the code:
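public class LetterPairSimilarity {

    /** @return a list of the adjacent letter pairs contained in the input string */
    private static def letterPairs(String str) {
        // A one-character word has no adjacent pair, so return the word itself
        // (fix from the comments below; without it the range runs backwards and indexing fails)
        if (str.size() < 2) return [str]
        (0..str.size()-2).collect{str[it,it+1]}
    }

    /** @return a flat list of the letter pairs of every whitespace-separated word */
    private static def wordLetterPairs(String str) {
        (str.split("\\s").collect{it} - [""]).collect{ letterPairs(it) }.flatten()
    }

    /** @return lexical similarity value in the range [0,1] */
    public static double compareStrings(String str1, String str2) {
        // Groovy truth treats both null and "" as false
        if (!str1 || !str2) {return 0.0}
        def p1=wordLetterPairs(str1.toUpperCase())
        def p2=wordLetterPairs(str2.toUpperCase())
        // size() is called as a method: property-style .size on a list is resolved against its elements
        2*p1.intersect(p2).size() / (p1+p2).size()
    }
}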
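To make the score concrete, here is a minimal stand-alone sketch that works through one comparison by hand. The `pairs` closure is a hypothetical helper written only for this example (it mirrors what letterPairs does for a single word), and FRANCE and FRENCH are just sample inputs.

```groovy
// Hypothetical helper for this example: adjacent letter pairs of one word
def pairs = { String word ->
    word.size() < 2 ? [word] : (0..word.size() - 2).collect { word[it, it + 1] }
}

def p1 = pairs("FRANCE")
def p2 = pairs("FRENCH")
def shared = p1.intersect(p2)

assert p1 == ["FR", "RA", "AN", "NC", "CE"]
assert p2 == ["FR", "RE", "EN", "NC", "CH"]
assert shared.sort() == ["FR", "NC"]

// similarity = 2 * |shared| / (|p1| + |p2|) = 2 * 2 / (5 + 5)
def similarity = 2 * shared.size() / (p1.size() + p2.size())
assert similarity == 0.4
```

The two words share 2 of their 5 pairs each, which gives 0.4. Identical strings share every pair and score 1, while strings with no pair in common score 0.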
Thanks for sharing! 😀
Note: I tested it and noticed that ‘letterPairs’ raises a StringIndexOutOfBoundsException when a word is only one character long. I fixed it by adding the following at the beginning of the method:
if (str.size() < 2) return [str]
Hope this helps… bye
Correct! I had fixed it in my own code but forgot to update the blog… thank you.
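For anyone wondering why a one-character word trips the original method, the following sketch reproduces the problem in isolation. The closure names `unguarded` and `guarded` are made up for this example. The first copies the pair-building expression from the post, and the second adds the guard suggested above.

```groovy
// Pair-building expression as posted, without any length check (hypothetical name)
def unguarded = { String word -> (0..word.size() - 2).collect { word[it, it + 1] } }

// Same expression with the guard from the comment above (hypothetical name)
def guarded = { String word -> word.size() < 2 ? [word] : unguarded(word) }

// For a one-character word the range is 0..-1, which Groovy counts DOWN: 0, then -1
assert (0..-1).toList() == [0, -1]

// Index 1 does not exist in "A", so the subscript fails
def failed = false
try {
    unguarded("A")
} catch (IndexOutOfBoundsException e) {
    failed = true
}
assert failed

// With the guard, the one-character word is returned as its own "pair"
assert guarded("A") == ["A"]
assert guarded("AB") == ["AB"]
```

Returning `[]` instead of `[word]` would be another option. One-letter words would then contribute no pairs at all, so which is preferable depends on whether single letters should count toward the score.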