Class DukeJaroWinklerAlgorithm

java.lang.Object
tk.airshipcraft.commonlib.utils.search.DukeJaroWinklerAlgorithm
All Implemented Interfaces:
StringDistance

public final class DukeJaroWinklerAlgorithm extends Object implements StringDistance

This class provides an optimized implementation of the Jaro-Winkler similarity measure, which scores how similar two byte arrays (and, by extension, two strings) are. Its main optimizations over standard implementations are computing all metrics within a single loop and operating directly on byte arrays instead of Strings for better performance.

Unlike some implementations, it does not perform a common character check, because that check tends to slow the algorithm down without a significant accuracy benefit for the intended use cases.

This implementation is adapted from Lars Marius Garshol's version and is designed for high-performance scenarios where text similarity must be computed quickly and at scale.
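Because the comparison works on byte arrays, callers holding Strings have to encode them first. The following is a minimal sketch, assuming UTF-8 is the desired encoding; the class name Utf8Inputs and both of its methods are hypothetical helpers, not part of this library.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: turns String inputs into byte arrays for comparison. */
final class Utf8Inputs {

    private Utf8Inputs() {}

    // Use one explicit charset for both arguments so equal text yields equal bytes.
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // True when every char is ASCII; then each character encodes to exactly one
    // UTF-8 byte, so byte-level matching coincides with character-level matching.
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 0x80);
    }
}
```

For ASCII text each character becomes exactly one byte. Non-ASCII characters become several bytes (for example, "café" encodes to 5 bytes in UTF-8), so for such input the similarity is measured over bytes rather than characters.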

Since:
2023-04-11
Version:
1.0.0
See Also:
  • Constructor Details

    • DukeJaroWinklerAlgorithm

      public DukeJaroWinklerAlgorithm()
  • Method Details

    • calculate

      public double calculate(byte @NotNull [] x, byte @NotNull [] y)

      Computes the Jaro-Winkler similarity score between two byte arrays. The score is symmetric and lies between 0 and 1, where 1 means an exact match and 0 means no similarity.

      The algorithm counts matching characters and transpositions, then adjusts the score for a common prefix of up to 4 characters.

      Specified by:
      calculate in interface StringDistance
      Parameters:
      x - The first byte array to compare.
      y - The second byte array to compare.
      Returns:
      A double value representing the similarity score between the two byte arrays.
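To show the quantities named in calculate in code, here is a minimal reference sketch of the textbook Jaro-Winkler formulation. It is not this class's single-loop implementation. With m matching bytes, t transpositions (half the number of matched bytes that appear in a different order) and a common prefix of length l (at most 4), it computes sim_j = (m/|x| + m/|y| + (m - t)/m) / 3 and then sim_w = sim_j + l * p * (1 - sim_j). The class name JaroWinklerSketch, the matching window of max(|x|, |y|)/2 - 1, and the scaling factor p = 0.1 are assumptions taken from the common textbook definition, not from this class.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical reference sketch of the textbook Jaro-Winkler similarity on byte
 * arrays. It is not the single-loop code used by DukeJaroWinklerAlgorithm.
 */
final class JaroWinklerSketch {

    // Assumed constants: the conventional Winkler scaling factor and prefix cap.
    private static final double PREFIX_SCALE = 0.1;
    private static final int MAX_PREFIX = 4;

    private JaroWinklerSketch() {}

    static double similarity(byte[] x, byte[] y) {
        if (Arrays.equals(x, y)) {
            return 1.0; // identical inputs, including two empty arrays
        }
        if (x.length == 0 || y.length == 0) {
            return 0.0;
        }

        // Two equal bytes only count as a match if their positions are within this window.
        int window = Math.max(0, Math.max(x.length, y.length) / 2 - 1);
        boolean[] xMatched = new boolean[x.length];
        boolean[] yMatched = new boolean[y.length];

        int matches = 0;
        for (int i = 0; i < x.length; i++) {
            int from = Math.max(0, i - window);
            int to = Math.min(y.length - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (!yMatched[j] && x[i] == y[j]) {
                    xMatched[i] = true;
                    yMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0;
        }

        // Walk both matched sequences in order; each transposed pair shows up as two mismatches.
        int mismatches = 0;
        int k = 0;
        for (int i = 0; i < x.length; i++) {
            if (!xMatched[i]) {
                continue;
            }
            while (!yMatched[k]) {
                k++;
            }
            if (x[i] != y[k]) {
                mismatches++;
            }
            k++;
        }
        double m = matches;
        double t = mismatches / 2.0;
        double jaro = (m / x.length + m / y.length + (m - t) / m) / 3.0;

        // Winkler boost for a common prefix of at most MAX_PREFIX bytes.
        int prefix = 0;
        int limit = Math.min(MAX_PREFIX, Math.min(x.length, y.length));
        while (prefix < limit && x[prefix] == y[prefix]) {
            prefix++;
        }
        return jaro + prefix * PREFIX_SCALE * (1.0 - jaro);
    }

    // Convenience overload: compare Strings via their UTF-8 bytes.
    static double similarity(String a, String b) {
        return similarity(a.getBytes(StandardCharsets.UTF_8), b.getBytes(StandardCharsets.UTF_8));
    }
}
```

With these constants, similarity("MARTHA", "MARHTA") evaluates to about 0.961 and similarity("DWAYNE", "DUANE") to about 0.84. Because the class skips the common character check, its scores can differ from this sketch for some inputs.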