Class DukeJaroWinklerAlgorithm
- All Implemented Interfaces:
StringDistance
This class provides an optimized implementation of the Jaro-Winkler similarity algorithm, which measures the similarity between two byte arrays (and, by extension, two strings). Its main optimizations over standard implementations are computing all metrics within a single loop and operating on byte arrays rather than Strings.
Unlike some implementations, it does not perform a common-character check, because that check tends to slow the algorithm down without a significant accuracy benefit for the intended use cases.
This implementation is adapted from Lars Marius Garshol's version and is designed for high-performance scenarios where text similarity must be computed rapidly and at scale.
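Because the class operates on byte arrays, callers that hold Strings have to encode them first. The sketch below shows one way to do that. The helper name toBytes and the choice of UTF-8 are assumptions made for illustration; this page does not state which encoding callers are expected to use.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the documented API.
final class ByteInputSketch {

    // Encodes a String as UTF-8 bytes (the encoding is an assumption).
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // ASCII text: one byte per character.
        System.out.println(toBytes("hello").length);
        // U+00E9 (e with acute accent) takes two bytes in UTF-8,
        // so the byte array is longer than the String.
        System.out.println(toBytes("h\u00e9llo").length);
    }
}
```

With a multi-byte encoding, a byte-level comparison counts each non-ASCII character as several positions, so a score over bytes can differ slightly from a score over characters. Both inputs should be encoded the same way.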
- Since:
- 2023-04-11
- Version:
- 1.0.0
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type: double
Method: calculate(byte @NotNull [] x, byte @NotNull [] y)
Description: Computes the Jaro-Winkler similarity score between two byte arrays.
-
Constructor Details
-
DukeJaroWinklerAlgorithm
public DukeJaroWinklerAlgorithm()
-
-
Method Details
-
calculate
public double calculate(byte @NotNull [] x, byte @NotNull [] y)
Computes the Jaro-Winkler similarity score between two byte arrays. The score is symmetrical and gives a value between 0 and 1, where 1 means an exact match and 0 means no similarity.
The algorithm considers the number of matching characters and transpositions, adjusting for common prefixes up to a maximum of 4 characters.
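For reference, the textbook form of the computation described above can be sketched as follows. This is not the single-loop implementation this class uses; it is a plain two-pass version for illustration. The class name JaroWinklerSketch, the prefix scaling factor of 0.1, the unconditional prefix bonus, and the handling of empty inputs are assumptions, not details taken from this page.

```java
import java.nio.charset.StandardCharsets;

// Illustrative textbook Jaro-Winkler over byte arrays (hypothetical class).
final class JaroWinklerSketch {
    private static final int MAX_PREFIX = 4;          // prefix cap stated above
    private static final double PREFIX_SCALE = 0.1;   // assumed standard scaling factor

    static double similarity(byte[] x, byte[] y) {
        // Assumed conventions for empty input.
        if (x.length == 0 && y.length == 0) return 1.0;
        if (x.length == 0 || y.length == 0) return 0.0;

        // Bytes match only if equal and within this distance of each other.
        int window = Math.max(0, Math.max(x.length, y.length) / 2 - 1);
        boolean[] xMatched = new boolean[x.length];
        boolean[] yMatched = new boolean[y.length];
        int matches = 0;
        for (int i = 0; i < x.length; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(y.length - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!yMatched[j] && x[i] == y[j]) {
                    xMatched[i] = true;
                    yMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;

        // Walk the matched bytes of both inputs in order and count mismatches;
        // each transposition contributes two out-of-order positions.
        int outOfOrder = 0;
        int j = 0;
        for (int i = 0; i < x.length; i++) {
            if (!xMatched[i]) continue;
            while (!yMatched[j]) j++;
            if (x[i] != y[j]) outOfOrder++;
            j++;
        }
        double m = matches;
        double t = outOfOrder / 2; // integer division: number of transpositions
        double jaro = (m / x.length + m / y.length + (m - t) / m) / 3.0;

        // Winkler adjustment: boost for a shared prefix of up to 4 bytes.
        int prefix = 0;
        int limit = Math.min(MAX_PREFIX, Math.min(x.length, y.length));
        while (prefix < limit && x[prefix] == y[prefix]) prefix++;
        return jaro + prefix * PREFIX_SCALE * (1.0 - jaro);
    }

    public static void main(String[] args) {
        byte[] a = "MARTHA".getBytes(StandardCharsets.UTF_8);
        byte[] b = "MARHTA".getBytes(StandardCharsets.UTF_8);
        System.out.println(similarity(a, b)); // approximately 0.9611
    }
}
```

For "MARTHA" and "MARHTA" there are 6 matches and 1 transposition, giving a Jaro score of about 0.9444; the shared prefix "MAR" raises it to about 0.9611. Scores from the class documented here may differ from this sketch, since it omits the common-character check.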
- Specified by:
calculate in interface StringDistance
- Parameters:
x - The first byte array to compare.
y - The second byte array to compare.
- Returns:
- A double value representing the similarity score between the two byte arrays.
-