Package tk.airshipcraft.commonlib.utils.search
The tk.airshipcraft.commonlib.utils.search package contains implementations of various string search and distance algorithms. These are essential tools for performing text analysis, enabling features like search optimization, string comparison, and pattern matching.
Classes and Interfaces:
DukeJaroWinklerAlgorithm: An optimized implementation of the Jaro-Winkler similarity algorithm.
NormalizedLevenshteinAlgorithm: An implementation of the Levenshtein distance algorithm, normalized to a 0-1 scale.
SearchAlgorithm: An enumeration that provides a selection of different search algorithms for easy usage.
StringDistance: An interface defining a method for calculating distances or similarities between strings.
Trie: A data structure for efficient string retrieval and pattern matching.
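For readers unfamiliar with Jaro-Winkler similarity, the following is a minimal textbook sketch over byte arrays. It is not the code of DukeJaroWinklerAlgorithm: the class name JaroWinklerSketch, the method names, the 0.1 prefix scaling factor, and the 4-byte prefix cap are assumptions taken from the common definition of the algorithm, and the package's optimized version may differ in details.

```java
/** Hypothetical illustration only; not part of this package. */
final class JaroWinklerSketch {

    /** Plain Jaro similarity in [0, 1]; 1.0 means identical input. */
    static double jaro(byte[] a, byte[] b) {
        if (a.length == 0 && b.length == 0) return 1.0;
        if (a.length == 0 || b.length == 0) return 0.0;

        // Two bytes match only if they are equal and at most `window` positions apart.
        int window = Math.max(0, Math.max(a.length, b.length) / 2 - 1);
        boolean[] aMatched = new boolean[a.length];
        boolean[] bMatched = new boolean[b.length];
        int matches = 0;
        for (int i = 0; i < a.length; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(b.length - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!bMatched[j] && a[i] == b[j]) {
                    aMatched[i] = true;
                    bMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;

        // Count matched bytes that appear in a different order in the two inputs.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < a.length; i++) {
            if (!aMatched[i]) continue;
            while (!bMatched[k]) k++;
            if (a[i] != b[k]) outOfOrder++;
            k++;
        }
        double m = matches;
        double transpositions = outOfOrder / 2; // integer division, per the usual definition
        return (m / a.length + m / b.length + (m - transpositions) / m) / 3.0;
    }

    /** Jaro similarity boosted for a shared prefix of up to 4 bytes. */
    static double jaroWinkler(byte[] a, byte[] b) {
        double j = jaro(a, b);
        int limit = Math.min(4, Math.min(a.length, b.length));
        int prefix = 0;
        while (prefix < limit && a[prefix] == b[prefix]) prefix++;
        return j + prefix * 0.1 * (1.0 - j);
    }

    public static void main(String[] args) {
        byte[] x = "MARTHA".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        byte[] y = "MARHTA".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        System.out.println("Jaro-Winkler: " + jaroWinkler(x, y));
    }
}
```

The sketch works on bytes to mirror the byte-array signatures used elsewhere in this package, so multi-byte characters are compared byte by byte rather than as whole characters.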
Usage Examples:
DukeJaroWinklerAlgorithm:
StringDistance jaroWinkler = new DukeJaroWinklerAlgorithm();
double similarity = jaroWinkler.calculate("string1".getBytes(), "string2".getBytes());
System.out.println("Similarity: " + similarity);
NormalizedLevenshteinAlgorithm:
StringDistance levenshtein = new NormalizedLevenshteinAlgorithm();
double distance = levenshtein.calculate("string1".getBytes(), "string2".getBytes());
System.out.println("Distance: " + distance);
SearchAlgorithm:
double dukeScore = SearchAlgorithm.DUKE_JARO_WINKLER.calculate("string1".getBytes(), "string2".getBytes());
System.out.println("Duke Jaro-Winkler Score: " + dukeScore);
Trie:
Trie trie = Trie.getNewTrie();
trie.insert("hello");
trie.insert("world");
List<String> matches = trie.match("he");
System.out.println("Matches: " + matches);
Each class and interface is designed to be interoperable where suitable, and can be integrated into larger systems requiring text search and processing capabilities.
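The prefix lookup shown in the Trie example can be pictured with a minimal prefix tree. The sketch below is an assumption-laden illustration, not the package's Trie: the class name PrefixTreeSketch is hypothetical, and the choice of a sorted child map (so results come back in alphabetical order) is a design decision of this sketch that the real class may not share.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not part of this package. */
final class PrefixTreeSketch {

    private static final class Node {
        // TreeMap keeps children sorted, so matches are returned alphabetically.
        final Map<Character, Node> children = new TreeMap<>();
        boolean terminal; // true if a stored word ends at this node
    }

    private final Node root = new Node();

    /** Stores a word by walking (and creating) one node per character. */
    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.terminal = true;
    }

    /** Returns every stored word that starts with the given prefix. */
    List<String> match(String prefix) {
        List<String> out = new ArrayList<>();
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return out; // no stored word has this prefix
        }
        collect(node, new StringBuilder(prefix), out);
        return out;
    }

    /** Depth-first walk that records each complete word below the node. */
    private static void collect(Node node, StringBuilder path, List<String> out) {
        if (node.terminal) out.add(path.toString());
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            path.append(e.getKey());
            collect(e.getValue(), path, out);
            path.setLength(path.length() - 1);
        }
    }

    public static void main(String[] args) {
        PrefixTreeSketch trie = new PrefixTreeSketch();
        trie.insert("hello");
        trie.insert("world");
        System.out.println("Matches: " + trie.match("he"));
    }
}
```

Lookup cost depends on the length of the prefix plus the size of the matching subtree, not on the total number of stored words, which is what makes a trie attractive for autocomplete-style queries.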
Since: 1.0
Class: Description
DukeJaroWinklerAlgorithm: This class provides a highly optimized implementation of the Jaro-Winkler similarity algorithm, which is a measure of similarity between two byte arrays (and by extension, two strings).
NormalizedLevenshteinAlgorithm: Implements the normalized Levenshtein distance algorithm, which calculates a similarity score based on the minimum number of single-character edits required to change one byte array into the other.
SearchAlgorithm: Enumerates available string distance algorithms which can be used to compare similarity or dissimilarity between two strings.
StringDistance: An interface for string distance algorithms that calculate the difference between two byte arrays.
Trie: Implements a trie (prefix tree) data structure that provides fast retrieval of strings based on their prefixes.
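The "minimum number of single-character edits" idea behind the normalized Levenshtein entry can be sketched as follows. This is a textbook version under stated assumptions, not the code of NormalizedLevenshteinAlgorithm: the class name NormalizedLevenshteinSketch is hypothetical, and dividing by the longer input's length is one common normalization that the package may or may not use. The sketch returns a distance (0.0 for identical input); a similarity score would be 1.0 minus that value.

```java
/** Hypothetical illustration only; not part of this package. */
final class NormalizedLevenshteinSketch {

    /** Raw edit distance: insertions, deletions, and substitutions each cost 1. */
    static int distance(byte[] a, byte[] b) {
        // Only two rows of the dynamic-programming table are kept at a time.
        int[] prev = new int[b.length + 1];
        int[] curr = new int[b.length + 1];
        for (int j = 0; j <= b.length; j++) prev[j] = j;

        for (int i = 1; i <= a.length; i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length; j++) {
                int substitution = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                curr[j] = Math.min(substitution, Math.min(insertion, deletion));
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length];
    }

    /** Edit distance scaled to [0, 1] by the length of the longer input. */
    static double normalized(byte[] a, byte[] b) {
        int longest = Math.max(a.length, b.length);
        if (longest == 0) return 0.0; // two empty inputs are identical
        return (double) distance(a, b) / longest;
    }

    public static void main(String[] args) {
        byte[] x = "kitten".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        byte[] y = "sitting".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        System.out.println("Normalized distance: " + normalized(x, y));
    }
}
```

Passing an explicit charset to getBytes, as in the main method here, keeps results independent of the platform default encoding.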