public class OrderedTokenAndAbbreviationsMatcher extends Object
This class provides methods to tokenize author names and compare them using abbreviation recognition. It is useful for identifying name variants in which parts of the full name are reordered or abbreviated.
| Modifier and Type | Field and Description |
|---|---|
| static int | NUM_TOKEN_MAX_DIFF - Maximum allowed difference in the number of tokens between two names for them to be comparable. |
| static Pattern | SPLIT_REGEX - Regular expression pattern used to split names into tokens. |

| Constructor and Description |
|---|
| OrderedTokenAndAbbreviationsMatcher() |

| Modifier and Type | Method and Description |
|---|---|
| static Optional<Double> | compare(String a1, String a2) - Compares two author names using token-based matching and abbreviation handling. |

public static final Pattern SPLIT_REGEX
The pattern matches spaces, punctuation symbols, and dashes, ensuring that names are split into meaningful components.
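
The splitting behavior described above can be sketched as follows. This is a minimal illustration, not the library's code: the actual value of SPLIT_REGEX is not shown on this page, so the pattern `ASSUMED_SPLIT`, the class `TokenSplitSketch`, and the helper `tokenize` are all hypothetical names chosen for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class TokenSplitSketch {
    // Assumed pattern, NOT the real SPLIT_REGEX value: one or more whitespace
    // or ASCII punctuation characters (\p{Punct} includes the dash '-').
    static final Pattern ASSUMED_SPLIT = Pattern.compile("[\\s\\p{Punct}]+");

    // Splits a name into non-empty tokens using the assumed pattern.
    static List<String> tokenize(String name) {
        return Arrays.stream(ASSUMED_SPLIT.split(name.trim()))
                .filter(token -> !token.isEmpty()) // drop the empty token left by a leading delimiter
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Smith, J.-P."));      // prints [Smith, J, P]
        System.out.println(tokenize("Jean-Pierre Smith")); // prints [Jean, Pierre, Smith]
    }
}
```

Under this assumed pattern, initials such as "J." and hyphenated given names both reduce to plain tokens, which is the form a token-by-token comparison would need.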
public static int NUM_TOKEN_MAX_DIFF
public static Optional<Double> compare(String a1, String a2)
The comparison follows these rules:
NUM_TOKEN_MAX_DIFF.

The method returns an Optional containing a confidence score if a match is found, or an empty Optional if no match is identified.

Parameters:
a1 - The first author name.
a2 - The second author name.

Returns:
Optional<Double> with a confidence score (1.0 if a match is found), or empty if no match.

Copyright © 2026. All Rights Reserved.
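
A caller might consume the Optional<Double> result along the following lines. To keep the sketch self-contained, it uses `compareStandIn`, a hypothetical and heavily simplified stand-in with the same signature as compare(String, String). It only accepts names with equal token counts in the same order, treats a single-letter token as an abbreviation of a token starting with that letter, and does not reproduce the real method's rules, NUM_TOKEN_MAX_DIFF handling, or reordering support.

```java
import java.util.Optional;

class CompareUsageSketch {
    // Hypothetical stand-in for compare(String, String); NOT the library's logic.
    // Assumption: tokens are split on whitespace and ASCII punctuation.
    static Optional<Double> compareStandIn(String a1, String a2) {
        String[] t1 = a1.trim().split("[\\s\\p{Punct}]+");
        String[] t2 = a2.trim().split("[\\s\\p{Punct}]+");
        if (t1.length != t2.length) {
            return Optional.empty();
        }
        for (int i = 0; i < t1.length; i++) {
            String x = t1[i].toLowerCase();
            String y = t2[i].toLowerCase();
            if (x.isEmpty() || y.isEmpty()) {
                return Optional.empty();
            }
            boolean same = x.equals(y);
            // A one-letter token counts as an abbreviation of a token it prefixes.
            boolean abbreviation = (x.length() == 1 && y.startsWith(x))
                    || (y.length() == 1 && x.startsWith(y));
            if (!same && !abbreviation) {
                return Optional.empty();
            }
        }
        return Optional.of(1.0); // mirrors the documented score of 1.0 on a match
    }

    // Shows the typical way to branch on an Optional result without calling get().
    static String describe(String a1, String a2) {
        return compareStandIn(a1, a2)
                .map(score -> "match (confidence " + score + ")")
                .orElse("no match");
    }

    public static void main(String[] args) {
        System.out.println(describe("John Smith", "J. Smith")); // prints match (confidence 1.0)
        System.out.println(describe("John Smith", "Jane Doe")); // prints no match
    }
}
```

With the real class, the call inside `describe` would presumably be `OrderedTokenAndAbbreviationsMatcher.compare(a1, a2)`; the `map`/`orElse` handling of the result stays the same.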