Class DBLPResultAnalyzer

java.lang.Object
eu.openaire.dblp_benchmark.DBLPResultAnalyzer
All Implemented Interfaces:
Serializable

public class DBLPResultAnalyzer extends Object implements Serializable
Analyzes false positives and false negatives produced by OpenAIRE (AIDER) name matching.

Input: the joined ORCID/DBLP parquet cache produced by DBLPBenchmark (the path <output>_joined_cache from that job).

Output (JSON): one record per erroneous classification:

  • FP (false positive): OpenAIRE matched a DBLP author to an ORCID author whose ORCID does not match the ground-truth ORCID stored in the DBLP record.
  • FN (false negative): OpenAIRE failed to match a DBLP author who has a ground-truth ORCID. An FN is counted only when ORCID-ID matching would have succeeded, i.e. the correct author is present in the ORCID dataset for that DOI. FNs where the correct author is absent from the ORCID dataset are excluded, because those are data-coverage gaps rather than algorithm failures.
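
The FP/FN rules above can be sketched as a small decision function. This is only an illustrative sketch, not the analyzer's actual code: the class name ErrorClassificationSketch, the Outcome enum, the classify method, its parameters, and the placeholder ORCID values are all hypothetical. It also assumes that records without a ground-truth ORCID cannot be evaluated and are therefore skipped.

```java
import java.util.Set;

// Hypothetical sketch of the FP/FN rules; names are illustrative, not the real implementation.
class ErrorClassificationSketch {

    enum Outcome { CORRECT, FALSE_POSITIVE, FALSE_NEGATIVE, EXCLUDED }

    /**
     * @param groundTruthOrcid ORCID stored in the DBLP record, or null if there is none
     * @param matchedOrcid     ORCID of the author chosen by name matching, or null if no match
     * @param orcidsForDoi     ORCIDs present in the ORCID dataset for the same DOI
     */
    static Outcome classify(String groundTruthOrcid, String matchedOrcid, Set<String> orcidsForDoi) {
        if (groundTruthOrcid == null) {
            // Assumption: without ground truth the match cannot be judged.
            return Outcome.EXCLUDED;
        }
        if (matchedOrcid != null) {
            // A match was produced: it is wrong exactly when the ORCIDs differ.
            return matchedOrcid.equals(groundTruthOrcid)
                    ? Outcome.CORRECT
                    : Outcome.FALSE_POSITIVE;
        }
        // No match was produced: count it as an FN only if the correct author
        // was available in the ORCID dataset; otherwise it is a coverage gap.
        return orcidsForDoi.contains(groundTruthOrcid)
                ? Outcome.FALSE_NEGATIVE
                : Outcome.EXCLUDED;
    }

    public static void main(String[] args) {
        // Placeholder identifiers, not real ORCID iDs.
        Set<String> available = Set.of("0000-0000-0000-0001", "0000-0000-0000-0002");
        System.out.println(classify("0000-0000-0000-0001", "0000-0000-0000-0002", available));
        System.out.println(classify("0000-0000-0000-0001", null, available));
        System.out.println(classify("0000-0000-0000-0009", null, available));
    }
}
```

Keeping the coverage check as the last branch keeps the two error types separate from data gaps, so the reported FNs reflect only cases the name matcher could have resolved.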

Usage:


 spark-submit --class eu.openaire.dblp_benchmark.DBLPResultAnalyzer \
   target/dblp-orcid-benchmark-0.1.1.jar \
   --cachePath <output>_joined_cache \
   --output <output-path>
 
  • Constructor Details

    • DBLPResultAnalyzer

      public DBLPResultAnalyzer()
  • Method Details

    • main

      public static void main(String[] args)