Class CitationCountByYearMapperUtil

java.lang.Object
eu.dnetlib.dhp.schema.solr.CitationCountByYearMapperUtil

public class CitationCountByYearMapperUtil extends Object
Example mapper utilities for populating citationCountByYear on Result objects. This class demonstrates the integration pattern for mapper/importer modules that consume citation timeline data from upstream sources and populate the Result model.

Usage in a real mapper (pseudo-code):
 // In dhp-hadoop or similar mapper module:
 public Result mapCitation(UpstreamCitationRecord record, Result result) {
     List<CitationCountByYear> raw = extractCitationsByYear(record);
     List<CitationCountByYear> canonical = CitationCountByYearMapperUtil.populateCitations(raw);
     result.setCitationCountByYear(canonical);
     return result;
 }

 private List<CitationCountByYear> extractCitationsByYear(UpstreamCitationRecord record) {
     // Parse from JSON, Avro, or database
     return record.getCitations().stream()
         .map(c -> CitationCountByYear.newInstance(c.getYear(), c.getCount()))
         .collect(Collectors.toList());
 }
 
  • Constructor Details

    • CitationCountByYearMapperUtil

      public CitationCountByYearMapperUtil()
  • Method Details

    • populateCitations

      public static List<CitationCountByYear> populateCitations(List<CitationCountByYear> rawCitations)
      Populates and canonicalizes a citation-by-year list for use in a Result object. This is the primary integration point for mappers:
      - Normalizes the input, which may be null or unsorted and may contain duplicates or invalid entries
      - Returns a canonical list ready for indexing
      - Optionally logs or tracks invalid entries for monitoring
      Parameters:
      rawCitations - Raw citation entries from upstream source.
      Returns:
      Canonicalized citation list suitable for Result.setCitationCountByYear().
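
The page does not spell out what "canonical" means, so the following is only a minimal sketch of one plausible normalization. It assumes that invalid entries are null entries or entries with a missing year or a missing or negative count, that duplicate years are merged by summing their counts, that output is sorted by ascending year, and that a null input yields an empty list. YearCount and canonicalize are hypothetical stand-ins, not the real CitationCountByYear API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CanonicalizeSketch {

    // Hypothetical stand-in for CitationCountByYear; the real class may differ.
    static final class YearCount {
        final Integer year;
        final Integer count;

        YearCount(Integer year, Integer count) {
            this.year = year;
            this.count = count;
        }
    }

    // Assumed canonical form: no invalid entries, one entry per year, ascending by year.
    static List<YearCount> canonicalize(List<YearCount> raw) {
        List<YearCount> out = new ArrayList<>();
        if (raw == null) {
            return out; // assumption: null input becomes an empty list
        }
        // TreeMap keeps keys sorted, so iteration yields years in ascending order.
        TreeMap<Integer, Integer> byYear = new TreeMap<>();
        for (YearCount e : raw) {
            if (e == null || e.year == null || e.count == null || e.count < 0) {
                continue; // assumption: such entries are invalid and dropped
            }
            // assumption: duplicate years are merged by summing their counts
            byYear.merge(e.year, e.count, Integer::sum);
        }
        for (Map.Entry<Integer, Integer> en : byYear.entrySet()) {
            out.add(new YearCount(en.getKey(), en.getValue()));
        }
        return out;
    }
}
```

Whether duplicates should be summed, deduplicated, or resolved by keeping the last value depends on the upstream source; check the actual implementation before relying on any of these choices.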
    • populateCitationsWithTracking

      public static List<CitationCountByYear> populateCitationsWithTracking(List<CitationCountByYear> rawCitations, CitationCountByYearMapperUtil.InvalidEntryHandler invalidEntryHandler)
      Alternative to populateCitations for callers that want to track or log invalid entries through a callback.
      Parameters:
      rawCitations - Raw citation entries from upstream source.
      invalidEntryHandler - Optional callback to handle invalid entries (e.g., for metrics/logging).
      Returns:
      Canonicalized citation list suitable for Result.setCitationCountByYear().
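
The signature of InvalidEntryHandler is not shown on this page, so the sketch below invents a callback shape purely for illustration. It assumes a single method receiving the offending entry and a reason string, and that a null handler is allowed because the parameter is described as optional. All names here (YearCount, InvalidEntryHandler.onInvalid, canonicalizeWithTracking) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TrackingSketch {

    // Hypothetical stand-in for CitationCountByYear.
    static final class YearCount {
        final Integer year;
        final Integer count;

        YearCount(Integer year, Integer count) {
            this.year = year;
            this.count = count;
        }
    }

    // Hypothetical callback shape; the real InvalidEntryHandler may look different.
    interface InvalidEntryHandler {
        void onInvalid(YearCount entry, String reason);
    }

    // Returns null when the entry is valid, otherwise a short reason (assumed rules).
    static String reasonIfInvalid(YearCount e) {
        if (e == null) return "null entry";
        if (e.year == null) return "missing year";
        if (e.count == null || e.count < 0) return "missing or negative count";
        return null;
    }

    static List<YearCount> canonicalizeWithTracking(List<YearCount> raw, InvalidEntryHandler handler) {
        List<YearCount> out = new ArrayList<>();
        if (raw == null) {
            return out;
        }
        TreeMap<Integer, Integer> byYear = new TreeMap<>();
        for (YearCount e : raw) {
            String reason = reasonIfInvalid(e);
            if (reason != null) {
                if (handler != null) {
                    handler.onInvalid(e, reason); // report, then skip the entry
                }
                continue;
            }
            byYear.merge(e.year, e.count, Integer::sum); // assumption: sum duplicates
        }
        for (Map.Entry<Integer, Integer> en : byYear.entrySet()) {
            out.add(new YearCount(en.getKey(), en.getValue()));
        }
        return out;
    }
}
```

A caller might pass a lambda that increments a metrics counter or writes a log line, which keeps the normalization logic free of any particular logging or monitoring dependency.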
    • validateResult

      public static boolean validateResult(Result result)
      Validates that a Result object has well-formed citation data (if present). Can be used in post-mapping validation or in result processors.
      Parameters:
      result - Result to validate.
      Returns:
      true if citationCountByYear is null/empty or canonical; false if malformed.
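
As a rough illustration of the check described above, the sketch below treats a list as canonical when it is null or empty, or when every entry is non-null with a year and a non-negative count and the years are strictly ascending. These rules are assumptions rather than the documented contract, and YearCount and isCanonical are hypothetical names standing in for CitationCountByYear and validateResult.

```java
import java.util.List;

public class ValidateSketch {

    // Hypothetical stand-in for CitationCountByYear.
    static final class YearCount {
        final Integer year;
        final Integer count;

        YearCount(Integer year, Integer count) {
            this.year = year;
            this.count = count;
        }
    }

    // Assumed rules: null/empty is fine; otherwise entries must be valid and
    // years strictly ascending (which also rules out duplicate years).
    static boolean isCanonical(List<YearCount> citations) {
        if (citations == null || citations.isEmpty()) {
            return true;
        }
        Integer previousYear = null;
        for (YearCount e : citations) {
            if (e == null || e.year == null || e.count == null || e.count < 0) {
                return false;
            }
            if (previousYear != null && e.year <= previousYear) {
                return false; // unsorted or duplicate year
            }
            previousYear = e.year;
        }
        return true;
    }
}
```

In a post-mapping step, a check like this could be applied to each Result's citation list, with failures routed to a rejection counter instead of stopping the job.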