| Package | Description |
|---|---|
| eu.dnetlib.bioschemas.api.scraper | |

| Modifier and Type | Method and Description |
|---|---|
| CrawlRecord | ScrapeState.getURLToProcess()<br>Returns the next URL/CrawlRecord to be scraped. |

| Modifier and Type | Method and Description |
|---|---|
| List<CrawlRecord> | ScrapeState.getPagesProcessed()<br>Gets the full list of URLs that have been processed in this cycle. |
| List<CrawlRecord> | ScrapeState.getPagesProcessedAndUnprocessed()<br>Gets the full list of URLs/CrawlRecords, whether or not they have been scraped in the current cycle. |

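
The relationship between the three accessors listed so far can be sketched with stand-in classes. This is a minimal illustration, not the library's implementation: `Record` and `State` are hypothetical stand-ins for `CrawlRecord` and `ScrapeState`, `markDone` is a helper invented for the sketch, and it is an assumption that `getURLToProcess()` leaves the record in place and returns `null` once nothing is pending.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ScrapeStateSketch {

    // Hypothetical stand-in for CrawlRecord; the real class is not shown on this page.
    static final class Record {
        final String url;
        Record(String url) { this.url = url; }
    }

    // Hypothetical stand-in for ScrapeState; it only mirrors the method names listed above.
    static final class State {
        private final Deque<Record> pending = new ArrayDeque<>();
        private final List<Record> processed = new ArrayList<>();

        State(List<Record> pagesToBeScraped) {
            pending.addAll(pagesToBeScraped);
        }

        // Assumption: peeks at the head of the queue; null means nothing is left.
        Record getURLToProcess() {
            return pending.peekFirst();
        }

        // Hypothetical helper (not part of the listed API): moves a record
        // from the pending queue to the processed list.
        void markDone(Record r) {
            if (pending.remove(r)) {
                processed.add(r);
            }
        }

        // Records handled so far in this cycle.
        List<Record> getPagesProcessed() {
            return List.copyOf(processed);
        }

        // Processed records followed by those still pending.
        List<Record> getPagesProcessedAndUnprocessed() {
            List<Record> all = new ArrayList<>(processed);
            all.addAll(pending);
            return all;
        }
    }

    public static void main(String[] args) {
        State state = new State(List.of(
                new Record("https://example.org/a"),
                new Record("https://example.org/b")));

        // Drain the queue one record at a time.
        Record next;
        while ((next = state.getURLToProcess()) != null) {
            System.out.println("scraping " + next.url);
            state.markDone(next);
        }
        System.out.println(state.getPagesProcessed().size() + " processed of "
                + state.getPagesProcessedAndUnprocessed().size());
    }
}
```

In this sketch the two list getters return copies, so callers cannot modify the internal state by accident; whether the real `ScrapeState` does the same is not stated on this page.
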
| Modifier and Type | Method and Description |
|---|---|
| void | ScrapeState.addFailedToScrapeURL(CrawlRecord record)<br>Adds the given CrawlRecord to the list of CrawlRecords NOT successfully scraped. |
| void | ScrapeState.addSuccessfulScrapedURL(CrawlRecord record)<br>Adds the given CrawlRecord to the list of CrawlRecords successfully scraped. |
| void | ScrapeState.setStatusTo404(CrawlRecord record)<br>Changes the status of the CrawlRecord to DOES_NOT_EXIST. |
| void | ScrapeState.setStatusToHumanInspection(CrawlRecord record)<br>Changes the status of the CrawlRecord to HUMAN_INSPECTION. |

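
One way a caller might choose among the four methods above is sketched below. Everything except the method names and the `DOES_NOT_EXIST` and `HUMAN_INSPECTION` status names is an assumption made for illustration: `Record` and `State` are hypothetical stand-ins, the `UNTRIED`, `SUCCESS` and `FAILED` statuses are invented, and the `recordOutcome` dispatcher with its HTTP-status rules is not taken from the library.

```java
import java.util.ArrayList;
import java.util.List;

public class ScrapeOutcomeSketch {

    // DOES_NOT_EXIST and HUMAN_INSPECTION appear in the listing above;
    // UNTRIED, SUCCESS and FAILED are hypothetical names added for this sketch.
    enum Status { UNTRIED, SUCCESS, FAILED, DOES_NOT_EXIST, HUMAN_INSPECTION }

    // Hypothetical stand-in for CrawlRecord.
    static final class Record {
        final String url;
        Status status = Status.UNTRIED;
        Record(String url) { this.url = url; }
    }

    // Hypothetical stand-in for ScrapeState, limited to the four listed methods.
    static final class State {
        final List<Record> succeeded = new ArrayList<>();
        final List<Record> failed = new ArrayList<>();

        // Assumption: adding to a list also updates the record's status.
        void addSuccessfulScrapedURL(Record r) { r.status = Status.SUCCESS; succeeded.add(r); }
        void addFailedToScrapeURL(Record r) { r.status = Status.FAILED; failed.add(r); }

        void setStatusTo404(Record r) { r.status = Status.DOES_NOT_EXIST; }
        void setStatusToHumanInspection(Record r) { r.status = Status.HUMAN_INSPECTION; }
    }

    // Hypothetical dispatcher: maps a fetch result to one of the four calls.
    static void recordOutcome(State state, Record r, int httpStatus, boolean markupParsed) {
        boolean ok = httpStatus >= 200 && httpStatus < 300;
        if (httpStatus == 404) {
            state.setStatusTo404(r);              // page is gone
        } else if (ok && markupParsed) {
            state.addSuccessfulScrapedURL(r);     // fetched and parsed
        } else if (ok) {
            state.setStatusToHumanInspection(r);  // fetched, but markup unusable
        } else {
            state.addFailedToScrapeURL(r);        // any other HTTP error
        }
    }

    public static void main(String[] args) {
        State state = new State();
        Record gone = new Record("https://example.org/gone");
        Record good = new Record("https://example.org/good");

        recordOutcome(state, gone, 404, false);
        recordOutcome(state, good, 200, true);

        System.out.println(gone.url + " -> " + gone.status);
        System.out.println(good.url + " -> " + good.status);
    }
}
```

Keeping the decision in a single dispatcher means each record receives exactly one outcome per attempt, which makes the success and failure lists easier to reason about.
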
| Constructor and Description |
|---|
| ScrapeState(List<CrawlRecord> pagesToBeScraped) |

Copyright © 2025. All rights reserved.