public class ServiceScraper
extends hwu.elixir.scrape.scraper.ScraperFilteredCore

| Constructor and Description |
|---|
| ServiceScraper() |

| Modifier and Type | Method and Description |
|---|---|
| boolean | scrape(String url, Long contextCounter, String outputFolderName, String fileName, StatusOfScrape status) - Orchestrates the process of scraping a site before converting the extracted triples to NQuads and writing to a file. |
| protected String | wrapHTMLExtraction(String url) |

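
The summary of scrape mentions converting extracted triples to NQuads, and its contextCounter parameter is documented as feeding the named graph/context. As a rough illustration of what that conversion involves at the format level, the sketch below appends a graph term to each N-Triples statement, which is what distinguishes an N-Quads statement from an N-Triples one. This is not the library's implementation: the class NQuadsSketch, its methods, and the https://example.org/graph/ IRI scheme are all hypothetical, and blank-node replacement is not attempted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of hwu.elixir.scrape.
final class NQuadsSketch {

    // Assumed base IRI; the scheme actually used by the scraper is not documented here.
    static final String GRAPH_BASE = "https://example.org/graph/";

    /** Builds a named-graph IRI term from a counter (assumed scheme). */
    static String graphIri(long contextCounter) {
        return "<" + GRAPH_BASE + contextCounter + ">";
    }

    /** Turns one N-Triples statement into an N-Quads statement by adding a graph term. */
    static String toNQuad(String nTriple, String graphIri) {
        String trimmed = nTriple.trim();
        if (!trimmed.endsWith(".")) {
            throw new IllegalArgumentException("Not a terminated statement: " + nTriple);
        }
        // Drop the terminating '.', then re-add it after the graph term.
        String body = trimmed.substring(0, trimmed.length() - 1).trim();
        return body + " " + graphIri + " .";
    }

    /** Converts every statement, skipping blank lines and comment lines. */
    static List<String> toNQuads(List<String> nTriples, long contextCounter) {
        String graph = graphIri(contextCounter);
        List<String> quads = new ArrayList<>();
        for (String line : nTriples) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("#")) {
                continue;
            }
            quads.add(toNQuad(t, graph));
        }
        return quads;
    }

    public static void main(String[] args) {
        List<String> triples = Arrays.asList(
            "<https://example.org/s> <https://schema.org/name> \"Example\" .");
        for (String quad : toNQuads(triples, 7L)) {
            System.out.println(quad);
        }
    }
}
```

Every statement from one call shares the same graph term, so all triples scraped from a single page end up in one named graph. Statements with a trailing comment after the final period are rejected by this sketch rather than parsed.
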
Inherited methods:

fixAJsonLdArray, fixAllJsonLdBlocks, fixASingleJsonLdBlock, fixASingleJsonLdObject, fixASingleJSONLdObject, injectId, iriGenerator, processTriples, scrape, swapJsonLdMarkup

createModelFromNTriples, displayResult, fixAny23WeirdIssues, fixObject, fixPredicate, fixURL, getHtml, getHtmlViaSelenium, getOnlyUnfilteredJSONLDFromHtml, getTripelsInNTriplesRDF4J, getTriplesInJSONLD, getTriplesInNTriples, shutdown, wrapHTMLExtractionStatic

public boolean scrape(String url, Long contextCounter, String outputFolderName, String fileName, StatusOfScrape status) throws hwu.elixir.scrape.exceptions.FourZeroFourException, hwu.elixir.scrape.exceptions.JsonLDInspectionException, hwu.elixir.scrape.exceptions.CannotWriteException, hwu.elixir.scrape.exceptions.MissingMarkupException

Parameters:

url - Site to be scraped

contextCounter - Number used to generate the named graph/context and the URLs used to replace blank nodes.

outputFolderName - Location to which the NQuads will be written

Throws:

hwu.elixir.scrape.exceptions.FourZeroFourException

hwu.elixir.scrape.exceptions.JsonLDInspectionException

hwu.elixir.scrape.exceptions.CannotWriteException

hwu.elixir.scrape.exceptions.MissingMarkupException

Copyright © 2025. All rights reserved.