Class SingleFileEntitiesHarvester
- java.lang.Object
- de.pangaea.metadataportal.harvester.Harvester
- de.pangaea.metadataportal.harvester.SingleFileEntitiesHarvester
- Direct Known Subclasses:
DirectoryHarvester, ElasticsearchHarvester, PanFMP1IndexHarvester, PushWrapperHarvester, WebCrawlingHarvester, ZipFileHarvester
public abstract class SingleFileEntitiesHarvester extends Harvester
Abstract harvester class for single file entities (like files from a web page or from a local directory). The harvester makes it possible to add XML documents given by a Source to the index. These are harvested, but if a fatal parse error occurs, the harvester will stop harvesting (as it would with OAI-PMH), ignore the document, or delete it (if it exists in the index), depending on the harvester property "parseErrorAction".
This panFMP harvester supports the following harvester properties in addition to the default ones:
parseErrorAction: What to do if a parse error occurs? Can be STOP, IGNOREDOCUMENT, DELETEDOCUMENT (default is to ignore the document)
deleteMissingDocuments: remove documents after harvesting that were deleted from the source (may be a heavy operation). (default: true)
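The three "parseErrorAction" values can be pictured as a small dispatch on the configured action. The following is only a minimal sketch under stated assumptions: `ParseErrorActionSketch`, `ParseErrorPolicySketch`, and the in-memory `index` map are hypothetical stand-ins, not panFMP classes, and modelling STOP as an unchecked exception is a simplification.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the three documented "parseErrorAction" values.
enum ParseErrorActionSketch { STOP, IGNOREDOCUMENT, DELETEDOCUMENT }

final class ParseErrorPolicySketch {
    // In-memory stand-in for the index: identifier -> document content.
    final Map<String, String> index = new HashMap<>();
    private final ParseErrorActionSketch action;

    ParseErrorPolicySketch(ParseErrorActionSketch action) {
        this.action = action;
    }

    // Maps the property value to an action; an unset property falls back
    // to IGNOREDOCUMENT, the documented default.
    static ParseErrorActionSketch fromProperty(String value) {
        return (value == null)
                ? ParseErrorActionSketch.IGNOREDOCUMENT
                : ParseErrorActionSketch.valueOf(value.trim());
    }

    // Applies the configured action after a fatal parse error.
    // Assumption: STOP is modelled as an unchecked exception that aborts the run.
    void onParseError(String identifier, Exception cause) {
        switch (action) {
            case STOP:
                throw new IllegalStateException("Harvesting stopped at " + identifier, cause);
            case DELETEDOCUMENT:
                index.remove(identifier); // no-op if the document is not indexed
                break;
            case IGNOREDOCUMENT:
                break; // leave the index untouched and continue
        }
    }
}
```

With DELETEDOCUMENT, a document that fails to parse is removed from the stand-in index if it is present; with IGNOREDOCUMENT a previously indexed version stays in place.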
- Author:
- Uwe Schindler
-
-
Field Summary
-
Fields inherited from class de.pangaea.metadataportal.harvester.Harvester
fromDateReference, harvestCount, HARVESTER_METADATA_FIELD_LAST_HARVESTED, harvestMessageStep, iconfig, log, processor
-
-
Constructor Summary
Constructors
public SingleFileEntitiesHarvester(HarvesterConfig iconfig)
protected SingleFileEntitiesHarvester(HarvesterConfig iconfig, DocumentErrorAction parseErrorAction)
-
Method Summary
Modifier and Type, Method, Description
protected void addDocument(String identifier, long lastModified, Source xml)
Adds a document to the Harvester.processor working in the background.
protected void addDocument(String identifier, Instant lastModified, Source xml)
Adds a document to the Harvester.processor working in the background.
protected void cancelMissingDocumentDelete()
Disables the property "deleteMissingDocuments" for this instance.
void close(boolean cleanShutdown)
Closes harvester.
protected void enumerateValidHarvesterPropertyNames(Set<String> props)
This method is used by subclasses to enumerate all available harvester properties that are implemented by them.
-
Methods inherited from class de.pangaea.metadataportal.harvester.Harvester
addDocument, createMetadataDocumentInstance, deleteDocument, finishReindex, getValidHarvesterPropertyNames, harvest, isAllIndexes, isClosed, isDocumentOutdated, main, open, prepareReindex, runHarvester, runHarvester, setHarvestingDateReference, setValidIdentifiers
-
-
-
-
Constructor Detail
-
SingleFileEntitiesHarvester
public SingleFileEntitiesHarvester(HarvesterConfig iconfig)
-
SingleFileEntitiesHarvester
protected SingleFileEntitiesHarvester(HarvesterConfig iconfig, DocumentErrorAction parseErrorAction)
-
-
Method Detail
-
close
public void close(boolean cleanShutdown) throws Exception
Description copied from class: Harvester
Closes harvester. All resources are freed and the Harvester.processor is closed.
- Overrides:
close in class Harvester
- Parameters:
cleanShutdown - enables writing of status information to the Elasticsearch instance for the next harvesting. If an error occurred during harvesting this should not be done.
- Throws:
Exception - if an exception occurs during closing (various types of exceptions can be thrown). Exceptions can be thrown asynchronously and may not refer to the correct document.
-
addDocument
protected final void addDocument(String identifier, long lastModified, Source xml) throws Exception
Adds a document to the Harvester.processor working in the background. If a parsing error occurs the document is handled according to parseErrorAction. It is also added to the valid identifiers (if unseen documents should be deleted).
- Parameters:
identifier - is the document's identifier in the index
lastModified - is the last-modification date which is used to calculate the next harvesting start date. If the document is older than the last harvesting, it is skipped.
xml - is the transformer source of the document, null to only update the document status (lastModified) and add it to the valid identifiers
- Throws:
Exception
- See Also:
Harvester.addDocument(MetadataDocument)
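The skip and bookkeeping rules described for addDocument can be sketched roughly as follows. This is not the panFMP implementation: `AddDocumentSketch` and its fields are hypothetical, the XML is a plain String instead of a Source, the `long` overload is assumed to carry epoch milliseconds, and the sketch assumes that skipped documents still count as valid identifiers.

```java
import java.time.Instant;
import java.util.HashSet;
import java.util.Set;

final class AddDocumentSketch {
    // Identifiers seen in this run; used later to find documents missing from the source.
    final Set<String> validIdentifiers = new HashSet<>();
    // Identifiers actually handed on for indexing (stand-in for the background processor).
    final Set<String> submitted = new HashSet<>();
    // Time of the previous harvest; null means nothing was harvested before.
    private final Instant lastHarvested;

    AddDocumentSketch(Instant lastHarvested) {
        this.lastHarvested = lastHarvested;
    }

    // Returns true if the document content was handed on for indexing.
    boolean addDocument(String identifier, Instant lastModified, String xml) {
        // Assumption: every enumerated document counts as "seen", even if skipped below.
        validIdentifiers.add(identifier);
        boolean outdated = lastHarvested != null && lastModified.isBefore(lastHarvested);
        if (outdated || xml == null) {
            return false; // older than the last harvest, or a status-only update
        }
        submitted.add(identifier);
        return true;
    }

    // Assumption: the long value is milliseconds since the epoch.
    boolean addDocument(String identifier, long lastModified, String xml) {
        return addDocument(identifier, Instant.ofEpochMilli(lastModified), xml);
    }
}
```

A subclass would call such a method once per enumerated file, passing null content when only the document status needs refreshing.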
-
addDocument
protected void addDocument(String identifier, Instant lastModified, Source xml) throws Exception
Adds a document to the Harvester.processor working in the background.
- Throws:
Exception
- See Also:
addDocument(String,Instant,Source)
-
cancelMissingDocumentDelete
protected void cancelMissingDocumentDelete()
Disables the property "deleteMissingDocuments" for this instance. This can be used when the container (like a ZIP file) was not modified and the documents it contains are therefore not enumerated. Call this method to prevent the deletion of all these documents.
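Why cancelling matters can be seen in a small model of the missing-document check. The sketch below is only an illustration: `MissingDocumentDeleteSketch`, `markSeen`, and `findDocumentsToDelete` are hypothetical names, and using a null set to represent "deletion disabled" is an assumption of this sketch, not a statement about panFMP internals.

```java
import java.util.HashSet;
import java.util.Set;

final class MissingDocumentDeleteSketch {
    // Identifiers enumerated during this run; null means deletion is disabled.
    private Set<String> validIdentifiers = new HashSet<>();

    // Records that a document was found in the source.
    void markSeen(String identifier) {
        if (validIdentifiers != null) {
            validIdentifiers.add(identifier);
        }
    }

    // Mirrors the documented method: switch off missing-document deletion for this run.
    void cancelMissingDocumentDelete() {
        validIdentifiers = null;
    }

    // Returns the indexed identifiers that were not seen and would be deleted.
    Set<String> findDocumentsToDelete(Set<String> indexedIdentifiers) {
        if (validIdentifiers == null) {
            return new HashSet<>(); // cancelled: nothing is deleted
        }
        Set<String> missing = new HashSet<>(indexedIdentifiers);
        missing.removeAll(validIdentifiers);
        return missing;
    }
}
```

If an unmodified ZIP file is skipped without cancelling, none of its documents are marked as seen, so the check would report every one of them for deletion.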
-
enumerateValidHarvesterPropertyNames
protected void enumerateValidHarvesterPropertyNames(Set<String> props)
Description copied from class: Harvester
This method is used by subclasses to enumerate all available harvester properties that are implemented by them. Override this method in your own implementation and append all harvester property names to the supplied Set. The public API for client code requesting property names is Harvester.getValidHarvesterPropertyNames().
- Overrides:
enumerateValidHarvesterPropertyNames in class Harvester
- See Also:
Harvester.getValidHarvesterPropertyNames()
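The override pattern described above might look like the following sketch. All three classes are hypothetical stand-ins so that the example compiles without panFMP; the return type of the public accessor, the call to the superclass implementation, and the property names `hypotheticalBaseProperty` and `hypotheticalCustomProperty` are assumptions made for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

// Stand-in for Harvester: collects property names through the overridable hook.
abstract class BaseHarvesterSketch {
    protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
        props.add("hypotheticalBaseProperty"); // placeholder for the default properties
    }

    // Stand-in for the public accessor that client code would call.
    public final Set<String> getValidHarvesterPropertyNames() {
        Set<String> props = new TreeSet<>();
        enumerateValidHarvesterPropertyNames(props);
        return props;
    }
}

// Stand-in for SingleFileEntitiesHarvester: adds its two documented properties.
abstract class SingleFileSketch extends BaseHarvesterSketch {
    @Override
    protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
        super.enumerateValidHarvesterPropertyNames(props);
        props.add("parseErrorAction");
        props.add("deleteMissingDocuments");
    }
}

// A custom harvester appends its own property on top of the inherited ones.
final class MyHarvesterSketch extends SingleFileSketch {
    @Override
    protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
        super.enumerateValidHarvesterPropertyNames(props); // keep inherited names
        props.add("hypotheticalCustomProperty");
    }
}
```

Calling the superclass method first keeps the inherited names in the set; omitting that call in this sketch would leave only the subclass's own property.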
-
-