Class ZipFileHarvester
- java.lang.Object
- de.pangaea.metadataportal.harvester.Harvester
- de.pangaea.metadataportal.harvester.SingleFileEntitiesHarvester
- de.pangaea.metadataportal.harvester.ZipFileHarvester
public class ZipFileHarvester extends SingleFileEntitiesHarvester
Harvester for unzipping ZIP files and reading their contents. Identifiers look like: "zip:<identifierPrefix><entryFilename>"
This harvester supports the following additional harvester properties:
- zipFile: filename or URL of the ZIP file to harvest
- identifierPrefix: prefix prepended to all identifiers (the identifiers of the documents) (default: "")
- filenameFilter: regex to match the entry filename (default: none)
- useZipFileDate: if "yes", check the modification date of the ZIP file and re-harvest it completely; if "no", look at each file in the archive and store its modification date in the index. For ZIP files from network connections that seldom change, use "yes", as it avoids scanning the complete ZIP file. "no" is recommended for large local files with many modifications in only some files (default: yes)
- retryCount: how often to retry on HTTP errors (default: 5)
- retryAfterSeconds: time between retries in seconds (default: 60)
- timeoutAfterSeconds: HTTP timeout for harvesting in seconds
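The identifier scheme and the filenameFilter property can be illustrated with a minimal sketch. The class ZipIdentifierSketch, its method names, and the "demo-" prefix are hypothetical and not part of the harvester API. The sketch also assumes that the regex must match the whole entry filename; the actual harvester may apply the filter differently.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the harvester API.
final class ZipIdentifierSketch {

    // Builds an identifier of the documented form "zip:<identifierPrefix><entryFilename>".
    static String buildIdentifier(String identifierPrefix, String entryFilename) {
        return "zip:" + identifierPrefix + entryFilename;
    }

    // Accepts every entry when no filter is configured (default: none).
    // Assumption: the regex has to match the complete entry filename.
    static boolean accepts(Pattern filenameFilter, String entryFilename) {
        return filenameFilter == null || filenameFilter.matcher(entryFilename).matches();
    }

    public static void main(String[] args) {
        Pattern filter = Pattern.compile(".*\\.xml"); // example filter: XML entries only
        for (String name : new String[] {"data/a.xml", "readme.txt"}) {
            if (accepts(filter, name)) {
                System.out.println(buildIdentifier("demo-", name));
            }
        }
    }
}
```

With an empty identifierPrefix (the default), the identifier is simply "zip:" followed by the entry filename.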
- Author:
- Uwe Schindler
-
Field Summary
Fields (Modifier and Type, Field, Description):
- static int DEFAULT_RETRY_COUNT
- static int DEFAULT_RETRY_TIME
- static int DEFAULT_TIMEOUT
- protected int retryCount: the retryCount from configuration
- protected int retryTime: the retryTime from configuration
- protected int timeout: the timeout from configuration
- static String USER_AGENT
-
Fields inherited from class de.pangaea.metadataportal.harvester.Harvester
fromDateReference, harvestCount, HARVESTER_METADATA_FIELD_LAST_HARVESTED, harvestMessageStep, iconfig, log, processor
-
Constructor Summary
Constructors (Constructor, Description):
- ZipFileHarvester(HarvesterConfig iconfig)
-
Method Summary
Methods (Modifier and Type, Method, Description):
- protected void enumerateValidHarvesterPropertyNames(Set<String> props): This method is used by subclasses to enumerate all available harvester properties that are implemented by them.
- void harvest(): This method is called by the harvester after opening it with Harvester.open(de.pangaea.metadataportal.processor.ElasticsearchConnection, java.lang.String).
- void open(ElasticsearchConnection es, String targetIndex): Opens the harvester for harvesting documents described by the given HarvesterConfig.
-
Methods inherited from class de.pangaea.metadataportal.harvester.SingleFileEntitiesHarvester
addDocument, addDocument, cancelMissingDocumentDelete, close
-
Methods inherited from class de.pangaea.metadataportal.harvester.Harvester
addDocument, createMetadataDocumentInstance, deleteDocument, finishReindex, getValidHarvesterPropertyNames, isAllIndexes, isClosed, isDocumentOutdated, main, prepareReindex, runHarvester, runHarvester, setHarvestingDateReference, setValidIdentifiers
-
Field Detail
-
DEFAULT_RETRY_TIME
public static final int DEFAULT_RETRY_TIME
- See Also:
- Constant Field Values
-
DEFAULT_RETRY_COUNT
public static final int DEFAULT_RETRY_COUNT
- See Also:
- Constant Field Values
-
DEFAULT_TIMEOUT
public static final int DEFAULT_TIMEOUT
- See Also:
- Constant Field Values
-
USER_AGENT
public static final String USER_AGENT
-
retryCount
protected final int retryCount
the retryCount from configuration
-
retryTime
protected final int retryTime
the retryTime from configuration
-
timeout
protected final int timeout
the timeout from configuration
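The retryCount and retryTime fields suggest a retry loop around network access. The following is a minimal sketch of such a loop, not the harvester's actual implementation: the class RetrySketch and the method withRetries are hypothetical, and the sketch assumes that retryCount means the number of additional attempts after the first failure and that only IOException triggers a retry.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical illustration of a retry loop; not the harvester's own code.
final class RetrySketch {

    // Runs the action, retrying up to retryCount times after an IOException
    // and sleeping retryTimeSeconds between attempts.
    // Assumption: total attempts = retryCount + 1.
    static <T> T withRetries(Callable<T> action, int retryCount, int retryTimeSeconds)
            throws Exception {
        if (retryCount < 0) {
            throw new IllegalArgumentException("retryCount must be >= 0");
        }
        IOException last = null;
        for (int attempt = 0; attempt <= retryCount; attempt++) {
            try {
                return action.call();
            } catch (IOException e) {
                last = e; // remember the failure and try again if attempts remain
                if (attempt < retryCount) {
                    Thread.sleep(retryTimeSeconds * 1000L);
                }
            }
        }
        throw last; // all attempts failed
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated action that fails twice before succeeding; delay set to 0 for the demo.
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated HTTP error");
            }
            return "ok";
        }, 5, 0);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Exceptions other than IOException propagate immediately in this sketch, so programming errors are not retried.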
-
-
Constructor Detail
-
ZipFileHarvester
public ZipFileHarvester(HarvesterConfig iconfig)
-
-
Method Detail
-
open
public void open(ElasticsearchConnection es, String targetIndex) throws Exception
Description copied from class: Harvester
Opens the harvester for harvesting documents described by the given HarvesterConfig. Opens Harvester.processor for use in the Harvester.harvest() method.
-
harvest
public void harvest() throws Exception
Description copied from class: Harvester
This method is called by the harvester after opening it with Harvester.open(de.pangaea.metadataportal.processor.ElasticsearchConnection, java.lang.String). Override this method in your harvester class. This method should harvest files from somewhere, generate MetadataDocuments and add them with Harvester.addDocument(de.pangaea.metadataportal.processor.MetadataDocument).
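For a ZIP-based harvester, the "harvest files from somewhere" step amounts to walking the archive entries. The sketch below shows one way to do that with the standard java.util.zip classes. ZipWalkSketch, listEntryNames, and sampleZip are hypothetical names; the sketch only collects entry names and does not create MetadataDocuments or call addDocument.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration of walking ZIP entries; not the harvester's own code.
final class ZipWalkSketch {

    // Returns the names of all non-directory entries in stream order.
    static List<String> listEntryNames(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    // A real harvester would parse the entry content here.
                    names.add(entry.getName());
                }
                zip.closeEntry();
            }
        }
        return names;
    }

    // Builds a small in-memory ZIP archive so the example is self-contained.
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("a.xml"));
            out.write("<a/>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("sub/b.xml"));
            out.write("<b/>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(listEntryNames(new ByteArrayInputStream(sampleZip())));
    }
}
```

Reading through an InputStream works for both local files and network connections, which matches the zipFile property accepting a filename or a URL.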
-
enumerateValidHarvesterPropertyNames
protected void enumerateValidHarvesterPropertyNames(Set<String> props)
Description copied from class: Harvester
This method is used by subclasses to enumerate all available harvester properties that are implemented by them. Override this method in your own implementation and append all harvester property names to the supplied Set. The public API for client code requesting property names is Harvester.getValidHarvesterPropertyNames().
- Overrides:
- enumerateValidHarvesterPropertyNames in class SingleFileEntitiesHarvester
- See Also:
- Harvester.getValidHarvesterPropertyNames()
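The override pattern described for enumerateValidHarvesterPropertyNames can be sketched as follows. BaseHarvesterSketch is a hypothetical stand-in for the real Harvester base class, and "basePropertyExample" is a placeholder rather than a real property; the seven ZIP-specific names are the properties listed in the class description.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical subclass showing how property names are appended to the supplied Set.
final class ZipHarvesterPropsSketch extends BaseHarvesterSketch {

    @Override
    protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
        super.enumerateValidHarvesterPropertyNames(props); // keep inherited names
        props.addAll(Arrays.asList(
            "zipFile", "identifierPrefix", "filenameFilter", "useZipFileDate",
            "retryCount", "retryAfterSeconds", "timeoutAfterSeconds"));
    }

    public static void main(String[] args) {
        System.out.println(new ZipHarvesterPropsSketch().getValidHarvesterPropertyNames());
    }
}

// Hypothetical stand-in for the Harvester base class; only the pattern is illustrated.
abstract class BaseHarvesterSketch {

    // Subclasses override this method and add their own property names.
    protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
        props.add("basePropertyExample"); // placeholder, not a real property name
    }

    // Public entry point for client code: collects names from the class hierarchy.
    public final Set<String> getValidHarvesterPropertyNames() {
        Set<String> props = new TreeSet<>();
        enumerateValidHarvesterPropertyNames(props);
        return props;
    }
}
```

Calling the superclass method first keeps the names contributed by parent classes, so each level of the hierarchy only has to list the properties it implements itself.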
-
-