Class PanFMP1IndexHarvester
java.lang.Object
  de.pangaea.metadataportal.harvester.Harvester
    de.pangaea.metadataportal.harvester.SingleFileEntitiesHarvester
      de.pangaea.metadataportal.harvester.PanFMP1IndexHarvester
public class PanFMP1IndexHarvester extends SingleFileEntitiesHarvester
This harvester supports replicating XML contents from a legacy panFMP 1.x installation. It is possible to replicate indexes with a different XML schema (by applying a transformation to the harvested XML content) or to replicate only subsets of other indexes, based on a query string.
Since panFMP was upgraded to use Elasticsearch 2.0, it is no longer possible to directly read old Lucene 3 indexes as used by panFMP 1.x. To use this harvester, you first have to download the latest Apache Lucene 4.10.x version and run the IndexUpgrader command line tool on the old index. After converting the index, you can harvest it using this harvester.
This harvester supports the following additional harvester properties:
- indexDir: file system directory with the old panFMP v1 index
- query: query that matches all documents to harvest (default: all documents)
- analyzerClass: class name of Analyzer to use for the above query string (default: "org.apache.lucene.analysis.standard.StandardAnalyzer")
- queryParserClass: class name of QueryParser to use for the above query string (default: "org.apache.lucene.queryparser.classic.QueryParser")
- defaultQueryParserOperator: default operator when parsing the above query string (AND/OR) (default: "AND")
- identifierPrefix: this prefix is added in front of all identifiers from the foreign index (default: "")
- luceneMatchVersion: the Version constant passed to the analyzer and query parser of the foreign index (default is Version.LUCENE_CURRENT)
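The property names and defaults listed above can be summarized in a short, self-contained Java sketch. This is not how panFMP itself reads its configuration: the class name `PanFMP1HarvesterPropertiesSketch`, the helper `withDocumentedDefaults`, the sample path, and the sample prefix are hypothetical, and treating `indexDir` as mandatory is an assumption based only on the fact that no default is documented for it. The `query` property gets no default value here because its documented default is "all documents" rather than a concrete query string.

```java
import java.util.Properties;

public class PanFMP1HarvesterPropertiesSketch {

    /**
     * Hypothetical helper: layers the configured values over the defaults
     * documented for this harvester. Not part of the panFMP API.
     */
    static Properties withDocumentedDefaults(Properties configured) {
        // Assumption: indexDir must be set, since no default is documented.
        if (configured.getProperty("indexDir") == null) {
            throw new IllegalArgumentException("indexDir is not set");
        }
        Properties defaults = new Properties();
        defaults.setProperty("analyzerClass",
                "org.apache.lucene.analysis.standard.StandardAnalyzer");
        defaults.setProperty("queryParserClass",
                "org.apache.lucene.queryparser.classic.QueryParser");
        defaults.setProperty("defaultQueryParserOperator", "AND");
        defaults.setProperty("identifierPrefix", "");
        defaults.setProperty("luceneMatchVersion", "LUCENE_CURRENT");

        // Lookups fall back to the defaults when a key is not configured.
        Properties effective = new Properties(defaults);
        effective.putAll(configured);
        return effective;
    }

    public static void main(String[] args) {
        Properties configured = new Properties();
        configured.setProperty("indexDir", "/path/to/upgraded/index"); // sample value
        configured.setProperty("identifierPrefix", "legacy:");         // sample value

        Properties effective = withDocumentedDefaults(configured);
        for (String name : effective.stringPropertyNames()) {
            System.out.println(name + " = " + effective.getProperty(name));
        }
    }
}
```

Layering the configured values over a defaults object keeps the documented defaults in one place, so a property that is left out of the configuration still resolves to the value given in the list above.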
- Author:
- Uwe Schindler
-
-
Field Summary
Modifier and Type, Field:
static String FIELDNAME_CONTENT
static String FIELDNAME_DATESTAMP
static String FIELDNAME_IDENTIFIER
static String FIELDNAME_XML
-
Fields inherited from class de.pangaea.metadataportal.harvester.Harvester
fromDateReference, harvestCount, HARVESTER_METADATA_FIELD_LAST_HARVESTED, harvestMessageStep, iconfig, log, processor
-
-
Constructor Summary
PanFMP1IndexHarvester(HarvesterConfig iconfig)
-
Method Summary
Modifier and Type, Method, Description:
void close(boolean cleanShutdown): Closes harvester.
protected void enumerateValidHarvesterPropertyNames(Set<String> props): This method is used by subclasses to enumerate all available harvester properties that are implemented by them.
void harvest(): This method is called by the harvester after it has been opened with Harvester.open(de.pangaea.metadataportal.processor.ElasticsearchConnection, java.lang.String).
void open(ElasticsearchConnection es, String targetIndex): Opens harvester for harvesting documents described by the given HarvesterConfig.
-
Methods inherited from class de.pangaea.metadataportal.harvester.SingleFileEntitiesHarvester
addDocument, addDocument, cancelMissingDocumentDelete
-
Methods inherited from class de.pangaea.metadataportal.harvester.Harvester
addDocument, createMetadataDocumentInstance, deleteDocument, finishReindex, getValidHarvesterPropertyNames, isAllIndexes, isClosed, isDocumentOutdated, main, prepareReindex, runHarvester, runHarvester, setHarvestingDateReference, setValidIdentifiers
-
-
-
-
Field Detail
-
FIELDNAME_CONTENT
public static final String FIELDNAME_CONTENT
- See Also:
- Constant Field Values
-
FIELDNAME_IDENTIFIER
public static final String FIELDNAME_IDENTIFIER
-
FIELDNAME_DATESTAMP
public static final String FIELDNAME_DATESTAMP
-
FIELDNAME_XML
public static final String FIELDNAME_XML
-
-
Constructor Detail
-
PanFMP1IndexHarvester
public PanFMP1IndexHarvester(HarvesterConfig iconfig) throws Exception
- Throws:
Exception
-
-
Method Detail
-
open
public void open(ElasticsearchConnection es, String targetIndex) throws Exception
Description copied from class: Harvester
Opens harvester for harvesting documents described by the given HarvesterConfig. Opens Harvester.processor for usage in the Harvester.harvest() method.
-
close
public void close(boolean cleanShutdown) throws Exception
Description copied from class: Harvester
Closes harvester. All resources are freed and the Harvester.processor is closed.
- Overrides:
close in class SingleFileEntitiesHarvester
- Parameters:
cleanShutdown - enables writing of status information to the Elasticsearch instance for the next harvesting. If an error occurred during harvesting, this should not be done.
- Throws:
Exception - if an exception occurs during closing (various types of exceptions can be thrown). Exceptions can be thrown asynchronously, so an exception may not relate to the document that caused it.
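The rule for cleanShutdown described above (pass true only when harvesting finished without an error) can be illustrated with a minimal sketch. It does not use the panFMP classes: the nested `Lifecycle` interface is a hypothetical stand-in for the documented open, harvest, and close methods, with the arguments of open omitted, and `run` is not the actual implementation of Harvester.runHarvester.

```java
public class HarvesterLifecycleSketch {

    /** Hypothetical stand-in for the lifecycle methods documented on this page. */
    interface Lifecycle {
        void open() throws Exception;
        void harvest() throws Exception;
        void close(boolean cleanShutdown) throws Exception;
    }

    /**
     * Opens, harvests, and always closes. cleanShutdown is true only if
     * both open() and harvest() returned without throwing.
     */
    static boolean run(Lifecycle harvester) throws Exception {
        boolean clean = false;
        try {
            harvester.open();
            harvester.harvest();
            clean = true;
        } finally {
            // After a failure, status information must not be written,
            // so close() receives false in that case.
            harvester.close(clean);
        }
        return clean;
    }
}
```

If harvest() throws, the finally block still closes the harvester with cleanShutdown set to false, and the original exception propagates to the caller.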
-
harvest
public void harvest() throws Exception
Description copied from class: Harvester
This method is called by the harvester after it has been opened with Harvester.open(de.pangaea.metadataportal.processor.ElasticsearchConnection, java.lang.String). Override this method in your harvester class. This method should harvest files from somewhere, generate MetadataDocuments, and add them with Harvester.addDocument(de.pangaea.metadataportal.processor.MetadataDocument).
-
enumerateValidHarvesterPropertyNames
protected void enumerateValidHarvesterPropertyNames(Set<String> props)
Description copied from class: Harvester
This method is used by subclasses to enumerate all available harvester properties that are implemented by them. Override this method in your own implementation and append all harvester property names to the supplied Set. The public API for client code requesting property names is Harvester.getValidHarvesterPropertyNames().
- Overrides:
enumerateValidHarvesterPropertyNames in class SingleFileEntitiesHarvester
- See Also:
Harvester.getValidHarvesterPropertyNames()
-
-