Publication Components
The primary data store for cataloged metadata documents is a relational database management system. The relational database contains tables for managing the metadata documents, referenced users (users owning data within the catalog), remote repositories registered for harvesting, and saved searches per user. The Geoportal extension uses the standard Java JDBC API when communicating directly with the relational database.
The primary components associated with the publication of documents to the Geoportal extension metadata catalog are depicted in the figure below.
Metadata documents that are classified as either "Approved" or "Reviewed" by an administrator will be sent to the Apache Lucene index used by the Geoportal extension. Documents stored within the index are discoverable through search.
Apache Lucene applies an Analyzer during both indexing and searching. The job of the Analyzer is to tokenize text into terms, taking language-based stop words and stemming into account. Additional Analyzers are available through the Apache Lucene contribution community.
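To make the Analyzer's role concrete, the following is a minimal plain-Java sketch of the three ideas named above: tokenizing, dropping stop words, and stemming. It does not use the Apache Lucene API. The class name AnalyzerSketch, the tiny stop-word list, and the naive plural-stripping rule are all assumptions made for illustration, and a real Analyzer is considerably more thorough.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative stand-in for an analyzer; NOT the Apache Lucene Analyzer API.
class AnalyzerSketch {

    // Hypothetical, deliberately tiny English stop-word list.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    // Lower-cases the text, splits it on non-alphanumeric characters,
    // drops stop words, and stems what remains.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            terms.add(stem(token));
        }
        return terms;
    }

    // Naive stemming assumption: strip a single trailing "s" from longer words.
    static String stem(String token) {
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Rivers and Lakes of Ontario"));
    }
}
```

Because the same analysis is applied to documents and to queries, a search for "river" and a document titled "Rivers" reduce to the same term, which is why the choice of Analyzer affects what is discoverable.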
The website has two pages exposing metadata publication endpoints:
- An upload page that allows a publisher to upload metadata documents from a local hard drive or from an HTTP endpoint.
- An online editor page that allows a publisher to create and edit metadata documents. Only documents that were created with the online editor can be edited later.
The website exposes a Harvesting Client service endpoint that allows metadata documents to be published from the Geoportal extension Desktop Harvester. The website also exposes a Publish Metadata service, which allows compatible client applications such as ArcCatalog to publish metadata documents. The Geoportal extension Publish Client is a plug-in for ArcCatalog that batch publishes metadata documents (from folders or GeoDatabases) through this endpoint.
Each publication request follows a standardized sequence of steps to process an XML metadata document:
- Interrogation: The document will be interrogated to determine its associated metadata standard
- Evaluation: The document will be evaluated according to the configuration file associated with the standard. Evaluation determines the primary parameters of interest (such as title, abstract, …)
- Validation: The document will be validated according to the configuration file associated with the standard. If the standard has an associated XSD (XML Schema Definition), the document will be validated against the XSD.
- Identification: A determination is made as to whether or not the document currently exists within the catalog. This step is necessary to avoid duplication and is dependent upon the content of the document (some have internal identifiers), and the publication method (some methods can provide a unique URI associated with the source).
- Store Document: The document is sent for storage within the relational database.
- Update Administrative Attributes: Administrative attributes within the relational database are updated through the Java JDBC API. These include the publication method, an internal file identifier if available, and a URI associated with the source if available.
- Index if Required: If the document has previously been Approved or Reviewed by an administrator (or when it is Approved or Reviewed), the document is sent to the Apache Lucene Index. This step makes use of a Geoportal extension class (LuceneIndexAdapter) to communicate with the index through the Apache Lucene Java API.
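The steps above can be sketched as a single method. This is a simplified illustration under stated assumptions, not the Geoportal extension's implementation: in-memory maps and sets stand in for the relational database and the Apache Lucene index, the standard is guessed from the root element name rather than from configuration files, validation is reduced to a required title, and the element names title and fileIdentifier and the class PublicationSketch are hypothetical.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch of the publication steps; all storage is in memory.
class PublicationSketch {
    final Map<String, String> documents = new HashMap<>();          // stand-in for the database
    final Map<String, String> publicationMethods = new HashMap<>(); // stand-in for admin attributes
    final Set<String> approvedOrReviewed = new HashSet<>();         // keys an administrator has accepted
    final Set<String> indexed = new HashSet<>();                    // stand-in for the Lucene index

    String publish(String xml, String sourceUri, String publicationMethod) {
        Document doc = parse(xml);

        // Interrogation (assumption: the root element name identifies the standard).
        String standard = doc.getDocumentElement().getNodeName();

        // Evaluation: extract a parameter of interest (hypothetical element name).
        String title = firstText(doc, "title");

        // Validation (assumption: a non-empty title is the only rule checked here).
        if (title == null || title.isEmpty()) {
            throw new IllegalArgumentException("No title found for standard: " + standard);
        }

        // Identification: prefer an internal identifier, then the source URI,
        // and otherwise treat the document as new.
        String fileId = firstText(doc, "fileIdentifier");
        String key;
        if (fileId != null && !fileId.isEmpty()) {
            key = fileId;
        } else if (sourceUri != null) {
            key = sourceUri;
        } else {
            key = UUID.randomUUID().toString();
        }

        // Store Document: republishing the same key replaces the earlier copy.
        documents.put(key, xml);

        // Update Administrative Attributes.
        publicationMethods.put(key, publicationMethod);

        // Index if Required.
        if (approvedOrReviewed.contains(key)) {
            indexed.add(key);
        }
        return key;
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations to avoid external entity processing.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Document is not well-formed XML", e);
        }
    }

    // Returns the trimmed text of the first element with the given tag, or null.
    private static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```

In this sketch, publishing the same document twice leaves a single stored copy because the identification step yields the same key, and the document only reaches the index once its key has been marked as Approved or Reviewed.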