Index All Metadata Content

Release 9.3.1

Overview

Indexing is important because it determines what search results are returned when a user submits search criteria to the geoportal. To understand this customization, you should be familiar with the webhelp document How Lucene Indexing Works in the Geoportal.



By default, the Geoportal extension indexes information in a metadata document only if it meets two criteria:
  1. The information must be located at an XPath for which the standard's definition.xml file defines a parameter. For example, for FGDC the fgdc-bestpractice-definition.xml file defines a parameter for Abstract at the /metadata/idinfo/descript/abstract XPath. It does not define a parameter for Supplemental Information at /metadata/idinfo/descript/supplinf, although the file could be customized to include one. Therefore, information stored in the /metadata/idinfo/descript/supplinf element is not indexed by default.
  2. The parameter in the standard's definition.xml file must have a meaning attribute assigned to it.
There are advantages to indexing only certain pieces of information. First, the Lucene index is smaller, which makes searching faster. Second, some information in metadata is not useful for text-based searching. For example, if a metadata record contains a thumbnail, there is no need to index the binary thumbnail section, because users will not search for characters within the binary. Finally, indexing only specific information gives you control over the search results. A user searching for "New York" may want records with "New York" in the title or abstract, not records that merely list it in the Point of Contact's address.



However, there are advantages to indexing all metadata content as well. For example, some records contain elements for which the schema's definition.xml file defines no parameter. The FGDC Supplemental Information element described above is one such case. If that element holds important information, the record should be retrievable by searching on it.
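
As the first criterion above notes, a definition file can also be customized to index a single additional element rather than everything. The fragment below is a minimal sketch of such a parameter for FGDC Supplemental Information. It is modeled on the attributes used in the Configuration example in this topic: the key value is hypothetical, and the meaning, visible, and nodeType settings are assumptions carried over from that example rather than confirmed settings for this element. The parameter would need to sit inside a section of fgdc-bestpractice-definition.xml.

```xml
<!-- Hypothetical parameter: indexes only FGDC Supplemental Information. -->
<!-- The key name is illustrative; attribute values mirror the misc.all example. -->
<parameter key="identification.supplinf" meaning="body" visible="false">
  <content nodeType="list" select="/metadata/idinfo/descript/supplinf"/>
</parameter>
```

This targeted approach keeps the index small, but it has to be repeated for every element you want to make searchable.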



If your organization decides to automatically index all metadata content, the customization to do so is simple. Before proceeding, you should be familiar with the way the geoportal exposes metadata profiles, and the files associated with each profile. See Add a Custom Profile for more information.

Configuration

To index all content, you will need to add an additional parameter to the supported metadata schemas' definition.xml files. This parameter should be placed in a section that is not optional; alternatively, you could create a new section that is not visible, and include only the new parameter in the section.



The parameter, placed within a custom invisible section, would be defined similarly to the example below.



<section key="index" visible="false">
  <!-- Miscellaneous elements to index -->
  <parameter key="misc.all" meaning="body" visible="false">
    <content nodeType="list" select="/metadata/*"/>
  </parameter>
</section>
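
For schemas whose root element is gmd:MD_Metadata (the ISO 19139-based and INSPIRE profiles listed in the procedure's table), the same section would differ only in its select value. The sketch below assumes the gmd namespace prefix is already declared in the definition file being edited. This topic does not state that, so verify it in your own file.

```xml
<section key="index" visible="false">
  <!-- Miscellaneous elements to index (schemas rooted at gmd:MD_Metadata) -->
  <parameter key="misc.all" meaning="body" visible="false">
    <!-- Assumption: the gmd prefix is already bound in this definition file -->
    <content nodeType="list" select="/gmd:MD_Metadata/*"/>
  </parameter>
</section>
```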




Procedure
  1. Open the fgdc-bestpractice-definition.xml file from the \\geoportal\WEB-INF\classes\gpt\metadata directory.
  2. Scroll to the bottom of the file, just above the final closing </schema> tag.
  3. Paste the code snippet above as a new section just before the final closing schema tag.
  4. Update the select attribute in the <content> element of the new section according to the table below. The select attribute tells the geoportal to select all elements at the given XPath and apply the indexing rule defined in the meaning attribute, so the XPath should point to the highest node in each supported standard's schema. The table lists the select values to use for the out-of-the-box schemas.
    | Schema | Definition file | select value |
    |---|---|---|
    | FGDC Best Practice | fgdc-bestpractice-definition.xml | select="/metadata/*" |
    | Dublin Core | dc-definition.xml | Not required: Dublin Core already indexes all elements by default. For a custom schema based on Dublin Core, use select="/rdf:RDF/rdf:Description/*" |
    | ESRI ISO | esri-iso-definition.xml | select="/metadata/*" |
    | INSPIRE Metadata (Datasets) | INSPIRE-iso19115-definition.xml | select="/gmd:MD_Metadata/*" |
    | INSPIRE Metadata (Services) | INSPIRE-iso19119-definition.xml | select="/gmd:MD_Metadata/*" |
    | North American Profile (Datasets and Dataset Series) | iso19139-NAP-data-minimum-definition.xml | select="/gmd:MD_Metadata/*" |
    | North American Profile (Services) | iso19139-NAP-service-minimum-definition.xml | select="/gmd:MD_Metadata/*" |
    | ISO 19139/19119 Web Services | INSPIRE-iso19119-definition.xml | select="/gmd:MD_Metadata/*" |
    | ISO 19139/19115 Datasets | iso19139-coregeog-definition.xml | select="/gmd:MD_Metadata/*" |
  5. Save the file.
  6. Repeat for each supported standard in your geoportal.
  7. Stop the geoportal web application.
  8. Navigate to the folder defined for the Lucene index. This is the file path in the indexLocation attribute of the <lucene> element in the gpt.xml file (in the \\geoportal\WEB-INF\classes\gpt\config folder).
  9. Clear the old index or create a new one. Either delete all the old files from the Lucene index folder, or create a new folder and update the file path in the <lucene> element in gpt.xml.
  10. Save gpt.xml (if changes were made).
  11. Start the geoportal web application. If you created a new folder for the indexLocation, the documents are reindexed automatically because the Lucene folder location changed. This may take some time, so do not be alarmed if search results do not appear immediately. If you instead cleared the old index by deleting the index files, the documents may need to be re-approved before they are reindexed.
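
If you choose the new-folder option in the procedure above, the change in gpt.xml amounts to editing a single attribute. The fragment below is a sketch only: the folder path is a hypothetical example. Any other attributes or child content on your existing <lucene> element are omitted here and should be left exactly as they are.

```xml
<!-- gpt.xml: change only the indexLocation value. -->
<!-- The path below is a hypothetical example of a new, empty folder. -->
<!-- Other attributes of the existing element are omitted; leave them unchanged. -->
<lucene indexLocation="C:\geoportal\lucene-index-all-content"/>
```

Pointing to a new folder rather than deleting files leaves the old index intact, so you can switch back if the larger index returns too many irrelevant results.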