How Lucene Indexing Works in the Geoportal, Under the Hood

Release 9.3.1

Indexing is important because it determines which search results are returned when a user submits search criteria to the geoportal. When a metadata document is published, certain content from the document is submitted to the search engine for indexing. To support the more advanced features of Lucene, this content is assigned a particular 'meaning'. The meaning determines how Lucene indexes the content and how the content can be used in searching.



Before a 'meaning' value can be used, it must be defined in a file called "property-meanings.xml", located in the \\geoportal\WEB-INF\classes\gpt\metadata folder. Lucene references "property-meanings.xml" to index the metadata value for search and retrieval. We strongly suggest using the existing meanings rather than adding new ones. This minimizes the effort of migrating to future versions of the Geoportal extension, and the existing meanings should satisfy most search needs.



Assign a meaning to a relevant parameter by setting the 'meaning' attribute on its <parameter> element in the definition.xml files in your \\geoportal\WEB-INF\classes\gpt\metadata folder. An example parameter for indexing keywords is shown below:



<parameter key="keywordinfo" meaning="title">



When a user searches by 'title', Lucene searches its 'title' index for the search term. The content of that 'title' index is determined by the metadata parameters in the definition.xml file that carry the 'meaning="title"' attribute. Note that after you modify a parameter in your definition.xml file, records that are already published in your Geoportal extension must be re-approved before they are indexed under the new meaning.
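To check which parameters in a definition file already carry a meaning, a short script can help. The following is a minimal sketch, not part of the Geoportal itself: the sample XML is a hypothetical, simplified stand-in for a definition.xml file (only the keywordinfo parameter comes from the example above; the schema and section wrappers and the other parameters are assumptions made for illustration).

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a definition.xml file.
# Real definition files contain more structure than this.
SAMPLE_DEFINITION = """
<schema>
  <section key="identification">
    <parameter key="keywordinfo" meaning="title"/>
    <parameter key="abstract" meaning="abstract"/>
    <parameter key="datatype"/>
  </section>
</schema>
"""

def list_parameter_meanings(xml_text):
    """Return {parameter key: meaning or None} for every <parameter> element."""
    root = ET.fromstring(xml_text.strip())
    return {p.get("key"): p.get("meaning") for p in root.iter("parameter")}

meanings = list_parameter_meanings(SAMPLE_DEFINITION)
for key, meaning in meanings.items():
    print(key, "->", meaning or "(no meaning assigned)")
```

Parameters reported without a meaning are the ones whose content would not be indexed under a named Lucene field.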





Open the property-meanings.xml file. Notice that it contains several "property-meaning" entries. Most have a name, meaningType, valueType, and comparisonType. These attributes are described below.



Attribute Name | Description
name | Unique name for the meaning in this file. It should match the value of the meaning attribute in the definition.xml file. The name becomes a Lucene field that can be used for advanced searches, as described in the Lucene documentation. For example, if the name is 'title', typing 'title: water' on your GPT search page returns only items with "water" in the index that Lucene has associated with the property-meaning 'title'.
meaningType | Used to flag metadata elements that are tied to functionality within the Portal. It is good practice to avoid altering the meaningType of a property-meaning.
valueType | Data type of the attribute. Examples are String, Timestamp, and geometry.
comparisonType | Indicates how Lucene analyzes the terms in the element for that meaning. There are three options defined in the property-meanings.xml file:
  • term: phrases associated with this attribute are tokenized. For example, if "San Diego" is stored under a meaning that has a comparisonType of 'term', it is stored as two separate words, "San" and "Diego". Terms are also stored in lowercase form, e.g., "san" and "diego".
  • keyword: phrases associated with this attribute are not tokenized. For example, if "San Diego" is stored under a meaning that has a comparisonType of 'keyword', it is stored as one phrase. A search for "San" will not return the record; only a search for "San Diego" will. Phrases are also stored in lowercase form, e.g., "san diego".
  • value: items associated with this attribute are stored as values, not phrases or words. Items are case-sensitive. An example is the fileIdentifier meaning. Parameters with a meaning="fileIdentifier" likely hold unique identification strings, such as "{F56408D6-4325-484C-B753-5E8FD4421E31}". Searching for part of the string, such as "E31", will not retrieve the record because the string is stored as a complete value and is not parsed. Searching for the string "{f56408d6-4325-484c-b753-5e8fd4421e31}" will also not return the record because the stored value is case-sensitive.
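The difference between the three comparisonType options can be illustrated with a small model. This is only a sketch of the behavior described above, not Lucene's actual analyzers: the function names are hypothetical, and splitting on whitespace is a simplifying assumption about tokenization.

```python
def index_value(text, comparison_type):
    """Return the tokens that would be stored for a value (simplified model)."""
    if comparison_type == "term":
        # Tokenized and lowercased: "San Diego" -> ["san", "diego"]
        return text.lower().split()
    if comparison_type == "keyword":
        # Kept as one phrase, lowercased: "San Diego" -> ["san diego"]
        return [text.lower()]
    if comparison_type == "value":
        # Stored unchanged; comparison is exact and case-sensitive
        return [text]
    raise ValueError("unknown comparisonType: " + comparison_type)

def matches(query, text, comparison_type):
    """Return True if the query would find the stored value in this model."""
    stored = index_value(text, comparison_type)
    if comparison_type == "term":
        # Every word of the query must be among the stored tokens
        tokens = query.lower().split()
        return bool(tokens) and all(tok in stored for tok in tokens)
    probe = query if comparison_type == "value" else query.lower()
    return probe in stored

FILE_ID = "{F56408D6-4325-484C-B753-5E8FD4421E31}"

print(matches("San", "San Diego", "term"))           # single word, tokenized field
print(matches("San", "San Diego", "keyword"))        # single word, untokenized field
print(matches("san diego", "San Diego", "keyword"))  # whole phrase, any case
print(matches(FILE_ID.lower(), FILE_ID, "value"))    # wrong case, exact field
```

In this model, only 'value' keeps the original case, which is why a lowercased identifier fails to match while the same lowercased phrase still matches a 'keyword' field.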




Some property-meanings have one or two additional tags, <dc> and <consider>.







Note: The anytext meaning is a special case. If you set the meaning of a parameter to anytext, that parameter is indexed under the body meaning. A search for anytext:searchTerm (where searchTerm is the word or phrase for which you are searching) will not return results. However, body:searchTerm will return records that contain the search phrase in elements whose meaning is set to anytext. The anytext index itself is reserved for CS-W alone.
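The redirection described in this note can be pictured with a toy index. The sketch below is a simplified model under stated assumptions, not the Geoportal's implementation: the function names are hypothetical, and whitespace tokenization stands in for Lucene's analysis.

```python
from collections import defaultdict

def index_field_for(meaning):
    """Content assigned the anytext meaning is indexed under body instead."""
    return "body" if meaning == "anytext" else meaning

def build_index(parameters):
    """Build {field: set of lowercase words} from (meaning, text) pairs."""
    index = defaultdict(set)
    for meaning, text in parameters:
        index[index_field_for(meaning)].update(text.lower().split())
    return index

def search(index, field, term):
    """Return True if the term is present in the given field of the toy index."""
    return term.lower() in index.get(field, set())

index = build_index([
    ("anytext", "Raster imagery"),
    ("title", "Water quality"),
])

print(search(index, "body", "raster"))     # found: anytext content lives in body
print(search(index, "anytext", "raster"))  # not found: nothing is stored under anytext
```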

Working Example

In this example we will alter the form for ISO 19115/19139 Datasets to make the element called "data type" searchable from the basic search field.







  1. Open the iso19139-coregeog-definition.xml file from the \\geoportal\WEB-INF\classes\gpt\metadata folder in a text editor.
  2. Find where Data Type is defined:







  3. Add a "meaning" attribute to this parameter. Because we are not mapping data type to a property-meaning that should have only one value per document (such as abstract, title, or fileIdentifier), let's set the meaning to "body".







  4. Save the iso19139-coregeog-definition.xml file. Restart Tomcat for the changes to take effect. From now on, any ISO 19139 dataset document published to the Geoportal will have its data type value searchable from the search page.
  5. Now find where the abstract parameter is defined in the iso19139-coregeog-definition.xml file. Notice that its meaning attribute is set to meaning="abstract". This means that if a user types "abstract:whateverPhrase" in the search field on the search page, the Geoportal will search all elements with a meaning of 'abstract' for the phrase 'whateverPhrase' and return matching records as search results.
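The field-qualified search syntax used in step 5 can be sketched as a small parser. This is an illustrative model only: the function name is hypothetical, and treating "body" as the field searched when no field is given is an assumption drawn from step 3, not a documented default.

```python
def parse_query(query, default_field="body"):
    """Split 'field:term' into (field, term); fall back to the default field."""
    field, sep, term = query.partition(":")
    if sep and field.strip() and term.strip():
        return field.strip(), term.strip()
    # No usable field prefix: search the assumed default field
    return default_field, query.strip()

for text in ("abstract:whateverPhrase", "title: water", "water"):
    field, term = parse_query(text)
    print(repr(text), "-> field", repr(field), "term", repr(term))
```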
Note: The geoportal can be customized so that it automatically indexes all metadata content, regardless of which parameter it is associated with in the metadata. To enable this customization, see Index All Metadata Content Overview.