Topology basics (ArcInfo and ArcEditor only)

Release 9.3
A GIS topology is a set of rules and behaviors that model how points, lines, and polygons share coincident geometry. For example:

Adjacent counties share common boundaries. Counties nest within states.

These are examples of topological rules and behaviors that are commonly used to manage coincident geometry in a geodatabase.

Why topology?

Topology has long been a key GIS requirement for data management and integrity. In general, a topological data model manages spatial relationships by representing spatial objects (point, line, and area features) as an underlying graph of topological primitives—nodes, faces, and edges. These primitives, together with their relationships to one another and to the features whose boundaries they represent, are defined by representing the feature geometries in a planar graph of topological elements.

Example topological line graph of nodes, faces, and edges
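To make the idea of a planar graph of primitives concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory representation: the `Edge` class, the `UNIVERSE` face id, and the helper functions are illustrative and are not the geodatabase's internal format. Two adjacent square polygons (faces 1 and 2) share one edge. Each edge records its end nodes and the faces on its left and right, which is enough to answer adjacency questions without comparing coordinates.

```python
from dataclasses import dataclass

UNIVERSE = 0  # hypothetical id for the unbounded "outside" face


@dataclass(frozen=True)
class Edge:
    edge_id: int
    from_node: int
    to_node: int
    left_face: int
    right_face: int
    coords: tuple  # vertices from from_node to to_node


# Nodes sit where edges meet; here nodes 2 and 3 are where the squares touch.
NODES = {2: (1, 0), 3: (1, 1)}

EDGES = [
    # Shared boundary: walking up x = 1, face 1 is on the left, face 2 on the right.
    Edge(1, 2, 3, left_face=1, right_face=2, coords=((1, 0), (1, 1))),
    # Outer boundary of face 1, counterclockwise, so face 1 is on the left.
    Edge(2, 3, 2, left_face=1, right_face=UNIVERSE,
         coords=((1, 1), (0, 1), (0, 0), (1, 0))),
    # Outer boundary of face 2, counterclockwise, so face 2 is on the left.
    Edge(3, 2, 3, left_face=2, right_face=UNIVERSE,
         coords=((1, 0), (2, 0), (2, 1), (1, 1))),
]


def edges_of_face(face, edges=EDGES):
    """Ids of edges that have `face` on either side."""
    return [e.edge_id for e in edges if face in (e.left_face, e.right_face)]


def neighbors(face, edges=EDGES):
    """Faces sharing at least one edge with `face`, excluding the outside face."""
    result = set()
    for e in edges:
        if e.left_face == face:
            result.add(e.right_face)
        elif e.right_face == face:
            result.add(e.left_face)
    result.discard(UNIVERSE)
    return result


print(edges_of_face(2))  # [1, 3]
print(neighbors(1))      # {2}
```

Adjacency falls straight out of the left/right face references; no geometric comparison is needed once the graph exists.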

Topology is fundamentally used to ensure the data quality of spatial relationships and to aid in data compilation. Topology is also used to analyze spatial relationships in many situations, such as dissolving the boundaries between adjacent polygons with the same attribute values or traversing a network of elements in a topology graph.
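As an illustration of the dissolve case, here is a minimal sketch. It assumes each edge already records the polygons on its left and right, with 0 standing for the outside. An edge whose two polygons carry the same attribute value is an internal boundary that can be dropped, and the polygons on either side merge. The `EDGES` records, `LANDUSE` values, and function names are hypothetical.

```python
OUTSIDE = 0  # polygon id used for the area outside all features

# Hypothetical edge records: (edge_id, left_polygon, right_polygon).
EDGES = [
    (1, 1, 2),
    (2, 2, 3),
    (3, 1, OUTSIDE),
    (4, 2, OUTSIDE),
    (5, 3, OUTSIDE),
]
# Hypothetical attribute to dissolve on, one value per polygon.
LANDUSE = {1: "forest", 2: "forest", 3: "urban"}


def value(poly, attr):
    """Attribute value of a polygon; the outside has none."""
    return None if poly == OUTSIDE else attr[poly]


def dissolve_boundaries(edges, attr):
    """Edge ids that survive the dissolve: their two sides differ in value."""
    return [eid for eid, left, right in edges
            if value(left, attr) != value(right, attr)]


def dissolve_groups(edges, attr):
    """Union-find over polygons joined by a dropped (internal) edge."""
    parent = {p: p for p in attr}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for _, left, right in edges:
        if OUTSIDE not in (left, right) and attr[left] == attr[right]:
            parent[find(left)] = find(right)

    groups = {}
    for p in parent:
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values())


print(dissolve_boundaries(EDGES, LANDUSE))  # [2, 3, 4, 5]
print(dissolve_groups(EDGES, LANDUSE))      # [[1, 2], [3]]
```

The point of the sketch is that the dissolve needs only the edge-to-polygon relationships, not the coordinates themselves.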

Topology can also be used to model how the geometry from a number of feature classes can be integrated. Some refer to this as vertical integration of feature classes.

Generally, topology is employed to do the following:

Ways that features share geometry in a topology

Features can share geometry within a topology. Here are some examples among adjacent features:

Examples of shared geometry that can be managed using a topology.

In addition, shared geometry can be managed between feature classes using a geodatabase topology. For example:

Two views: Features and topological elements

The following illustration shows how a layer of polygons can be described and used:

The Feature View and The Topology View.

This means that there are two alternatives for working with features—one in which features are defined by their coordinates and another in which features are represented as an ordered graph of their topological elements.

The evolution of geodatabase topology from ArcInfo coverages

NOTE: Reading this large topic is not necessary to implement geodatabase topologies. However, you may want to spend some time reading this if you are interested in the historical evolution and motivations for how topology is managed in the geodatabase.

The genesis of "Arc-node" and "Geo-relational"

ArcInfo coverage users have long appreciated the role that topology plays in maintaining the spatial integrity of their data.

Here are the elements of the ArcInfo coverage data model.

The feature classes in an ArcInfo coverage

In a coverage, the feature boundaries and points were stored in a few main files that were managed and owned by ArcInfo Workstation. The "ARC" file held the linear or polygon boundary geometry as topological edges, which were referred to as "arcs." The "LAB" file held point locations, which were used as label points for polygons or as individual point features such as for a wells feature layer. Other files were used to define and persist the topological relationships between each of the edges and the polygons.

For example, one file called the "PAL" file ("Polygon-arc list") listed the order and direction of the arcs in each polygon. In ArcInfo, software logic was used to assemble the coordinates for each polygon for display, analysis, and query operations. The ordered list of edges in the PAL file was used to look up and assemble the edge coordinates held in the ARC file. The polygons were assembled during run time when needed.
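The run-time assembly described above can be sketched in Python. The dictionaries below are hypothetical stand-ins for the ARC and PAL files, not their actual file layouts, and the convention that a negative arc number means "traverse this arc backward" is an assumption made for this example.

```python
# Hypothetical stand-in for the ARC file: arc number -> ordered coordinates.
ARCS = {
    1: [(1, 0), (1, 1)],                    # boundary shared by polygons A and B
    2: [(1, 1), (0, 1), (0, 0), (1, 0)],    # rest of polygon A
    3: [(1, 0), (2, 0), (2, 1), (1, 1)],    # rest of polygon B
}

# Hypothetical stand-in for the PAL file: polygon -> ordered, signed arc numbers.
# A negative number (an assumption here) means the arc is used in reverse.
PAL = {
    "A": [2, 1],
    "B": [3, -1],
}


def assemble_ring(arc_list, arcs=ARCS):
    """Build a closed coordinate ring from an ordered list of signed arc numbers."""
    ring = []
    for signed_id in arc_list:
        coords = arcs[abs(signed_id)]
        if signed_id < 0:
            coords = coords[::-1]
        if ring:
            if ring[-1] != coords[0]:
                raise ValueError(f"arc {signed_id} does not connect to the previous arc")
            coords = coords[1:]  # skip the node shared with the previous arc
        ring.extend(coords)
    if ring[0] != ring[-1]:
        raise ValueError("arcs do not close a ring")
    return ring


print(assemble_ring(PAL["A"]))  # [(1, 1), (0, 1), (0, 0), (1, 0), (1, 1)]
print(assemble_ring(PAL["B"]))  # [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)]
```

Arc 1 is stored once but contributes to both polygons, which is where the coverage's storage savings came from; the cost is that every polygon must be rebuilt like this whenever it is needed.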

The coverage model had several advantages:

NOTE: An interesting historical fact: "Arc," when coupled with the table manager named "Info," was the genesis of the product name ArcInfo and hence all subsequent "Arc" products in the ESRI product family—ArcView, ArcIMS, ArcGIS, etc.

Coverages also had some disadvantages:

Shapefiles and simple geometry storage

In the early 1980s, coverages were seen as a major improvement over the older polygon- and line-based systems, in which each polygon was held as a complete loop. Before the coverage and ArcInfo came along, these simple structures stored all of a feature's coordinates in that feature's own geometry. They were simple, but they required double digitized boundaries: where two polygons shared an edge, the coordinates of that edge were stored twice, once in each polygon's geometry. The main disadvantage was that GIS software at the time could not maintain the integrity of these shared edges. In addition, storage was enormously expensive and every byte came at a premium. During the early 1980s, a 300 MB disk drive was the size of a washing machine and cost $30,000! Holding two or more copies of the same coordinates was expensive, and processing them took too much compute time. Thus, coverage topology had real advantages.

During the mid-1990s, interest in simple geometric structures grew because disk storage and hardware costs in general were coming down while computational speed was growing. At the same time, existing GIS datasets were more readily available, and the work of GIS users was evolving from primarily data compilation activities to include data use, analysis, and sharing.

Users wanted faster performance when using data: rather than spending compute time deriving polygon geometries on demand, just deliver the coordinates of these 1,200 polygons as fast as possible. Having the full feature geometry readily available was more efficient. Thousands of geographic information systems were in use, and numerous datasets were readily available.

Around this time, ESRI developed and published the ESRI shapefile format. Each shapefile represented a single feature class (of points, lines, or polygons) and used a very simple storage model for each feature's coordinates. Shapefiles could be easily created from ArcInfo coverages as well as from many other GIS systems. They were widely adopted as a de facto standard and remain in broad use today.

A few years later, ArcSDE pioneered a similar simple storage model in relational database tables. A feature table could hold one feature per row with the geometry in one of its columns along with other feature attribute columns.

A sample feature table of state polygons is shown below. Each row represents a state. The SHAPE column holds the polygon geometry of each state.

Feature class table showing the shape column

This simple features model fits the SQL processing engine very well. Through the use of relational databases, we began to see GIS data scale to unprecedented sizes and numbers of users without degrading performance. We were beginning to leverage RDBMSs for GIS data management.
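A minimal sketch of the one-row-per-feature model follows. It assumes SQLite (Python's built-in `sqlite3` module) as a stand-in RDBMS and well-known text (WKT) strings as a stand-in for the SHAPE column's geometry. ArcSDE's actual geometry storage and schema are not shown, and the table contents, state names, and populations are hypothetical.

```python
import sqlite3


def load_states(conn):
    """Create a hypothetical feature table: one row per state, geometry in SHAPE."""
    conn.execute(
        "CREATE TABLE states (OBJECTID INTEGER PRIMARY KEY, NAME TEXT, "
        "POP INTEGER, SHAPE TEXT)"
    )
    rows = [
        (1, "State A", 500000, "POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))"),
        (2, "State B", 2000000, "POLYGON((1 0, 2 0, 2 1, 1 1, 1 0))"),
    ]
    conn.executemany("INSERT INTO states VALUES (?, ?, ?, ?)", rows)


def shapes_where_pop_exceeds(conn, min_pop):
    """Plain SQL attribute query; the full geometry comes back with each row."""
    cur = conn.execute(
        "SELECT NAME, SHAPE FROM states WHERE POP > ? ORDER BY OBJECTID",
        (min_pop,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load_states(conn)
print(shapes_where_pop_exceeds(conn, 1000000))
```

No geometry has to be assembled at query time. Note, though, that the coordinates `1 0` and `1 1` appear in both rows: each polygon carries its own copy of the shared boundary, so nothing in this structure alone keeps those copies consistent.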

Shapefiles became ubiquitous and, using ArcSDE, this simple features mechanism became the fundamental feature storage model in RDBMSs. (To support interoperability, ESRI was the lead author of the OGC and ISO simple features specification.)

Simple feature storage had clear advantages:

Its disadvantage was that the data integrity readily provided by topology was much harder to maintain with simple features. As a consequence, users applied one data model for editing and maintenance (such as coverages) and another for deployment (such as shapefiles or ArcSDE layers).

Users began to use this hybrid approach for editing and data deployment. For example, users would edit their data in coverages, CAD files, or other formats and then convert it into shapefiles for deployment and use. Thus, even though the simple features structure was an excellent direct use format, it did not support topological editing or the management of shared geometry: direct use databases relied on the simple structures, while a separate topological form was used for editing. This worked well for deployment, but the deployed data would become out of date and had to be refreshed, so there was a lag before updates reached users. Bottom line: topology was missing.

What GIS required and what the geodatabase topology model implements now is a mechanism that stores features using the simple feature geometry, but enables topologies to be used on this simple, open data structure. This means that users can have the best of both worlds—a transactional data model that enables topological query, shared geometry editing, rich data modeling, and data integrity, but also a simple, highly scalable data storage mechanism that is based upon open, simple feature geometry.

This direct use data model is fast, simple, and efficient. It can also be directly edited and maintained by any number of simultaneous users.

The topology framework in ArcGIS

In effect, topology has been treated as more than a data storage problem. The complete solution includes:

In a geodatabase topology, the validation process identifies shared coordinates between features (both in the same feature class and across feature classes). A clustering algorithm is used to ensure that the shared coordinates have the same location. These shared coordinates are stored as part of each feature's simple geometry.

This enables very fast and scalable lookup of topological elements (nodes, edges, and faces). This has the added advantage of working quite well and scaling with the RDBMS's SQL engine and transaction management framework.
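The text does not spell out the clustering algorithm, so the following is only a simple greedy stand-in, not the geodatabase's method. Each vertex that falls within a hypothetical cluster `tolerance` of an already-seen vertex takes that vertex's coordinates. After snapping, the shared boundary of two separately digitized polygons is stored with identical coordinates in both features' simple geometry.

```python
import math


def cluster_vertices(features, tolerance):
    """Greedy snap (illustrative only): each vertex within `tolerance` of an
    earlier representative is replaced by that representative's coordinates."""
    reps = []       # representative coordinates seen so far
    snapped = {}
    for fid, coords in features.items():
        out = []
        for x, y in coords:
            for rx, ry in reps:
                if math.hypot(x - rx, y - ry) <= tolerance:
                    out.append((rx, ry))
                    break
            else:
                # No representative close enough: this vertex starts a new cluster.
                reps.append((x, y))
                out.append((x, y))
        snapped[fid] = out
    return snapped


# Polygon B was digitized separately; its copy of the shared boundary is slightly off.
FEATURES = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)],
    "B": [(1.001, 0.0), (2, 0), (2, 1), (0.999, 1.0), (1.001, 0.0)],
}
snapped = cluster_vertices(FEATURES, tolerance=0.01)
print(snapped["B"])  # [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)]
```

A greedy pass like this is order dependent and compares every vertex with every representative, which is quadratic; a real implementation would need a spatial index and a more careful way of choosing cluster locations.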

During editing and update, as features are added, they are directly usable. The updated areas on the map, called "dirty areas," are flagged and tracked as updates are made to each feature class. At any time, users can choose to topologically analyze and validate the dirty areas to generate clean topology. Only the topology for the dirty areas needs rebuilding, saving processing time.
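The dirty-area idea can be sketched as follows, assuming features are plain coordinate lists and dirty areas are tracked as axis-aligned envelopes (bounding boxes). This is a simplification for illustration: the `EditSession` class and its methods are hypothetical and do not reproduce the geodatabase's actual dirty-area mechanism. Editing one polygon marks its old and new extents dirty, and only features touching those extents are revalidated.

```python
from dataclasses import dataclass, field


def envelope(coords):
    """Bounding box (xmin, ymin, xmax, ymax) of a coordinate list."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a, b):
    """True if two envelopes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class EditSession:
    features: dict
    dirty: list = field(default_factory=list)

    def edit(self, fid, new_coords):
        # Mark both the old and the new extent dirty: neighbors of the old
        # shape may have lost a coincident boundary.
        if fid in self.features:
            self.dirty.append(envelope(self.features[fid]))
        self.dirty.append(envelope(new_coords))
        self.features[fid] = new_coords

    def features_to_validate(self):
        """Only features touching a dirty area need to be re-checked."""
        return sorted(
            fid for fid, coords in self.features.items()
            if any(intersects(envelope(coords), d) for d in self.dirty)
        )

    def validate(self, check):
        """Run `check` on features in dirty areas, then clear the dirty areas."""
        results = {fid: check(self.features[fid]) for fid in self.features_to_validate()}
        self.dirty.clear()
        return results


session = EditSession({
    "A": [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)],
    "C": [(5, 0), (6, 0), (6, 1), (5, 1), (5, 0)],
})
session.edit("A", [(0, 0), (1, 0), (1, 1), (0, 1.5), (0, 0)])
print(session.features_to_validate())  # ['A', 'B']
```

Polygon C lies far from the edit and is never revisited, which is where the savings in processing time come from.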

As a result, topological primitives (nodes, edges, faces) and their relationships to one another and to their features can be efficiently discovered and assembled. This has several advantages:

In cases where users want to store the topological primitives, it is easy to create and post topologies and their relationships to tables for various analytic and interoperability purposes (for example, for users who want to post their features into an Oracle Spatial warehouse that stores tables of topological primitives).
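As a rough illustration of discovering primitives from simple geometry and posting them to a table, the sketch below assumes polygon rings that run counterclockwise (interior on the left) and have already been clustered, so shared vertices match exactly. It works at the level of individual segments rather than merging them into full edges between nodes. The `topo_edges` table is a hypothetical schema written to an in-memory SQLite database; it is not Oracle Spatial's topology schema.

```python
import sqlite3

OUTSIDE = 0  # face id for the area outside all polygons


def derive_edges(polygons):
    """Segment-level edges with left/right faces, keyed by (min point, max point)."""
    edges = {}
    for face_id, ring in polygons.items():
        for a, b in zip(ring, ring[1:]):
            key = (min(a, b), max(a, b))
            left, right = edges.get(key, (OUTSIDE, OUTSIDE))
            # Counterclockwise ring: the face is on the left of its own direction.
            if (a, b) == key:
                left = face_id
            else:
                right = face_id
            edges[key] = (left, right)
    return [(k[0], k[1], l, r) for k, (l, r) in sorted(edges.items())]


def post_edges(conn, rows):
    """Write the derived edges to a hypothetical topo_edges table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS topo_edges (from_x REAL, from_y REAL, "
        "to_x REAL, to_y REAL, left_face INTEGER, right_face INTEGER)"
    )
    conn.executemany(
        "INSERT INTO topo_edges VALUES (?, ?, ?, ?, ?, ?)",
        [(a[0], a[1], b[0], b[1], l, r) for a, b, l, r in rows],
    )


POLYGONS = {
    1: [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)],
    2: [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)],
}
rows = derive_edges(POLYGONS)
conn = sqlite3.connect(":memory:")
post_edges(conn, rows)
shared = conn.execute(
    "SELECT COUNT(*) FROM topo_edges WHERE left_face <> 0 AND right_face <> 0"
).fetchone()[0]
print(shared)  # 1
```

Only the segment from (1, 0) to (1, 1) has a polygon on both sides, so it is the single shared edge; the remaining six segments border the outside face.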

At a pragmatic level, the ArcGIS topology implementation works. It scales to extremely large geodatabases and multiuser systems without loss of performance. It includes rich validation and editing tools for building and maintaining topologies in geodatabases. It includes rich and flexible data modeling tools that enable users to assemble practical, working systems on file systems, in any relational database, and on any number of schemas.