Designing a geodatabase

An overview of geodatabase design

Release 9.3
Last modified April 24, 2009


Note: This topic was updated for 9.3.1.



Geodatabase design is based on a common set of fundamental GIS design steps, so it's important to have a basic understanding of these GIS design goals and methods. This section provides an overview.

GIS design involves organizing geographic information into a series of data themes—layers that can be integrated using geographic location. So it makes sense that geodatabase design begins by identifying the data themes to be used, then specifying the contents and representations of each thematic layer.

This involves defining



Representation


Each GIS database design begins with deciding what the geographic representations will be for each dataset. Individual geographic entities can be represented as



Data themes


Geographic representations are organized in a series of data themes (sometimes referred to as thematic layers); this notion of data layers or themes is a key concept in GIS. A data theme is a collection of common geographic elements such as a road network, a collection of parcel boundaries, soil types, an elevation surface, satellite imagery for a certain date, well locations, and so on.

The concept of a thematic layer was one of the early notions in GIS. Practitioners thought about how the geographic information in maps could be partitioned into logical information layers—as more than a random collection of individual objects (such as a road, a bridge, a hill, a house, a peninsula). These early GIS users organized information in thematic layers that described the distribution of a phenomenon and how it should be portrayed across a geographic extent. These layers also provided a protocol (capture rules) for collecting the representations (as feature sets, raster layers, attribute tables, and so on).

In GIS, thematic layers are one of the main organizing principles for GIS database design.

Example of common data themes for a land records database

Each GIS will contain multiple themes for a common geographic area. The collection of themes acts as layers in a stack. Each theme can be managed as an information set independent of other themes. Each has its own representations (points, lines, polygons, surfaces, rasters, and so on). Because the various independent themes are spatially referenced, they overlay one another and can be combined in a common map display. Plus, GIS analysis operations, such as overlay, can fuse information between themes.
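Because the themes share a common spatial reference, an overlay operation can fuse them location by location. The following Python sketch is a minimal illustration, assuming two raster themes that are already co-registered on the same grid (same cell size, extent, and coordinate system). The soil and slope values and the suitability rule are hypothetical, and real overlay tools also handle resampling and projection, which are omitted here.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")

def overlay(theme_a: List[List[A]], theme_b: List[List[B]],
            combine: Callable[[A, B], C]) -> List[List[C]]:
    """Combine two co-registered grid themes cell by cell.

    Assumption: both grids already share cell size, extent, and
    coordinate system, so cell [r][c] covers the same ground in each.
    """
    if len(theme_a) != len(theme_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(theme_a, theme_b)
    ):
        raise ValueError("themes must share the same grid dimensions")
    return [
        [combine(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(theme_a, theme_b)
    ]

# Hypothetical themes: soil type and slope (in degrees) on a 2 x 2 grid.
soils = [["clay", "loam"],
         ["sand", "loam"]]
slope = [[3.0, 12.0],
         [8.0, 2.0]]

# Flag cells where loam soils coincide with slopes under 10 degrees.
suitable = overlay(soils, slope, lambda soil, deg: soil == "loam" and deg < 10.0)
print(suitable)  # [[False, False], [False, True]]
```

Each theme stays independent; only the shared grid ties them together, which mirrors how independently managed thematic layers can still be combined in analysis.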

GIS datasets are collections of representations for a data theme


Geographic data collections can be represented as feature classes and raster-based datasets in a GIS database.

Many themes are represented by a single collection of homogeneous features such as a feature class of soil type polygons and a point feature class of well locations. Other themes, such as a transportation framework, are represented by multiple datasets (such as a set of spatially related feature classes for streets, intersections, bridges, highway ramps, and so on).

Raster datasets are used to represent continuous surfaces, such as elevation, slope, and aspect, as well as to hold satellite imagery, aerial photography, and other gridded datasets (such as land cover and vegetation types).

Both the intended use and existing data sources influence spatial representations in a GIS. When designing a GIS database, users have a set of applications in mind. They understand what questions will be asked of the GIS. Defining these uses helps determine the content specification for each theme and how each is to be represented geographically. For example, there are numerous alternatives for representing surface elevation: as contour lines and spot height locations (such as hilltops and peaks), as a continuous terrain surface (a TIN), or as shaded relief. Any or all of these may be relevant for a particular GIS database design. The intended uses of the data will help to determine which of these representations will be required.

Frequently, the geographic representations will be predetermined to some degree by the available data sources for the theme. If a preexisting data source was collected at a particular scale and representation, it will often be necessary to adapt your design to use it.

Individual GIS datasets often are collected in concert with other data layers


While each GIS dataset can be used independently of other GIS data, it is often quite important to collect datasets in concert with other information layers so that the fundamental spatial behavior and spatial relationships are maintained and consistent between the related GIS data layers. Here are a few examples that help illustrate this concept:



In each of these cases, there is a data model that defines a collection of related data themes that fit into an overall information framework. Each framework is essentially a collection of related data themes that are best captured in unison with each other. The data capture guidelines follow sound scientific principles about their spatial behavior and relationships. Each theme plays an important part in the holistic characterization of a particular landscape. For example:



This concept of collecting integrated data themes in unison is one of the key design principles used in each of the ArcGIS Data Models.


Geodatabase design steps


Design starts with thematic layers. First, you identify the thematic layers you'll need for your particular application and information requirements. What are the data themes that make up your key landscapes? Then, you define each thematic layer in more detail. The characterization of each thematic layer will result in a specification of standard geodatabase data elements such as feature classes, tables, relationship classes, raster datasets, subtypes, topologies, domains, and so on.

When identifying thematic layers in your design, try to characterize each theme in terms of its visual representations, its expected uses in the GIS, its likely data sources, and its levels of resolution. For example, at what scales and extents will you need to use this information, and how will its elements be represented at each scale? These characteristics help describe the high-level contents expected from each theme.
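One way to record these characteristics is as a structured theme description that notes, for each range of map scales, how the theme is represented. The following Python sketch is only an illustration: the class names, the parcel uses and sources, and the scale ranges are hypothetical values chosen for the example, not an ArcGIS structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ScaleRange:
    # Scale denominators: 1:1,000 is a larger scale than 1:100,000.
    largest: int          # e.g. 500 for 1:500
    smallest: int         # e.g. 24_000 for 1:24,000
    representation: str   # e.g. "polygon", "line", "point"

@dataclass
class ThemeDescription:
    name: str
    uses: List[str]
    sources: List[str]
    scale_ranges: List[ScaleRange] = field(default_factory=list)

    def representation_at(self, scale_denominator: int) -> Optional[str]:
        """Return the representation used at 1:scale_denominator, if any."""
        for r in self.scale_ranges:
            if r.largest <= scale_denominator <= r.smallest:
                return r.representation
        return None  # the theme is not displayed at this scale

# Illustrative values only.
parcels = ThemeDescription(
    name="Ownership parcels",
    uses=["editing", "tax mapping"],
    sources=["recorded plats", "survey records"],
    scale_ranges=[
        ScaleRange(500, 24_000, "polygon"),
        ScaleRange(24_001, 100_000, "point"),
    ],
)

print(parcels.representation_at(10_000))   # polygon
print(parcels.representation_at(50_000))   # point
print(parcels.representation_at(500_000))  # None
```

A gap in the ranges (here, anything smaller than 1:100,000) makes it explicit that the theme is not expected to appear at that scale.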

Here is an example description of a data theme for ownership parcels in a cadastral application:

Description of a GIS data theme for ownership parcels for U.S.-based systems

Once you have identified the key thematic layers in your design, the next step is to develop specifications for representing the contents of each thematic layer in the physical database.



The 11 steps presented below outline a general GIS database design process. The initial design steps 1 through 3 help you identify and characterize each thematic layer. In steps 4 through 7, you begin to develop representation specifications, relationships, and ultimately, geodatabase elements and their properties. In steps 8 and 9, you will define the data capture procedures and assign data collection responsibilities. In the final stage (steps 10 and 11), you will test and refine your design through a series of initial implementations. In this final phase, you will also document your design.

Eleven steps to geodatabase design
1. Identify the information products that you will create and manage with your GIS.
Your GIS database design should reflect the work of your organization. Consider compiling and maintaining an inventory of map products, analytic models, Web mapping applications, data flows, database reports, key responsibilities, 3D views, and other mission-based requirements for your organization. List the data sources you currently use in this work. Use these to drive your data design needs.
Define the essential 2D and 3D digital basemaps for your applications. Identify the set of map scales that will appear in each basemap as you pan, zoom, and explore its contents.
2. Identify the key data themes based on your information requirements.
Define some of the key aspects of each data theme more completely. Determine how each dataset will be used: for editing, GIS modeling and analysis, representing your business workflows, and mapping and 3D display. For each specified map scale, specify the map use, the data sources, and the spatial representations; the data accuracy and collection guidelines for each map view and 3D view; and how the theme is displayed (its symbology, text labels, and annotation). Consider how each map layer will be displayed together with other key layers. For modeling and analysis, consider how information will be used with other datasets (for example, how they are combined and integrated). This will help you identify key spatial relationships and data integrity rules. Ensure that these 2D and 3D map display and analysis properties are considered as part of your database design.
3. Specify the scale ranges and the spatial representations of each data theme at each scale.
Data is compiled for use at a specific range of map scales. Associate a geographic representation with each map scale. Geographic representation will often change between map scales (for example, from polygon to line or point). In many cases, you may need to generalize the feature representations for use at smaller scales. Rasters can be resampled using image pyramids. In other situations, you may need to collect alternative representations for different map scales.
4. Decompose each representation into one or more geographic datasets.
Discrete features are modeled as feature classes of points, lines, and polygons. You can consider advanced data types such as topologies, networks, and terrains to model the relationships between elements in a layer as well as across datasets.
For raster datasets, mosaics and catalog collections are options for managing very large collections.
Surfaces can be modeled using features, such as contours, as well as using rasters and terrains.
5. Define the tabular database structure and behavior for descriptive attributes.
Identify attribute fields and column types.
Tables also might include attribute domains, relationships, and subtypes. Define any valid values, attribute ranges, and classifications (for use as domains). Use subtypes to control behaviors. Identify tabular relationships and associations for relationship classes.
6. Define the spatial behavior, spatial relationships, and integrity rules for your datasets.
For features, you can add spatial behavior and capabilities and also characterize the spatial relationships inherent in your related features for a number of purposes using topologies, address locators, networks, terrains, and so on. For example, use topologies to model the spatial relationships of shared geometry and to enforce integrity rules. Use address locators to support geocoding. Use networks for tracing and path finding. For rasters, you can decide if you need a raster dataset or a raster catalog.
7. Propose a geodatabase design.
Define the set of geodatabase elements you want in your design for each data theme. Study existing designs for ideas and approaches that work.
Copy patterns and best practices from the ArcGIS Data Models.
8. Design editing workflows and map display properties.
Define the editing procedures and integrity rules (for example, all streets are split where they intersect other streets and street segments connect at endpoints).
Design editing workflows that help you to meet these integrity rules for your data.
Define display properties for maps and 3D views.
Determine the map display properties for each map scale. These will be used to define map layers.

9. Assign responsibilities for building and maintaining each data layer.
Determine who will be assigned the data maintenance work, whether within your organization or at other organizations. Understanding these roles is important. You will need to design how data conversion and transformation are used to import and export data across partner organizations.
10. Build a working prototype. Review and refine your design.
Test your prototype design. Build a sample geodatabase copy of your proposed design using a file, personal, or ArcSDE Personal geodatabase. Build maps, run key applications, and perform editing operations to test the design's utility. Based on your prototype test results, revise and refine your design. Once you have a working schema, load a larger set of data (such as loading it into an ArcSDE geodatabase) to check out production, performance, scalability, and data management workflows.
This is an important step. Settle on your design before you begin to populate your geodatabase.
11. Document your geodatabase design.
Various methods can be used to describe your database design and decisions. Use drawings, map layer examples, schema diagrams, simple reports, and metadata documents.
Some users like using UML. However, UML is not sufficient on its own. UML cannot represent all the geographic properties and decisions to be made. Also, UML does not convey the key GIS design concepts such as thematic organization, topology rules, and network connectivity. UML provides no spatial insight into your design.
Many users like using Visio to create a graphic representation of their geodatabase schema such as those published with the ArcGIS data models. ESRI provides a tool that can help you capture these kinds of graphics of your data model elements using Visio. Refer to the topic Documenting your geodatabase design.
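The integrity rules named in step 8 (streets are split where they intersect other streets, and street segments connect at endpoints) can be tested against a prototype dataset. The following Python sketch is a minimal illustration, assuming straight two-point segments with exact coordinates and no snapping tolerance; in a geodatabase these rules would more typically be enforced with a topology. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]

def orient(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle abc (> 0 left turn, < 0 right turn)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def on_interior(p: Point, s: Segment) -> bool:
    """True if p lies on segment s but is not one of its endpoints."""
    a, b = s
    if p in (a, b) or orient(a, b, p) != 0:
        return False
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))

def crosses(s: Segment, t: Segment) -> bool:
    """True if s and t cross at a point interior to both segments."""
    (a, b), (c, d) = s, t
    return (orient(a, b, c) * orient(a, b, d) < 0
            and orient(c, d, a) * orient(c, d, b) < 0)

def check_streets(segments: List[Segment]):
    """Return (unsplit_pairs, dangles) for a list of street segments.

    unsplit_pairs: index pairs that cross, or where one segment's endpoint
    touches the other's interior, meaning a street was not split there.
    dangles: endpoints used by exactly one segment.
    """
    unsplit = []
    for i, j in combinations(range(len(segments)), 2):
        s, t = segments[i], segments[j]
        touches_interior = (any(on_interior(p, t) for p in s)
                            or any(on_interior(p, s) for p in t))
        if crosses(s, t) or touches_interior:
            unsplit.append((i, j))
    counts = Counter(p for seg in segments for p in seg)
    dangles = sorted(p for p, n in counts.items() if n == 1)
    return unsplit, dangles

streets = [
    ((0, 0), (10, 0)),    # 0
    ((5, -5), (5, 5)),    # 1: crosses street 0 at (5, 0) without a split
    ((10, 0), (10, 10)),  # 2: connects to street 0 at an endpoint
]
print(check_streets(streets))
```

Dangles are reported for review rather than treated as errors, because a cul-de-sac or a street at the edge of the study area can legitimately end without connecting to another segment.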


Using ArcGIS Data Model designs


ESRI, along with its user community, has invested a significant amount of time to develop a series of geodatabase data model templates that provide a jump start for your geodatabase designs. These designs are described and documented at http://support.esri.com/datamodels.

At this Web site, you can find existing geodatabase templates as well as useful documentation on geodatabase design for many industries and applications. These models typically are a good starting point. Most users start with these design templates, then refine and extend them to meet their specific needs and requirements.

Once you find a relevant data model, you can download a geodatabase template from the site that you can use to jump-start your design. You can build a test geodatabase, load some data into it, and then test and refine the design for use within your GIS.

Steps in using an ArcGIS data model as the basis for your design



The steps involved in using an ArcGIS data model are very similar to how you might import and modify any existing geodatabase design.




Documenting your geodatabase design


Documenting your geodatabase design is important. At the ArcGIS data models Web site (http://support.esri.com/datamodels), a series of diagrams is used to represent the key design concepts and to document the specifications of geodatabase elements, metadata, and map layers in each of the data model templates. This section provides a short overview of how various geodatabase elements are presented at the Web site; it may be helpful as you document your own designs.

There are six key elements for representing the contents of your geodatabase design:

  1. Datasets—These are specifications for how to record the properties of feature classes, rasters, and attribute tables, as well as the set of columns in each table. For spatial representations, you'll see some geometric properties (such as point, line, or polygon geometry and the types of coordinates). Often, you'll also see a specification for subtypes. These parts of the schema diagram are always shown in blue.

    Dataset

  2. Relationship Classes—Attribute relationships are widely used in GIS, just as they are in all DBMS applications. They define how rows in one table can be associated with rows in another table. Relationships have a direction, a cardinality (is this a one-to-one, one-to-many, or many-to-many relationship?), and other properties. Relationships and their properties are shown in green.

    Relationships

  3. Domains—These represent the list or range of valid values for attribute columns. These rules control how the software maintains data integrity in certain attribute columns. Domains are shown in red.

    Document domains by listing the valid values and their meanings for a field or by listing the valid value range.

  4. Spatial Relationships and Spatial Rules—A number of advanced data modeling capabilities are available for geodatabases. For example, data elements such as topologies and their properties are used to model how features share geometry with other features. Topologies, along with network datasets, address locators, terrains, cartographic representations, geometric networks, and many other advanced geodatabase types, provide critical and widely used GIS mechanisms for enabling spatial behavior and enforcing integrity in GIS databases. These and other rules, such as networks, are shown in orange.

    The best way to document and describe the set of extended data types in the geodatabase is to describe their rules and the behaviors of their spatial relationships. The following is an example of how a topology can be documented:

    Rules can be described for each data element such as a topology, address locator, network dataset, and so forth.

  5. Map Layers—GIS includes interactive maps and other views. A critical part of each dataset is the specification for how it is symbolized and rendered in maps. These specifications are typically defined as layer properties in ArcMap, which specify how features are assigned map symbology (colors, fill patterns, line and point symbols) and text labels. Layers are not managed in geodatabases, but they help define some key dataset properties in a geodatabase schema. Layer specifications are shown in yellow. Layers can be stored as .lyr files or as elements in an ArcMap document (.mxd). See Adding layers to a map for more information on map layers.

    Map layers are not part of a geodatabase design but define important display properties for using datasets stored in the geodatabase.

  6. 2D and 3D Basemaps—Define the fundamental basemap displays and determine whether this data theme will be used in these interactive map displays. If so, define the set of map scales for your basemaps and the map display properties at each map scale. You'll essentially define a different map specification, with its own map layers, for each map scale.

    Display of a basemap at multiple levels of resolution
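The same information used to document a domain (its valid values and their meanings, or its valid range) is also what a validation check needs. The following Python sketch models both kinds of domain described in item 3 and prints a plain-text listing for a design document. The class names and the example domains are hypothetical; this is not the ArcGIS domain API.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

@dataclass
class CodedValueDomain:
    name: str
    field_type: str
    values: Dict[str, str]  # code -> meaning

    def is_valid(self, value) -> bool:
        return value in self.values

    def document(self) -> List[str]:
        lines = [f"{self.name} ({self.field_type}, coded values)"]
        lines += [f"  {code}: {meaning}" for code, meaning in self.values.items()]
        return lines

@dataclass
class RangeDomain:
    name: str
    field_type: str
    minimum: float
    maximum: float

    def is_valid(self, value) -> bool:
        return self.minimum <= value <= self.maximum

    def document(self) -> List[str]:
        return [f"{self.name} ({self.field_type}, range {self.minimum} to {self.maximum})"]

Domain = Union[CodedValueDomain, RangeDomain]

def document_domains(domains: List[Domain]) -> str:
    """Produce a plain-text domain listing for a design document."""
    return "\n".join(line for d in domains for line in d.document())

# Hypothetical domains for a streets feature class.
surface = CodedValueDomain("RoadSurface", "text",
                           {"PAV": "Paved", "GRV": "Gravel", "DRT": "Dirt"})
lanes = RangeDomain("LaneCount", "short integer", 1, 8)

print(document_domains([surface, lanes]))
```

Generating the documentation from the same definitions used for validation helps keep the written design and the implemented rules in step.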




Using Microsoft Visio and the Geodatabase Diagrammer tool


ESRI provides a diagramming utility as a download for users who want to generate graphics similar to these for their geodatabase designs. You can download a tool, Geodatabase Diagrammer, that will generate a series of Visio graphics of your datasets and elements in your geodatabase. Search for "Geodatabase Diagrammer" at http://arcscripts.esri.com.

This tool creates Visio graphics of your geodatabase contents. You can cut and paste graphics from Visio into Microsoft Word, PowerPoint, and any other application that accepts .wmf files.

Geodatabase diagrammer


Documenting additional properties of your geodatabase design


Other key properties of your geodatabase design should be considered and documented including




Modeling feature classes


The following are some useful design tips for modeling geodatabase feature classes:

Task 1. Design simple feature classes.


Almost without exception, every geodatabase will contain feature classes. You may want only a simple geodatabase design that contains just a collection of feature classes. However, most users will find the need to develop a more comprehensive data model that adds advanced geodatabase elements. You will make the decision to extend your simple feature class designs based on your system needs and goals; you'll extend your design to support essential GIS functionality and behavior. This section introduces many of these feature class capabilities and points you to help topics where you can get more information on each option.

Start by defining the common properties of simple feature classes. You can add to this later as needed, but focus on defining your basic design first.

A feature class is a collection of geographic features with the same geometry type (such as point, line, or polygon), a common set of attribute columns, and the same coordinate system.
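The three constraints in this definition can be expressed directly as checks on each feature added to a class. The following Python sketch is illustrative only: the class names, the exact-match attribute rule, and the use of an "EPSG:..." string as the coordinate system identifier are assumptions for the example, not an ArcGIS API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

@dataclass
class Feature:
    geometry_type: str          # "point", "line", or "polygon"
    coordinates: Any
    attributes: Dict[str, Any]
    spatial_reference: str      # assumed identifier, e.g. "EPSG:26911"

@dataclass
class FeatureClass:
    name: str
    geometry_type: str
    fields: Tuple[str, ...]
    spatial_reference: str
    features: List[Feature] = field(default_factory=list)

    def add(self, f: Feature) -> None:
        # Same geometry type for every feature.
        if f.geometry_type != self.geometry_type:
            raise ValueError(f"{self.name} holds {self.geometry_type} features only")
        # Common set of attribute columns.
        if set(f.attributes) != set(self.fields):
            raise ValueError(f"attributes must be exactly {self.fields}")
        # Same coordinate system.
        if f.spatial_reference != self.spatial_reference:
            raise ValueError("feature uses a different coordinate system")
        self.features.append(f)

# Hypothetical well feature class in NAD83 / UTM zone 11N.
wells = FeatureClass("Wells", "point", ("WELL_ID", "DEPTH_M"), "EPSG:26911")
wells.add(Feature("point", (500000.0, 4100000.0),
                  {"WELL_ID": "W-1", "DEPTH_M": 42.0}, "EPSG:26911"))
print(len(wells.features))  # 1
```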

Example feature classes in ArcGIS
Feature class          Representation  Notes
Street centerlines     Line            Street segments split at each intersection; usually contain address ranges and network properties
Wells                  Point
Soil types             Polygon         Usually have many descriptive attributes in related tables
Parcels*               Polygon         Topologically integrated with parcel boundaries and corners
Parcel boundaries*     Line            Have coordinate geometry and dimension attributes; participate in a topology with parcels and corners
Parcel corners*        Point           Surveyed corners of parcels; participate in a topology with parcel polygons and boundaries
Parcel annotation      Annotation      Provides text labels for lot dimensions, taxation, and legal description information
Building footprints    Polygon         Contains outlines of buildings and structures

* The Cadastral Fabric dataset provides parcel behavior and specialized parcel-based topology for these feature classes.

Once you settle on a proposed list of feature classes, try to define the following for each:



Sometimes, you'll load feature data as is into your GIS. If this is the case, you may not need to do any of the following additional design tasks. However, it is important to evaluate the advantages of adding further GIS capabilities to the features in your geodatabase. These capabilities can make data use and maintenance much easier in the long term. They help you maintain the integrity of your spatial information, make your data more useful, and, most important, help you understand how much confidence you can place in your data to meet your needs.

Some common reasons for extending your simple features data model are as follows:



Task 2. Organize related feature classes into feature datasets.


Use feature datasets to group spatially related feature classes. Feature datasets are necessary if you want to


A feature dataset is a collection of spatially or thematically related feature classes that share a common coordinate system. Feature datasets are used to hold feature classes that participate in a shared topology, a network dataset, a geometric network, or a terrain.

Sometimes users will organize a collection of feature classes for a common theme into a single feature dataset. For example, users might have a feature dataset for Water that contains Hydro Points (such as dams, bridges, and intakes), Hydro Lines (streams, canals, rivers), and Hydro Polygons (lakes, catchment areas, watersheds, etc.).

In some situations, people might use feature datasets as folders to hold a collection of simple feature classes. This technique is primarily used to organize how users share datasets. However, it is not a useful data structure for editing.

You will need to go through tasks 3 and 4 to decide on a final design for what feature classes should be organized within each feature dataset.

Feature datasets play a key role in establishing permissions for data editing. All the feature classes in a feature dataset have the same permissions, so you can set permissions on a feature dataset to identify which organization or group maintains its contents. If different permissions are needed for individual feature classes, organize those feature classes in separate feature datasets (or keep them as standalone feature classes), each with its own permission settings. In these cases, extract, transform, load (ETL) or Import/Export procedures can be used to move data updates between datasets.
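Because a feature dataset implies one coordinate system and one set of permissions, a first-pass grouping of feature classes can be derived from those two properties. The following Python sketch is a hypothetical planning aid, not an ArcGIS tool; the feature class names, coordinate system identifiers, and maintaining groups are assumed for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

class FeatureClassInfo(NamedTuple):
    name: str
    spatial_reference: str  # assumed identifier, e.g. "EPSG:26911"
    maintained_by: str      # group that edits the data (drives permissions)

def propose_feature_datasets(
    classes: List[FeatureClassInfo],
) -> Dict[Tuple[str, str], List[str]]:
    """Group feature class names by (coordinate system, maintaining group).

    Classes in one group could share a feature dataset: they share a
    coordinate system and can share one set of permissions.
    """
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for fc in classes:
        groups[(fc.spatial_reference, fc.maintained_by)].append(fc.name)
    return dict(groups)

classes = [
    FeatureClassInfo("HydroPoints", "EPSG:26911", "Water Dept"),
    FeatureClassInfo("HydroLines", "EPSG:26911", "Water Dept"),
    FeatureClassInfo("Parcels", "EPSG:26911", "Assessor"),
    FeatureClassInfo("ParcelBoundaries", "EPSG:26911", "Assessor"),
]
for key, names in propose_feature_datasets(classes).items():
    print(key, names)
```

This grouping is only a starting point: whether feature classes participate together in a topology, network dataset, geometric network, or terrain (tasks 3 and 4) is what settles the final organization.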

When to use feature datasets
Use feature datasets to spatially or thematically integrate related feature classes. Their primary purpose is for building a topology, a network dataset, a terrain dataset, or a geometric network.

You must use feature datasets to hold the set of feature classes that participate in any of the following geodatabase capabilities:


Task 3. Add geodatabase elements to facilitate data editing and to manage data integrity.


The geodatabase includes some optional data modeling capabilities that add integrity rules and editing behavior to your GIS. These capabilities help you automate much of your data management work and integrity checks.



Task 4. Add capabilities for advanced data uses, analytic models (such as network analysis and geocoding), and advanced cartography.



With each dataset, you may want to consider adding geodatabase capabilities that help you further leverage it. A number of options are available, and you can apply any of them to your geodatabase.




A note about the use of UML for geodatabase design



ArcGIS supports the use of CASE tools to import Unified Modeling Language (UML) models for geodatabase designs. However, support for all geographic data types, relationships, and behaviors is not complete in UML data modeling.

While UML is a useful tool for documenting the relational aspects of a geodatabase schema (such as table layouts and relationships), it is generally not recommended as the sole tool for geodatabase design.

UML can be useful for relational database design (for example, for schemas that primarily contain feature classes, attribute tables, and a few other geodatabase properties). However, UML has generally not been useful for designing richer geographic behavior—topologies, networks, terrains, raster catalogs, map layers, map symbols, metadata, cartographic representations, semantic classifications, address locators, cadastral fabrics, linear referencing, and geoprocessing models. These data elements are used to define geographic behavior and associations.

Much of the richness of the geodatabase cannot be universally expressed in a UML design. More important, no special GIS insight is achieved through UML design. Graphing a hierarchy of object-oriented classes, subclasses, and inheritance in UML does not provide insight into how to model the spatial relationships in your geographic data; for example:



Often, UML distracts designers from defining the use cases that help them articulate critical geographic behaviors and spatial relationships more clearly.

Certainly, user communities can find some ways to express their geographic data elements as UML. In other words, you can document many (but not all) design aspects of your geodatabase using UML.

Additionally, many relational modelers depend heavily on UML and want their GIS designs to interoperate with their other DBMS designs. In these cases, you can share parts of your geodatabase schemas using UML.

In addition, many people primarily want to use UML to share their schemas and rules. ArcGIS has other mechanisms that support schema documentation and sharing, such as geodatabase XML.

Bottom line: UML is one of a number of methodologies (such as entity-relationship modeling) that can be used effectively for relational and tabular modeling. However, the use of UML alone is not sufficient. UML is not a replacement for the necessary work of geographic data modeling required in GIS: defining spatial behaviors and use cases of the spatial relationships you want your geodatabase to convey. The design steps described earlier in this Design section of the help (see Geodatabase design steps) will provide guidance on these other aspects of geodatabase design.

A useful tool that ESRI uses to document schemas with graphic representations is described in Using the geodatabase diagrammer tool.


Design Tips


Geodatabase data models are designed to be used in practical application scenarios by a wide range of users. To ensure that each design is easy to understand and implement, each data model was built to support easy migration from existing data structures and has been designed to be flexible, extensible, and easily adapted by your organization. Here are a few final design tips to help you with your design implementations.



During the final stage of design, you’ll want to test scalability and workflows that represent the work that your organization will perform with your geodatabase. Use this to make final adjustments to your design. Be practical in your final test phase and adjust your design as necessary.