Building large raster datasets: case studies

Release 9.3
Last modified November 12, 2009

Note: This topic was updated for 9.3.1.

This topic examines how to build large raster datasets through two case studies: creating a raster dataset from TIFF files and creating a raster catalog from TIFF files.


Creating a raster dataset from TIFF files

In this example, there are several hundred TIFF image files stored on a CD that must be mosaicked into a raster dataset. The basic workflow to create and populate the raster dataset is as follows:

  1. Configure the system.
  2. Allocate disk space.
  3. Stage the source data.
  4. Create the raster dataset.
  5. Mosaic the source data into the raster dataset.
  6. Build DBMS statistics.
  7. Build raster statistics.

1. Configure the geodatabase server system.


The basic components of a server system include the network, operating system of the server and the client machines, DBMS, ArcSDE server, and choice of ArcSDE connection architecture. The following are suggestions for configuring each component. For more detailed information, see Loading large raster datasets into ArcSDE.

Configuring the network
The network should be able to handle a large amount of raster data written from the client machines to the server without impacting other users on the system. In some cases, the network administrators may have to upgrade the network or run the raster-loading programs in the evenings and over the weekends during off-peak hours.

Configuring the operating system
Configure the maximum allowable operating system shared memory segment and allocate the required amount of memory to the DBMS data and log buffers.

Configuring the DBMS
The basic goal is to increase the data throughput into the database as much as possible; therefore, whenever possible, reconfigure the DBMS to efficiently load large amounts of raster data, optimizing it for write access.

Configuring ArcSDE
Two separate steps should be considered when configuring ArcSDE:


2. Allocate DBMS storage space for the raster dataset.


Beyond allocating disk space to the basic database transaction log files, temporary sort space, and data dictionary storage that occurs whenever the DBA creates a DBMS database, the DBA must also allocate space for the storage of the raster dataset. Enough disk space must be provided for the database table spaces, and enough disk space must be set aside to stage the image files on the file system.

You need to consider both the storage arrangement of the tables and indexes of the rasters and the physical size of the operating system files assigned to the database when constructing the DBMS storage space for the raster datasets.

It is highly recommended that you preallocate DBMS storage space for the raster dataset because it eliminates the overhead of having the database allocate the space during the loading process. Preallocation also lets you identify and fix operating system, database, or hardware problems before loading the raster data, and for some DBMSs, the data is balanced across the file system if all the space is available initially.
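
As a rough aid to sizing the space to preallocate, the following Python sketch estimates the uncompressed size of a mosaicked raster dataset from its dimensions, band count, and pixel depth. It assumes that each pyramid level halves both dimensions, so the pyramid levels together add at most about one-third of the base size. The function name and the example figures are hypothetical, and compression, DBMS block overhead, and index space are ignored, so treat the result as a starting point only.

```python
def estimate_raster_bytes(columns, rows, bands, bits_per_pixel, with_pyramids=True):
    """Estimate uncompressed storage, in bytes, for a raster dataset."""
    base = columns * rows * bands * bits_per_pixel / 8.0
    # Assumption: each pyramid level has one quarter of the pixels of the
    # level below it, so all levels together add at most about 1/3 of the base.
    if with_pyramids:
        return base * 4 / 3
    return base


# Hypothetical example: 300 three-band, 8-bit TIFFs of 10,000 x 10,000 pixels
# arranged edge to edge in a 20 x 15 grid.
total = estimate_raster_bytes(20 * 10000, 15 * 10000, 3, 8)
print("Roughly %.1f GB before compression" % (total / 1024.0 ** 3))
```

The DBA can compare an estimate like this against the planned table space sizes before any data is loaded.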

Allocating space for TIFF image files
It is recommended that you copy the TIFF image files from the CDs to a disk drive on a local network. Doing so will improve the throughput of image data into the database because reading data from a disk drive is more efficient than reading data from a CD-ROM drive. However, before doing so, you must have allocated disk space to hold the image files. If there is not enough space to copy all the image files onto the local network, you can execute the mosaic in stages, reusing the staging space until all TIFF files are loaded.
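
When the staging space is too small to hold every image at once, the stages can be planned ahead of time. The following Python sketch groups files into stages whose total size fits in the available staging space. The function name, the size_of parameter, and the greedy grouping strategy are illustrative assumptions, not part of any ArcGIS tool.

```python
import os


def plan_stages(tiff_paths, staging_bytes, size_of=os.path.getsize):
    """Group files into stages whose total size fits in the staging space."""
    stages, current, used = [], [], 0
    for path in sorted(tiff_paths):
        size = size_of(path)
        if size > staging_bytes:
            raise ValueError("%s does not fit in the staging space" % path)
        # Start a new stage when adding this file would overflow the space.
        if current and used + size > staging_bytes:
            stages.append(current)
            current, used = [], 0
        current.append(path)
        used += size
    if current:
        stages.append(current)
    return stages
```

Each stage would be copied to the staging disk, loaded, and deleted before the next stage is copied.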

3. Stage TIFF image files.


Make the source TIFF image files available by copying them onto a disk drive. The source data should not be mosaicked directly from the CD drive because the seek rate of that device is much slower than that of a local disk drive. Some offline storage devices, such as tape silos (typically used by data vendors), can stage image files to an online cache that provides a better seek time than a local disk drive. If you are using such a device at your location, do not copy the data to a local disk drive; instead, stage the image files to the online cache.

If you have smaller areas or will not be impacting other users, you can copy the TIFF files into a single folder and complete the mosaic process in a single operation. However, if you have considerably larger areas, you may want to break up the mosaic operation into several runs performed during the evening so that other users are not affected during peak hours. To do this, copy the TIFF files for each mosaic operation into a separate folder. You can use Python scripting to set up continuous mosaicking operations.
Learn more about scripting.
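
The staging of one folder per run can itself be scripted. The following Python sketch copies TIFF files from a source folder into numbered subfolders, a fixed number of files per folder. The function name and the run_001 folder naming are hypothetical, and the sketch copies only .tif and .tiff files; if the images rely on sidecar files such as world files, those would need to be copied as well.

```python
import os
import shutil


def stage_in_batches(source_dir, staging_root, batch_size):
    """Copy TIFF files into numbered folders, batch_size files per folder."""
    names = sorted(n for n in os.listdir(source_dir)
                   if n.lower().endswith((".tif", ".tiff")))
    folders = []
    for start in range(0, len(names), batch_size):
        # One folder per planned run: run_001, run_002, ...
        folder = os.path.join(staging_root, "run_%03d" % (start // batch_size + 1))
        if not os.path.isdir(folder):
            os.makedirs(folder)
        for name in names[start:start + batch_size]:
            shutil.copy2(os.path.join(source_dir, name), folder)
        folders.append(folder)
    return folders
```

The returned list of folders can then be passed, one folder per evening run, to whichever loading tool is used.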

4. Create the raster dataset.


There are a number of ways to create raster datasets. The most common method is to use the Create Raster Dataset tool from ArcGIS Desktop. You can also create raster datasets using ArcObjects. As a rule of thumb, if you intend to use geoprocessing tools on raster datasets, create and maintain them with ArcCatalog or ArcObjects scripts rather than with tools that are not geodatabase aware.

To create a raster dataset using the tool, you should know the pixel depth (for example, 8-bit unsigned or 32-bit float), the number of bands, the coordinate system, the configuration keyword that was set up with the DBMS storage parameters, and the resampling method used to build pyramids (with pyramids, the progress of the load can be viewed in ArcMap once it is under way). You must also choose the compression and set the pyramid reference point. You can decide on the compression method by storing samples of the TIFF source files using a variety of compression combinations and comparing the results.

With a TIFF file, you generally have to create a raster dataset with three bands and 8-bit data.

5. Mosaic the source data into the raster dataset.



There are two mosaic tools that can be used to mosaic raster datasets into a single raster dataset: the Mosaic tool and the Workspace To Mosaic tool. The Workspace To Mosaic tool is recommended when mosaicking several hundred raster datasets.
When loading raster datasets, remember that raster datasets can only be loaded in serial fashion by mosaicking rasters into the dataset one after the other.

6. Build the DBMS statistics.



All DBMSs supported by ArcSDE use cost-based optimization, which relies on statistics about the DBMS objects to select the best execution plan. If the statistics are not present, the DBMS falls back on the older rule-based optimization to select the execution plan. The results of rule-based optimization are often unpredictable, so it is important to generate DBMS statistics.
The raster dataset mosaic operation is unique because it is one of the few processes that actually queries and inserts into the same table. This occurs because the mosaic operation must replace parts of the base-level pyramid blocks that are overlapped by the incoming image file pixels, and it must read all blocks of the preceding pyramid level to create or replace the next higher pyramid level. To efficiently read from the raster blocks table, the mosaic operation must use the composite index that has been created for that table.

If the DBMS statistics on the raster blocks table are not present, it is possible that the DBMS will repeatedly execute the expensive, full-table scan of the raster blocks table rather than the efficient and desirable index range scan of the raster blocks table composite index. To ensure that this does not occur, you should create DBMS statistics on the raster dataset after you have mosaicked approximately the first 10 image files into the raster dataset.
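
The timing described above can be built into a loading script. The following Python sketch mosaics files one at a time and triggers DBMS statistics once, after the tenth file. The mosaic_one and analyze callables are hypothetical placeholders: in practice they would wrap the mosaic tool and the statistics command of your DBMS, neither of which is shown here.

```python
def mosaic_serially(tiff_paths, mosaic_one, analyze, analyze_after=10):
    """Mosaic files one at a time and build DBMS statistics once, early on."""
    loaded = 0
    for path in tiff_paths:
        mosaic_one(path)  # hypothetical wrapper around the mosaic tool
        loaded += 1
        if loaded == analyze_after:
            analyze()     # hypothetical wrapper around the DBMS statistics command
    # If the whole job was shorter than the threshold, still analyze once.
    if 0 < loaded < analyze_after:
        analyze()
    return loaded
```

Passing the two operations in as callables keeps the loop independent of any particular DBMS and makes it easy to try out with stand-in functions before a real load.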

There are several ways to generate DBMS statistics:



7. Build raster statistics.



Raster statistics can be generated for the raster datasets and are stored in the ArcSDE AUX table. Raster statistics include the minimum, maximum, mean, and standard deviation of the pixel values. Generating raster statistics on the base pyramid level can require a significant amount of time for very large raster datasets.
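
To make the four values concrete, the following Python sketch computes them for a small list of pixel values. It is only an illustration of the quantities involved, not the way ArcGIS calculates them; in particular, the use of the population form of the standard deviation is an assumption.

```python
import math


def raster_statistics(pixels):
    """Return (minimum, maximum, mean, standard deviation) of pixel values."""
    values = [float(v) for v in pixels]
    count = len(values)
    mean = sum(values) / count
    # Assumption: population standard deviation (divide by count).
    variance = sum((v - mean) ** 2 for v in values) / count
    return min(values), max(values), mean, math.sqrt(variance)


print(raster_statistics([2, 4, 4, 4, 5, 5, 7, 9]))  # (2.0, 9.0, 5.0, 2.0)
```

A real raster holds far more values than this, which is why the calculation can take a significant amount of time on the base pyramid level.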

Raster statistics are necessary for data that must be statistically stretched before the objects captured in the raster can be viewed by the human eye. When viewed without any statistical enhancement, the data can appear to be black mottled with a bit of dark gray. However, not all data requires raster statistics, since the source images may have been color corrected by the data vendor. For data that has been adjusted to suit the requirements of a display renderer, the application of raster statistics can cause the quality of the data to appear incorrect when displayed.
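
One common form of statistical stretch is a standard deviation stretch, sketched below in Python. It maps the range from the mean minus n standard deviations to the mean plus n standard deviations linearly onto the display range 0 to 255 and clamps values outside that range. The function name and the default of n = 2 are assumptions for illustration; the renderer in ArcMap may differ in detail.

```python
def stretch_to_byte(value, mean, std_dev, n=2.0):
    """Linearly map mean +/- n standard deviations onto 0..255, clamping."""
    low = mean - n * std_dev
    high = mean + n * std_dev
    if high <= low:
        return 0  # no spread in the data, so nothing to stretch
    scaled = (value - low) / float(high - low) * 255.0
    return int(round(min(255.0, max(0.0, scaled))))
```

With a mean of 5 and a standard deviation of 2, for example, the values 1 through 9 are spread across the full display range, whereas unstretched they would all appear nearly black. Data that the vendor has already color corrected would be distorted by the same mapping.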

To generate statistics, you can right-click the raster dataset in ArcCatalog and click Calculate Statistics, or you can use the Calculate Statistics tool.

Learn more about raster dataset statistics.


Creating a raster catalog with TIFF files


In this example, there are several hundred TIFF image files stored on a CD that must be loaded into a raster catalog. The basic workflow to create and populate the raster catalog is as follows:

  1. Configure the system.
  2. Allocate disk space.
  3. Stage the source data.
  4. Create the raster catalog.
  5. Insert the source data into the raster catalog.
  6. Build DBMS statistics.

1. Configure the geodatabase server system.


The basic components of a server system include the network, operating system of the server and the client machines, DBMS, ArcSDE server, and choice of ArcSDE connection architecture. The following are suggestions for configuring each component.

Configuring the network
The network should be able to handle a large amount of raster data written from the client machines to the server without impacting other users on the system. In some cases, the network administrators may have to upgrade the network or run the raster-loading programs in the evenings and over the weekends during off-peak hours.

Configuring the operating system
Configure the maximum allowable operating system shared memory segment and allocate the required amount of memory to the DBMS data and log buffers.

Configuring the DBMS
The basic goal is to increase the data throughput into the database as much as possible; therefore, whenever possible, reconfigure the DBMS to efficiently load large amounts of raster data, optimizing it for write access.

Configuring ArcSDE
Two separate steps should be considered when configuring ArcSDE:


2. Allocate DBMS storage space for the raster catalog.


Beyond allocating disk space to the basic database transaction log files, temporary sort space, and data dictionary storage that occurs whenever the DBA creates a DBMS database, the DBA must also allocate space for the storage of the raster catalog. Enough disk space must be provided for the database table spaces, and enough disk space must be set aside to stage the image files on the file system.

You need to consider both the storage arrangement of the tables and indexes of the rasters and the physical size of the operating system files assigned to the database when constructing the DBMS storage space for the raster catalog.

It is highly recommended that you preallocate DBMS storage space for the raster catalog because it eliminates the overhead of having the database allocate the space during the loading process. Preallocation also lets you identify and fix operating system, database, or hardware problems before loading the raster data, and for some DBMSs, the data is balanced across the file system if all the space is available initially.

Allocating space for TIFF image files
It is recommended that you copy the TIFF image files from the CDs to a disk drive on a local network. Doing so will improve the throughput of image data into the database because reading data from a disk drive is more efficient than reading data from a CD-ROM drive. However, before doing so, you must have allocated disk space to hold the image files. If there is not enough space to copy all the image files onto the local network, you can execute the load in stages, reusing the staging space until all TIFF files are loaded.

3. Stage TIFF image files.


Make the source TIFF image files available by copying them onto a disk drive. The source data should not be loaded directly from a CD or DVD drive because the seek rate of these devices is much slower than that of a local disk drive. Some offline storage devices, such as tape silos (typically used by data vendors), can stage image files to an online cache that provides a better seek time than a local disk drive. If you are using such a device at your location, do not copy the data to a local disk drive; instead, stage the image files to the online cache.

With a large amount of data, you may want to create several folders, grouping the image data for each insert operation.

You can use Python scripting to set up continuous loading operations.
Learn more about scripting.

4. Create the raster catalog.


Create the raster catalog using the Create Raster Catalog tool.

To create the raster catalog, you should know the coordinate system for the raster column and spatial column and the configuration keyword (if one was set).

5. Insert the source data into the raster catalog.



There are two loading tools that can be used to load raster datasets into a raster catalog: the Copy Raster tool and the Workspace To Raster Catalog tool. The Workspace To Raster Catalog tool is recommended when loading several hundred raster datasets.
Raster catalogs can be loaded in parallel fashion so multiple client machines can be leveraged to load images. Therefore, you can start ArcCatalog on multiple machines and run a raster loading tool from each machine.
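
When several client machines share the load, the staged folders need to be divided among them. The following Python sketch deals folders out in round-robin order; the function name and the machine and folder names in the usage note are hypothetical, and the sketch only produces the plan, leaving the loading tool to be run on each machine.

```python
def assign_folders(folders, machines):
    """Deal staged folders out to client machines in round-robin order."""
    plan = dict((machine, []) for machine in machines)
    for index, folder in enumerate(folders):
        plan[machines[index % len(machines)]].append(folder)
    return plan
```

For example, assign_folders(["run_001", "run_002", "run_003"], ["clientA", "clientB"]) gives clientA the first and third folders and clientB the second, so no image is loaded twice.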

6. Build the DBMS statistics.



All DBMSs supported by ArcSDE use cost-based optimization, which relies on statistics about the DBMS objects to select the best execution plan. If the statistics are not present, the DBMS falls back on the older rule-based optimization to select the execution plan. The results of rule-based optimization are often unpredictable, so it is important to generate DBMS statistics.
If the DBMS statistics are not present on the raster blocks table, it is possible that the DBMS will repeatedly execute the expensive, full-table scan on the raster blocks table rather than the efficient and desirable index range scan on the raster blocks table composite index. To ensure that this does not occur, you should create DBMS statistics on the raster blocks table once it contains at least 10,000 records.
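
To judge roughly when the 10,000-record mark will be reached, the following Python sketch estimates how many base-level rows one image adds to the raster blocks table. It assumes one row per band per tile and a tile size of 128 by 128 pixels; both are assumptions that should be checked against your own configuration, and pyramid levels, which add further rows, are ignored. The function names are hypothetical.

```python
import math


def base_level_blocks(columns, rows, bands, tile_size=128):
    """Estimate base-level block rows for one image (one row per band per tile)."""
    tiles_across = int(math.ceil(columns / float(tile_size)))
    tiles_down = int(math.ceil(rows / float(tile_size)))
    return tiles_across * tiles_down * bands


def images_until_threshold(columns, rows, bands, threshold=10000, tile_size=128):
    """Estimate how many images must be loaded before the threshold is reached."""
    per_image = base_level_blocks(columns, rows, bands, tile_size)
    return int(math.ceil(threshold / float(per_image)))
```

Because pyramid blocks are not counted, the real table will usually reach the threshold somewhat sooner than this estimate suggests.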

Copyright © Environmental Systems Research Institute, Inc.