
Tiled processing of large datasets
Release 9.3 Last modified August 3, 2009

Why subdivide the data?

The overlay analysis tools perform best when processing can be done within your machine's physical memory (RAM). This is not always possible with datasets that contain either a large number of features or very complex features with hundreds of thousands or millions of vertices. Previously, when physical memory was exhausted, virtual memory was used, and when virtual memory was exhausted, an internal paging system was used. Each successive mode of memory management is exponentially slower than the one before it. Subdividing the data into tiles allows each tile to be processed within physical memory.

How do I know when the process was tiled?

Review the messages returned by a tool during or after execution to determine whether the input data was tiled. If adaptive subdivision occurred, the third line states "Processing Tiles..."; otherwise, the input data was not subdivided and the third line states "Cracking Features...".
Example of the messages from a process that was not subdivided.

Executing (Identity_1): Identity c:\gp\fgdb.gdb\rivers c:\gp\fgdb.gdb\pf_watersheds c:\gp\fgdb.gdb\rivers_ws
Reading Features...
Cracking Features...
Assembling Features...
Executed (Identity_1) successfully.

Example of the messages from a process that was subdivided.

Executing (Identity_1): Identity c:\gp\fgdb.gdb\rivers c:\gp\fgdb.gdb\pf_watersheds c:\gp\fgdb.gdb\rivers_ws
Reading Features...
Processing Tiles...
Assembling Features...
Executed (Identity_1) successfully.
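
If you capture these messages in a script, you can check for tiling automatically. The following is a minimal sketch. It assumes you already have the message text as a string, and it relies only on the two marker lines shown in the examples above. The function name `was_tiled` is hypothetical.

```python
def was_tiled(messages: str) -> bool:
    """Return True if overlay tool messages show that adaptive subdivision ran.

    Assumes the message format shown in the examples: a tiled run reports
    "Processing Tiles..." and an untiled run reports "Cracking Features...".
    The first of the two markers found decides the answer.
    """
    for line in messages.splitlines():
        text = line.strip()
        if text.startswith("Processing Tiles"):
            return True
        if text.startswith("Cracking Features"):
            return False
    raise ValueError("messages contain neither 'Processing Tiles' nor 'Cracking Features'")
```

This function matches on the message text rather than on a fixed line position, so an extra or missing line elsewhere in the log does not change the result.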

What do the tiles look like?

Every process starts with a single tile that spans the entire extent of the data. If the data in this tile is too large to be processed in physical memory, the tile is subdivided into four equal tiles (a quadtree approach). Processing then begins on a sub-tile, which is further subdivided if the data in this second level of tiles is again too large. This continues until the data within each tile can be processed within physical memory. See the example below.

Extent of input datasets

The footprint of all the input features.

GP tile level 1

The process begins with a tile that spans the entire extent of all datasets. For reference, this is called tile level 1.

GP tile level 2

If the data is too large to process in memory, the level 1 tile is subdivided into four equal tiles. These four sub-tiles are called level 2 tiles.

GP tiles adaptive

Depending on the amount of data in each tile, some tiles are further subdivided while others are not.
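
The adaptive quadtree subdivision described above can be sketched as follows. The sketch makes two simplifying assumptions. First, the data is represented as points. Second, a hypothetical `capacity` (maximum points per tile) stands in for "fits in physical memory". The real tools measure memory use and handle whole features, and this sketch does not model that. Level numbering follows the text: the full extent is level 1, and its four sub-tiles are level 2.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Extent = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Tile:
    level: int
    extent: Extent
    count: int  # number of points in this tile (stand-in for data size)


def build_tiles(points: Sequence[Point], extent: Extent, capacity: int,
                level: int = 1, max_level: int = 16) -> List[Tile]:
    """Adaptively subdivide `extent` into quadtree tiles (hypothetical sketch).

    A tile is kept as a leaf when it holds at most `capacity` points or when
    `max_level` is reached (a guard against endless splitting of coincident
    points); otherwise it is split into four equal sub-tiles.
    """
    xmin, ymin, xmax, ymax = extent
    if len(points) <= capacity or level >= max_level:
        return [Tile(level, extent, len(points))]
    xmid = (xmin + xmax) / 2.0
    ymid = (ymin + ymax) / 2.0
    # Bucket each point into a quadrant keyed by (column, row); 0 = lower half.
    quadrants = {(0, 0): [], (1, 0): [], (0, 1): [], (1, 1): []}
    for x, y in points:
        quadrants[(int(x >= xmid), int(y >= ymid))].append((x, y))
    tiles: List[Tile] = []
    for (col, row), pts in quadrants.items():
        sub_extent = (
            xmid if col else xmin,
            ymid if row else ymin,
            xmax if col else xmid,
            ymax if row else ymid,
        )
        tiles.extend(build_tiles(pts, sub_extent, capacity, level + 1, max_level))
    return tiles
```

For example, `build_tiles([(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (90, 90)], (0, 0, 100, 100), capacity=4)` splits the crowded lower-left corner repeatedly, down to level 6. The sparse quadrants stay as large level 2 tiles. This mirrors the uneven tiling shown in the adaptive example.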

When a process that required tiling completes, the tiles are written to the shapefile c:\Documents and Settings\UserName\Local Settings\Temp\OverlayTile.shp.
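
The simplest way to view the tiles is to add OverlayTile.shp to a map. To list tile extents from a script without other libraries, the following sketch reads the .shp file directly. It follows the published ESRI Shapefile Technical Description layout: a 100-byte header, then records with a big-endian record header and little-endian content. It assumes that each tile is stored as a polygon record (shape type 5); this help topic does not state the geometry type. It reads only each record's bounding box. The function names are hypothetical.

```python
import struct
from typing import List, Tuple

Extent = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

SHP_FILE_CODE = 9994  # magic number at the start of every .shp file
POLYGON = 5           # shape type code for polygon records


def parse_tile_extents(data: bytes) -> List[Extent]:
    """Return the bounding box of every polygon record in .shp file bytes."""
    (file_code,) = struct.unpack(">i", data[0:4])  # big-endian
    if file_code != SHP_FILE_CODE:
        raise ValueError("not an ESRI shapefile (.shp) main file")
    (length_words,) = struct.unpack(">i", data[24:28])
    end = min(length_words * 2, len(data))  # file length is stored in 16-bit words
    extents: List[Extent] = []
    pos = 100  # records begin after the fixed 100-byte header
    while pos + 8 <= end:
        # Record header (big-endian): record number, content length in 16-bit words.
        _record_number, content_words = struct.unpack(">2i", data[pos:pos + 8])
        content = data[pos + 8:pos + 8 + content_words * 2]
        (shape_type,) = struct.unpack("<i", content[0:4])  # little-endian
        if shape_type == POLYGON:
            # Polygon content starts with the shape type, then Xmin, Ymin, Xmax, Ymax.
            extents.append(struct.unpack("<4d", content[4:36]))
        pos += 8 + content_words * 2
    return extents


def read_tile_extents(path: str) -> List[Extent]:
    """Read a .shp file from disk and return its polygon bounding boxes."""
    with open(path, "rb") as f:
        return parse_tile_extents(f.read())
```

To use it, call `read_tile_extents(r"c:\Documents and Settings\UserName\Local Settings\Temp\OverlayTile.shp")`, substituting your own user name for UserName.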

Which tools use subdivision?

The following tools in the Analysis Tools toolbox use subdivision logic when processing large data:
-Clip
-Erase
-Identity
-Intersect
-Union
-Split
-Symmetrical Difference

Process fails with an "Out of memory" error

The subdivision approach does not help when processing extremely large features, that is, features with many millions of vertices. Splitting and reassembling such features multiple times across tile boundaries is very costly in terms of memory and may cause "Out of memory" errors if a feature is too large. It is recommended that these features be broken up into smaller features. Road casings for an entire city, or a polygon representing a river estuary, are examples of very large features with many vertices.
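
This help topic does not prescribe how to break features up. For a long polyline, one simple approach is to cut its vertex list into consecutive pieces of at most `max_vertices` vertices. Each cut vertex is repeated so that the pieces still meet end to end. The following is a hypothetical sketch of that idea only. The function name and threshold are assumptions, and polygons need a different technique that this sketch does not cover.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float]


def split_polyline(vertices: Sequence[Vertex], max_vertices: int) -> List[List[Vertex]]:
    """Split one polyline into connected pieces of at most `max_vertices` vertices."""
    if max_vertices < 2:
        raise ValueError("each piece needs at least two vertices")
    if len(vertices) < 2:
        raise ValueError("a polyline needs at least two vertices")
    pieces: List[List[Vertex]] = []
    start = 0
    last = len(vertices) - 1
    while start < last:
        end = min(start + max_vertices - 1, last)
        pieces.append(list(vertices[start:end + 1]))
        start = end  # next piece begins at this piece's last vertex, keeping continuity
    return pieces
```

Repeating the shared vertex adds one vertex per cut. In exchange, the pieces can later be merged back without gaps.
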
The "Out of memory" error can also occur if a second application is run while a tool is processing. The second application can reduce the amount of available physical memory and so invalidate the tile boundary calculation for the tile currently being processed. The tool process may then demand more physical memory than is available. It is recommended that no other operations be performed on the machine while overlay tools are processing large datasets.

What data format is recommended when working with large data?