Input and output data considerations
This topic was updated for 9.3.1.
Over the years, ESRI has developed three main data formats for storing geographic information: coverages, shapefiles, and geodatabases. Shapefiles were developed to provide a simple, nontopological format for storing geographic and attribute information. Because of this simplicity, they are a very popular open data transfer format. While shapefiles may seem an easy choice, they have limitations that geodatabases address, and you should be aware of those limitations before using them. In broad terms:
- Geographic data is more than the simple features and attributes that a shapefile can store. For example, there are annotation, attribute relationships, topology relationships, attribute domains and subtypes, coordinate precision and resolution, and numerous other capabilities that are supported in geodatabases but not in shapefiles.
- Because shapefiles are an open format popular for data transfer, many non-ESRI software packages output shapefiles. (You can find the shapefile format specification at http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf.) Unfortunately, these packages do not always do a good job of creating properly formatted shapefiles. You may have already experienced the frustration of receiving corrupted shapefiles from another source.
- Shapefiles make use of the dBASE file format (.dbf file) to store attributes. dBASE is a non-ESRI format developed in the early 1980s and was, at that time, the most popular format for storing tables of attributes. However, time has passed it by, and there have been a number of data representation improvements since then, such as the Unicode standard, which supports most of the world's writing systems. This is one reason why shapefiles do not work well for storing information in a language other than English.
These issues (and more) mean that shapefiles are a poor choice for active database management—they do not handle the modern life cycle of data creation, editing, versioning, and archiving.
When should I use a shapefile?
- When exporting data for use in a non-ESRI software application.
- When exporting data for use in ArcView 3 or ArcInfo Workstation.
- When you need to write simple features and attributes quickly, such as for ArcGIS Server geoprocessing services. Be aware of the limitations detailed below.
When should I not use a shapefile?
With some exceptions that are noted below, shapefiles are acceptable for storing simple feature geometry. However, shapefiles have serious problems with attributes. For example, they cannot store null values, they can round real numbers, they have poor support for Unicode character strings, they do not allow field names longer than 10 characters, and they cannot store both a date and a time in the same field. These are just the main issues. Additionally, they do not support geodatabase capabilities such as domains and subtypes. So unless your attributes are very simple and you need no geodatabase capabilities, do not use shapefiles.
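The attribute problems listed above can be screened for before you export. The following is a minimal sketch, not an ESRI tool: it assumes the attribute table has already been read into plain Python lists and dictionaries, and the function name find_attribute_problems and its arguments are hypothetical. It flags field names longer than 10 characters, null values, and values that carry both a date and a time.

```python
import datetime

MAX_FIELD_NAME_LENGTH = 10  # field name limit stated in this topic


def find_attribute_problems(field_names, rows):
    """Return warnings for attributes that a shapefile cannot store as-is.

    field_names is a list of strings; rows is a list of dictionaries that
    map field name to value. Both are hypothetical in-memory stand-ins for
    a real attribute table.
    """
    problems = []
    for name in field_names:
        if len(name) > MAX_FIELD_NAME_LENGTH:
            problems.append("field name too long: %s" % name)
    for index, row in enumerate(rows):
        for name in field_names:
            value = row.get(name)
            if value is None:
                problems.append("row %d: null value in %s" % (index, name))
            elif (isinstance(value, datetime.datetime)
                  and value.time() != datetime.time(0, 0)):
                # A midnight time is treated here as "date only".
                problems.append(
                    "row %d: %s has both a date and a time" % (index, name))
    return problems


if __name__ == "__main__":
    sample_rows = [
        {"NAME": "Elm St", "population_2010": None,
         "SURVEYED": datetime.datetime(2009, 5, 1, 14, 30)},
    ]
    for line in find_attribute_problems(
            ["NAME", "population_2010", "SURVEYED"], sample_rows):
        print(line)
```

An empty result does not mean the data is safe to export; the sketch does not check Unicode content, numeric precision, or geodatabase behavior such as domains and subtypes.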
Shapefile components and file extensions
Shapefiles are stored in three or more files that all have the same prefix and are stored in the same system folder (shapefile workspace). You will see the individual files when viewing the folder in Windows Explorer, not in ArcCatalog.
|Extension|Description|
|.shp|The main file that stores the feature geometry. No attributes are stored in this file, only geometry.|
|.shx|A companion file to the .shp that stores the position of individual feature IDs in the .shp file.|
|.dbf|The dBASE table that stores the attribute information of features.|
|.sbn and .sbx|Files that store the spatial index of the features.|
|.atx|Created for each dBASE attribute index created in ArcCatalog.|
|.ixs and .mxs|Geocoding index for read/write shapefiles.|
|.prj|The file that stores the coordinate system information.|
|.xml|Metadata for ArcGIS; stores information about the shapefile.|
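Because a shapefile is really a set of files sharing one prefix, a copy made outside ArcCatalog can silently leave a component behind. The sketch below checks for the three files that the shapefile specification makes mandatory (.shp, .shx, and .dbf). The function name missing_components is hypothetical, and the check is case sensitive on file systems that distinguish case.

```python
import os

# Mandatory components per the published shapefile specification.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_components(shapefile_path):
    """Return the required extensions that are absent for a given .shp path."""
    base, _ = os.path.splitext(shapefile_path)
    return [ext for ext in REQUIRED_EXTENSIONS
            if not os.path.exists(base + ext)]


if __name__ == "__main__":
    # "roads.shp" is a placeholder path used only for illustration.
    print(missing_components("roads.shp"))
```

The optional files (spatial index, attribute index, coordinate system, metadata) are deliberately not checked, since a shapefile is still readable without them.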
- There is a 2 GB size limit for any shapefile component file, which translates to a maximum of roughly 70 million point features. The actual number of line or polygon features you can store in a shapefile depends on the number of vertices in each line or polygon (a vertex is equivalent to a point).
- Shapefiles do not contain an XY tolerance like geodatabase feature classes. The XY tolerance is the minimum distance between coordinates before they are considered equal. This XY tolerance is used when evaluating relationships between features within the same feature class or between several different feature classes. It is also used extensively when editing features. If you are performing any sort of operation involving comparison between features, such as use of Overlay tools, the Clip tool, the Select Layer By Location tool, or nearly every tool that takes two or more feature classes as input, you should be using geodatabase feature classes (which have an XY tolerance) rather than shapefiles.
- A shapefile may take up three to five times as much space as the same data stored in a file geodatabase or ArcSDE geodatabase, because those formats compress feature geometry and shapefiles do not.
- Shapefiles support multipatches, but lack support for the following advanced multipatch capabilities:
- Texture coordinates
- Textures and part color
- Lighting normals
- The spatial index for a shapefile is inefficient compared to that of a geodatabase feature class. This means that spatial queries (such as selecting features within a polygon) take longer compared to a geodatabase feature class. This inefficiency is only noticeable when dealing with large numbers of features.
- Circular arc curves are not supported on shapefiles. Circular arc curves are created by editing geodatabase feature classes, as described in Creating segments that are circular arc curves. Circular arc curves use a mathematical formula to draw the curve. If you export a geodatabase feature class containing circular arc curve features to a shapefile, the curved features are transformed to simple line features with closely spaced vertices to capture the curved shape.
- Shapefiles do not have a spatial domain, which defines the geographic extent that all coordinates must fall within. This spatial extent is useful when editing geometry since it prevents you from entering coordinates outside the extent.
- Unlike other formats, shapefiles store numeric attributes in character format rather than binary format. For real numbers (that is, numbers containing decimal places), this may lead to rounding errors. (This limitation applies only to attributes, not to shape coordinates.) The following table summarizes the field width for each data type.
- The dBASE file standard supports only ANSI characters in field names and values. ESRI has added extensive Unicode support for dBASE files to allow you to store Unicode field names and values, but this additional support resides only in ArcGIS and not in non-ESRI applications. Supporting Unicode in dBASE is an ongoing effort at ESRI, meaning that issues continue to be found and resolved.
NOTE: If you have to support Unicode in your field names or field values, we strongly suggest that you use geodatabases rather than shapefiles.
- Date fields support either the date or the time, but not both in the same field.
- Null values are not supported in shapefiles. If a feature class containing nulls is converted to a shapefile, then the null values will be changed into the following:
|Data type|Null value substitution|
|Number (when a tool requires a NULL, infinity, or NaN (Not a Number) to be output)|-1.7976931348623158e+308 (IEEE standard for the maximum negative value)|
|Number (all other geoprocessing tools)|0|
|Text|" " (blank, no space)|
|Date|Stored as zero, but displays "<null>"|
- Field names cannot be longer than 10 characters.
- The maximum record length for an attribute is 4,000 bytes. The record length is the number of bytes used to define all the fields, not the number of bytes used to store the actual values.
- The maximum number of fields is 255. A conversion to shapefile will convert the first 255 fields if this limit is exceeded.
- The dBASE file must contain at least one field. When you create a new shapefile or dBASE table, an integer ID field is created as a default.
- dBASE files do not support blob, GUID, global ID, coordinate ID, or raster field types.
- dBASE files have little SQL support aside from a WHERE clause.
- Attribute indexes are deleted when you save edits, and you must re-create them from scratch.
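Two of the limits in the list above can be illustrated with quick arithmetic. The sketch below is an approximation under stated assumptions, not a description of how ArcGIS enforces these limits. The byte sizes come from the published shapefile specification (a 100-byte file header, and a point record made up of an 8-byte record header plus 20 bytes of content). The 19-character width and 11 decimal places used for the text field are hypothetical values chosen only to show the effect, and both function names are made up for this example.

```python
# Sizes taken from the published shapefile format specification.
SHP_HEADER_BYTES = 100        # fixed-length main file header
POINT_RECORD_BYTES = 8 + 20   # record header + (shape type, X, Y)
TWO_GB = 2 ** 31              # assumed meaning of "2 GB", in bytes


def max_point_records(limit_bytes=TWO_GB):
    """Upper bound on the point records that fit in one .shp file."""
    return (limit_bytes - SHP_HEADER_BYTES) // POINT_RECORD_BYTES


def store_as_text(value, width=19, decimals=11):
    """Round-trip a real number through a fixed-width character field.

    The default width and decimals are hypothetical; real field
    definitions vary by data type.
    """
    text = "%*.*f" % (width, decimals, value)
    return float(text)


if __name__ == "__main__":
    print(max_point_records())
    original = 1.0 / 3.0
    # Compare the value before and after it has been stored as characters.
    print(original == store_as_text(original))
```

Under these assumptions, the bound on point records is of the same order as the roughly 70 million quoted for the 2 GB limit. The second function shows why a real number read back from a character field may not equal the value that was written.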
Shapefiles have no extended data types at either the workspace or feature class level. Any conversion to shapefile from a geodatabase feature class or other format will result in the loss of the following:
- Attribute domains
- Geometric networks
Shapefiles and geoprocessing
Any geoprocessing tool that outputs a feature class allows you to choose either a shapefile or geodatabase feature class as the output format. Similarly, a tool that outputs a table allows you to choose either a dBASE file (.dbf) or a geodatabase table as the output. You should always be aware of which format you use and the consequences of converting a geodatabase input to a shapefile output.
Geoprocessing tools auto-generate an output feature class or table for you. This auto-generated output is based on a number of factors as described in Specifying tool inputs and outputs. If your scratch workspace environment is set to a system folder, and not a geodatabase, the auto-generated output feature class or table will be a shapefile or dBASE file.
It is suggested that you set your scratch workspace to a file geodatabase so that the auto-generated output is written to a file geodatabase, not a shapefile or .dbf table.
Learn more about geoprocessing environments.
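To see which format a given scratch workspace setting is likely to produce, you can classify the workspace path yourself. The following is a rough sketch, not part of the geoprocessing framework: the function name default_output_kind is hypothetical, and the rule it applies (a path ending in .gdb, .mdb, or .sde is a geodatabase, and anything else is a system folder) is an assumption based on common ArcGIS naming. It does not inspect the workspace itself.

```python
import ntpath  # Windows path rules, since ArcGIS paths use drive letters

# Assumed naming convention: file geodatabase, personal geodatabase,
# and ArcSDE connection file.
GEODATABASE_EXTENSIONS = (".gdb", ".mdb", ".sde")


def default_output_kind(scratch_workspace):
    """Guess the auto-generated output format from the workspace path."""
    trimmed = scratch_workspace.rstrip("\\/")
    extension = ntpath.splitext(trimmed)[1].lower()
    if extension in GEODATABASE_EXTENSIONS:
        return "geodatabase feature class or table"
    return "shapefile or dBASE table"


if __name__ == "__main__":
    # Both paths are placeholders used only for illustration.
    print(default_output_kind("C:\\data\\scratch.gdb"))
    print(default_output_kind("C:\\data\\scratch"))
```

A check like this can be run at the start of a script to warn when the scratch workspace points at a folder.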
Because shapefiles write quickly, they are often used to write intermediate data in models since this makes for faster model execution. However, writing to a file geodatabase is almost as fast as writing to a shapefile, so unless execution speed is critical, you should always use a file geodatabase for intermediate and output data. If you do use shapefiles, be aware of their limitations as described above and only use shapefiles for simple features and attributes. An alternative to using shapefiles for intermediate data is to write features to the in_memory workspace.
Learn more about writing features to the in_memory workspace.