Understanding geostatistical analysis
Last modified October 28, 2008
Generating a continuous surface used to represent a particular attribute is a key capability required in most geographic information system (GIS) applications. Perhaps the most commonly used surface type is a digital elevation model of terrain. These datasets are readily available at small scales for various parts of the world. However, just about any measurement taken at locations across a landscape, subsurface, or atmosphere can be used to generate a continuous surface. A major challenge facing most GIS modelers is to generate the most accurate surface possible from existing sample data, as well as to characterize the error and variability of the predicted surface. Newly generated surfaces are used in further GIS modeling and analysis as well as in 3D visualization. Understanding the quality of these surfaces can greatly improve the utility of the GIS models built from them. This is the role of the Geostatistical Analyst toolbox.
Geostatistical Analyst uses sample points taken at different locations in a landscape and creates (interpolates) a continuous surface. The sample points are measurements of some phenomenon such as radiation leaking from a nuclear power plant, an oil spill, or elevation heights. Geostatistical Analyst derives a surface using the values from the measured locations to predict values for each location in the landscape.
Geostatistical Analyst provides two groups of interpolation techniques: deterministic and geostatistical. All methods rely on the similarity of nearby sample points to create the surface. Deterministic techniques use mathematical functions for interpolation. Geostatistical techniques rely on both statistical and mathematical methods, which can be used to create surfaces and assess the uncertainty of the predictions.
In addition to providing various interpolation techniques, Geostatistical Analyst also provides many supporting tools. For example, prior to mapping, Exploratory Spatial Data Analysis (ESDA) tools can be used to assess the statistical properties of the data. Having explored the data, the user can then create a variety of output map types (for example, prediction, error of prediction, probability, and quantile) using many variants of kriging and cokriging algorithms (for example, ordinary, simple, universal, indicator, probability, and disjunctive) and associated tools (for example, data transformation, declustering, and detrending).
Generally speaking, things that are closer together tend to be more alike than things that are farther apart. This is a fundamental geographic principle. Suppose you are a town planner and you need to build a scenic park in your town. You have several candidate sites, and you may want to model the viewshed at each site. This will require a more detailed elevation surface dataset for your study area. Suppose you have preexisting elevation data for 1,000 locations throughout the town. You can use this to build a new elevation surface.
When trying to build the elevation surface, you can assume that the sample values closest to the prediction location will be similar. But how many sample locations should you consider? And should all the sample values be considered equally? As you move farther away from the prediction location, the influence of the points will decrease. Considering a point too far away may actually be detrimental because the point may be located in an area that is dramatically different from the prediction location.
One solution is to consider enough points to give a good prediction but few enough points to be practical. The number will vary with the amount and distribution of the sample points and the character of the surface. If the elevation samples are relatively evenly distributed, and the surface characteristics do not change significantly across your landscape, you can predict surface values from nearby points with reasonable accuracy. To account for the distance relationship, the values of closer points are usually weighted more heavily than those farther away.
This is the basis for the Inverse Distance Weighted (IDW) interpolation technique. As its name implies, the weight of a value decreases as the distance increases from the prediction location.
A deterministic interpolation technique applies an established mathematical formula to the sample points. In the case of IDW, the predicted value is a weighted average of the sample points that fall within a specified neighborhood of the processing location, where each point's weight is derived from its distance to that location.
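The IDW computation just described can be sketched in a few lines of plain Python. This is an illustrative, minimal sketch only (not Esri code); the function name, the default power of 2, and the fixed search radius are assumptions made for the example.

```python
import math

def idw_predict(samples, x0, y0, power=2, radius=500.0):
    """Predict a value at (x0, y0) from (x, y, z) samples using
    inverse distance weighting within a search neighborhood."""
    num = 0.0
    den = 0.0
    for x, y, z in samples:
        d = math.hypot(x - x0, y - y0)
        if d == 0.0:
            return z  # prediction location coincides with a sample
        if d <= radius:
            w = 1.0 / d ** power  # weight decays as distance increases
            num += w * z
            den += w
    if den == 0.0:
        raise ValueError("no samples within the search radius")
    return num / den

# Hypothetical elevation samples: (x, y, elevation)
pts = [(0, 0, 100.0), (10, 0, 110.0), (0, 10, 105.0), (10, 10, 120.0)]
print(idw_predict(pts, 5, 5))
```

Raising the power parameter makes the weights fall off more steeply, so the prediction is dominated by the nearest samples; lowering it gives a smoother, more averaged surface.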
The following are the deterministic methods available in Geostatistical Analyst: Inverse Distance Weighting, Global Polynomial Interpolation, Radial Basis Functions, and Local Polynomial Interpolation.
A second family of interpolation methods consists of geostatistical methods that are based on statistical models that include autocorrelation (statistical relationships among the measured points). Not only do these techniques have the capability of producing prediction surfaces, but they can also provide some measure of the accuracy of these predictions.
Kriging is similar to Inverse Distance Weighted interpolation in that it weights the surrounding measured values to derive a prediction for each location. However, the weights are based not only on the distance between the measured points and the prediction location but also on the overall spatial arrangement among the measured points. To use the spatial arrangement in the weights, the spatial autocorrelation must be quantified.
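Quantifying spatial autocorrelation is typically done with an empirical semivariogram, which measures how dissimilar sample values become as the separation distance between them grows. The sketch below is a simplified illustration of that idea in plain Python (not Esri code); the function name and the lag-binning parameters are assumptions made for the example.

```python
import math

def empirical_semivariogram(samples, lag_width, n_lags):
    """Estimate semivariance per distance lag from (x, y, z) samples:
    gamma(h) = mean of (zi - zj)^2 / 2 over pairs whose separation
    falls in lag bin h."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    n = len(samples)
    for i in range(n):
        xi, yi, zi = samples[i]
        for j in range(i + 1, n):
            xj, yj, zj = samples[j]
            h = math.hypot(xi - xj, yi - yj)
            k = int(h // lag_width)  # which lag bin this pair falls in
            if k < n_lags:
                sums[k] += (zi - zj) ** 2
                counts[k] += 1
    # None marks lags with no sample pairs
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]

# Hypothetical samples along a transect: (x, y, value)
pts = [(0, 0, 1.0), (1, 0, 2.0), (2, 0, 4.0)]
print(empirical_semivariogram(pts, lag_width=1.5, n_lags=2))
```

In kriging, a model curve is fitted to these binned semivariance values, and that fitted model supplies the weights used to predict values at unmeasured locations.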