Spatial data describes the space occupied by objects. It consists of geometric information and can be either discrete or continuous. Discrete data might be a single point in multi-dimensional space; however, discrete spatial data differs from non-spatial data in that it has a distance attribute that is used to locate the data in space. Continuous data spans a region of space and might consist of medical images, map regions, or star fields [Sam94].
Spatial databases are database systems that manage spatial data. They are designed to handle both spatial information and the non-spatial attributes of that data. Efficient and effective access to spatial data requires indices, and these indices are most successful when based on multi-dimensional trees. The structures proposed for these indices include quadtrees, k-d trees, R-trees, and R*-trees [Sam94].
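As an illustration of how a multi-dimensional tree index supports spatial queries, the following is a minimal sketch of a two-dimensional k-d tree with a nearest-neighbor search. The function names (build_kdtree, nearest) and the nested-tuple node layout are assumptions made for this example; they are not taken from [Sam94], and a production spatial database would more likely use a disk-oriented structure such as an R-tree.

```python
from math import dist

def build_kdtree(points, depth=0):
    """Build a 2-D k-d tree as nested tuples: (point, left, right)."""
    if not points:
        return None
    axis = depth % 2                      # alternate between x and y
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                # median point becomes the node
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, target, depth=0, best=None):
    """Return the stored point closest to target (Euclidean distance)."""
    if node is None:
        return best
    point, left, right = node
    if best is None or dist(point, target) < dist(best, target):
        best = point
    axis = depth % 2
    diff = target[axis] - point[axis]
    # Search the side of the splitting line that contains the target first.
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    # Visit the other side only if it could hold a closer point.
    if abs(diff) < dist(best, target):
        best = nearest(far, target, depth + 1, best)
    return best

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
closest = nearest(tree, (9, 2))
```

The pruning test in nearest is what makes the index pay off: whole subtrees are skipped whenever the splitting line is farther from the query point than the best match found so far.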
Data mining, or knowledge discovery in databases (KDD), is the technique of analyzing data to discover previously unknown information. The goal is to reveal regularities and relationships that are non-trivial. This is accomplished through an analysis of the patterns that form in the data. Various algorithms have been developed to perform this analysis, but many of these techniques are not scalable to very large databases.
Spatial data mining differs from regular data mining in ways that parallel the differences between spatial and non-spatial data. The attributes of a spatial object stored in a database may be affected by the attributes of that object's spatial neighbors. In addition, spatial location, and implicit information about the location of an object, may be exactly the information that spatial data mining can extract [Fay96].
To explore the massive amounts of spatial data being collected, it is necessary to develop database primitives to manipulate the data [EFKS00]. The indices developed for spatial databases are also necessary to provide effective search mechanisms. However, the very large size of spatial databases also requires additional techniques for manipulating and cleaning the data to prepare it for analysis. Three methods that have been proposed and developed to aid in the preparation of data are spatial characterization, spatial classification, and clustering.
Spatial characterization of an object is the description of the spatial and non-spatial attributes of the object that are typical of similar objects but not necessarily typical of the database as a whole [EFKS98]. To obtain a spatial characterization of an object, it is necessary to look at both the properties of the object itself and the properties of its neighbors. The goal of spatial characterization is to discover a set of tuples in which a particular set of types appears with a frequency that is significantly different from its frequency in the database as a whole. However, if the neighborhood is very small, that is, if there are very few targets, spatial characterization may produce misleading results. It is therefore necessary that the difference be significant in a large target neighborhood. Note that an attribute may be significant in a limited neighborhood yet lose its significance when the neighborhood is expanded.
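The frequency comparison at the heart of spatial characterization can be sketched as follows. This is a simplified illustration, not the algorithm of [EFKS98]: the neighborhood is assumed to be a plain Euclidean radius around the target positions, the function name frequency_factor and the record layout (a "pos" and a "type" per object) are hypothetical, and the toy data is invented for the example. The function returns the ratio between the relative frequency of a type near the targets and its relative frequency in the whole database.

```python
from math import dist

def frequency_factor(objects, targets, radius, value):
    """Relative frequency of `value` near the targets divided by its
    relative frequency in the whole database (None if undefined)."""
    # Assumption: an object belongs to the neighborhood if it lies within
    # `radius` of at least one target position.
    neighborhood = [o for o in objects
                    if any(dist(o["pos"], t) <= radius for t in targets)]

    def freq(group):
        return sum(1 for o in group if o["type"] == value) / len(group)

    if not neighborhood or freq(objects) == 0:
        return None
    return freq(neighborhood) / freq(objects)

# Invented toy database: eight objects, three of type "retiree".
objects = [
    {"pos": (0, 0), "type": "retiree"},
    {"pos": (1, 0), "type": "retiree"},
    {"pos": (0, 1), "type": "student"},
    {"pos": (10, 10), "type": "student"},
    {"pos": (11, 10), "type": "student"},
    {"pos": (10, 11), "type": "student"},
    {"pos": (20, 20), "type": "student"},
    {"pos": (21, 20), "type": "retiree"},
]
targets = [(0, 0)]

small = frequency_factor(objects, targets, 2, "retiree")    # (2/3) / (3/8)
large = frequency_factor(objects, targets, 100, "retiree")  # whole database
```

With the small radius the type is over-represented near the target, while with the large radius the neighborhood covers the whole database and the factor falls back to 1. A factor computed from only three neighbors, as here, is exactly the kind of small-neighborhood result that should not be trusted without a significance check.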
A spatial trend is a regular change of one or more non-spatial attributes while moving on the spatial plane from point x to point y. Spatial trend detection describes the regularity of this change by performing a regression analysis on the respective attribute values of the objects on the path.
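A minimal sketch of this regression step is given below. It assumes, for illustration only, that the independent variable is the Euclidean distance of each object from the first object on the path and that the trend is linear; the function name trend_along_path and the sample values are hypothetical. The function fits an ordinary least-squares line and reports the slope, the intercept, and the coefficient of determination as a measure of how regular the change is.

```python
from math import dist

def trend_along_path(path):
    """path: list of ((x, y), value) pairs ordered from start to end.
    Returns (slope, intercept, r_squared) of a least-squares line that
    regresses value on distance from the first object."""
    origin = path[0][0]
    xs = [dist(origin, pos) for pos, _ in path]
    ys = [value for _, value in path]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all objects are at the same distance from the start")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r_squared near 1 indicates a regular (here: linear) change.
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Invented example: an attribute that drops steadily away from the start.
path = [((0, 0), 100.0), ((3, 4), 90.0), ((6, 8), 80.0), ((9, 12), 70.0)]
slope, intercept, r_squared = trend_along_path(path)
```

In this example the attribute falls by two units per unit of distance. A threshold on r_squared (the value chosen would be application-specific) could then decide whether the change is regular enough to be reported as a trend.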

