
Clustering Algorithms for Spatial Databases: A Survey

Author: unknown  Date: 2004-11-30

Spatial data describes information related to the space occupied by objects. The data consists of geometric information and can be either discrete or continuous. Discrete data might be a single point in multi-dimensional space; however, discrete spatial data differs from non-spatial data in that it has a distance attribute that is used to locate the data in space. Continuous data spans a region of space and might consist of medical images, map regions, or star fields [Sam94].

Spatial databases are database systems that manage spatial data. They are designed to handle both spatial information and the non-spatial attributes of that data. Efficient and effective access to spatial data requires indices, and these indices are most successful when based on multi-dimensional trees. The structures proposed for these indices include quad trees, k-d trees, R-trees, and R*-trees [Sam94].
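
As an illustration of one of the tree structures named above, the following is a minimal sketch of a k-d tree with a rectangular range query. It is not taken from [Sam94]; the function names `build_kdtree` and `range_query`, the dictionary-based node layout, and the sample points are assumptions made only for this example.

```python
def build_kdtree(points, depth=0):
    """Recursively build a k-d tree from a list of equal-length coordinate tuples."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # the median point becomes this node
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def range_query(node, lo, hi):
    """Return every point p with lo[i] <= p[i] <= hi[i] on all axes."""
    if node is None:
        return []
    point, axis = node["point"], node["axis"]
    found = []
    if all(l <= c <= h for l, c, h in zip(lo, point, hi)):
        found.append(point)
    # The left subtree holds coordinates <= point[axis] on this axis,
    # the right subtree holds coordinates >= point[axis], so a subtree
    # is visited only if the query rectangle can overlap it.
    if lo[axis] <= point[axis]:
        found += range_query(node["left"], lo, hi)
    if hi[axis] >= point[axis]:
        found += range_query(node["right"], lo, hi)
    return found


# Hypothetical sample data: six points in the plane.
tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
inside = sorted(range_query(tree, lo=(3, 1), hi=(8, 5)))
```

The pruning test in `range_query` is what makes a tree-based index cheaper than scanning every object: whole subtrees are skipped when the query rectangle cannot overlap them.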

Data mining, or knowledge discovery in databases (KDD), is the technique of analyzing data to discover previously unknown information. The goal is to reveal regularities and relationships that are non-trivial. This is accomplished through an analysis of the patterns that form in the data. Various algorithms have been developed to perform this analysis, but many of them do not scale to very large databases.

Spatial data mining differs from regular data mining in ways that parallel the differences between spatial and non-spatial data. The attributes of a spatial object stored in a database may be affected by the attributes of that object's spatial neighbors. In addition, spatial location, and implicit information about the location of an object, may be exactly the information that spatial data mining is meant to extract [Fay96].

In order to successfully explore the massive amounts of spatial data being collected it is necessary to develop database primitives to manipulate the data [EFKS00]. The indices developed for spatial databases are also necessary to provide effective search mechanisms. However, the very large size of spatial databases also requires additional techniques for manipulating and cleaning the data in order to prepare it for analysis. Three methods that have been proposed and developed to aid in the preparation of data are spatial characterization, spatial classification, and clustering.

Spatial characterization of an object is the description of the spatial and non-spatial attributes of the object that are typical of similar objects but not necessarily typical of the database as a whole [EFKS98]. To obtain a spatial characterization of an object it is necessary to look at both the properties of the object itself and the properties of its neighbors. The goal of spatial characterization is to discover a set of tuples where a particular set of types appears with a frequency that is significantly different from the frequency in the database as a whole. However, if the neighborhood is very small, that is, if there are very few targets, then spatial characterization may produce misleading results. It is therefore necessary that the difference be significant in a large target neighborhood. It is interesting to note that an attribute may be significant in a limited neighborhood and yet no longer be significant once the neighborhood is expanded.
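
The frequency comparison described above can be sketched as follows. This is an illustrative simplification, not the algorithm of [EFKS98]: the neighborhood is assumed to be a fixed Euclidean radius around the targets, objects are assumed to be dictionaries with `x`, `y`, and one categorical attribute, and the names `neighborhood` and `frequency_ratios` are hypothetical.

```python
from collections import Counter
from math import hypot


def neighborhood(objects, targets, radius):
    """Objects within `radius` of at least one target (targets included)."""
    return [
        o for o in objects
        if any(hypot(o["x"] - t["x"], o["y"] - t["y"]) <= radius for t in targets)
    ]


def frequency_ratios(objects, targets, radius, attr):
    """For each value of `attr` seen near the targets, return its relative
    frequency in the neighborhood divided by its relative frequency in the
    whole database. Ratios far from 1 are candidates for a characterization."""
    near = neighborhood(objects, targets, radius)
    freq_near = Counter(o[attr] for o in near)
    freq_all = Counter(o[attr] for o in objects)
    return {
        value: (freq_near[value] / len(near)) / (freq_all[value] / len(objects))
        for value in freq_near
    }


# Hypothetical toy database of six objects.
objects = [
    {"x": 0, "y": 0, "kind": "school"},
    {"x": 1, "y": 0, "kind": "park"},
    {"x": 0, "y": 1, "kind": "park"},
    {"x": 10, "y": 10, "kind": "factory"},
    {"x": 11, "y": 10, "kind": "factory"},
    {"x": 12, "y": 10, "kind": "park"},
]
ratios = frequency_ratios(objects, targets=[objects[0]], radius=1.5, attr="kind")
```

With only three objects in the neighborhood, the ratios here are exactly the kind of small-sample result the text warns about; a real analysis would also test significance and repeat the comparison as the radius grows.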

Spatial trend detection looks for a regular change of one or more non-spatial attributes when moving across the spatial plane from point x to point y. The regularity of the change is described by performing a regression analysis on the respective attribute values of the objects on the path.
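
A minimal sketch of such a regression is shown below. It assumes a simple linear model in which the independent variable is each object's Euclidean distance from the first object on the path; the source does not fix either choice. The function name `linear_trend` and the sample path are hypothetical.

```python
from math import hypot


def linear_trend(path):
    """Least-squares fit of value = slope * distance + intercept.

    `path` is a list of (x, y, value) tuples ordered from the start point
    to the end point and containing at least two distinct locations.
    Distance is measured from the first object on the path.
    Returns (slope, intercept, r_squared).
    """
    x0, y0, _ = path[0]
    dist = [hypot(x - x0, y - y0) for x, y, _ in path]
    vals = [value for _, _, value in path]
    n = len(path)
    mean_d = sum(dist) / n
    mean_v = sum(vals) / n
    sxx = sum((d - mean_d) ** 2 for d in dist)
    sxy = sum((d - mean_d) * (v - mean_v) for d, v in zip(dist, vals))
    syy = sum((v - mean_v) ** 2 for v in vals)
    slope = sxy / sxx
    intercept = mean_v - slope * mean_d
    # r_squared measures how regular the change is (1.0 = perfectly linear).
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical path: an attribute value rising steadily away from the origin.
slope, intercept, r_squared = linear_trend(
    [(0, 0, 10.0), (3, 4, 20.0), (6, 8, 30.0)]
)
```

A slope clearly different from zero together with a high `r_squared` would indicate a trend along that path; how high counts as "regular" is a threshold the analyst has to choose.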
