Clustering is a descriptive task that seeks to identify homogeneous groups of objects based on the values of their attributes (dimensions) [24] [25]. Clustering techniques have been studied extensively in statistics [3], pattern recognition [11] [19], and machine learning [9] [31]. Recent work in the database community includes CLARANS [33], Focused CLARANS [14], BIRCH [45], and DBSCAN [13]. Current clustering techniques can be broadly classified into two categories [24] [25]: partitional and hierarchical.

Given a set of objects and a clustering criterion [39], partitional clustering obtains a partition of the objects into clusters such that the objects in a cluster are more similar to each other than to objects in different clusters. The popular K-means and K-medoid methods determine K cluster representatives and assign each object to the cluster whose representative is closest to it, such that the sum of the squared distances between the objects and their representatives is minimized. CLARANS [33], Focused CLARANS [14], and BIRCH [45] can be viewed as extensions of this approach to work against large databases. Mode-seeking clustering methods identify clusters by searching for regions in the data space in which the object density is large. DBSCAN [13] finds dense regions that are separated by low-density regions and clusters together the objects in the same dense region.

A hierarchical clustering is a nested sequence of partitions. An agglomerative hierarchical clustering starts by placing each object in its own cluster and then merges these atomic clusters into larger and larger clusters until all objects are in a single cluster. Divisive hierarchical clustering reverses the process by starting with all objects in one cluster and subdividing it into smaller pieces [24].
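To make the partitional and agglomerative descriptions above concrete, here is a minimal sketch in standard-library Python. It assumes Euclidean distance, a Lloyd-style K-means (alternating assignment to the closest representative and recomputing each representative as its cluster's mean), and a naive agglomerative procedure that returns the nested sequence of partitions. The function names (`kmeans`, `sse`, `agglomerative`), the random initialization, and the single-link merge rule are illustrative assumptions; this is not the implementation used by any of the cited systems.

```python
import random

Point = tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: list[Point], reps: list[Point]) -> list[int]:
    """Assign each object to the cluster whose representative is closest."""
    return [min(range(len(reps)), key=lambda j: sq_dist(p, reps[j])) for p in points]


def kmeans(points: list[Point], k: int, iters: int = 100, seed: int = 0):
    """Lloyd-style K-means (assumed variant); returns (representatives, labels)."""
    rng = random.Random(seed)
    reps = rng.sample(points, k)  # hypothetical init: k distinct input objects
    for _ in range(iters):
        labels = assign(points, reps)
        new_reps = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the representative to the mean of its cluster.
                dim = len(members[0])
                new_reps.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                new_reps.append(reps[j])  # keep an empty cluster's representative
        if new_reps == reps:  # converged: representatives stopped moving
            break
        reps = new_reps
    return reps, assign(points, reps)


def sse(points: list[Point], reps: list[Point], labels: list[int]) -> float:
    """Sum of squared distances between objects and their representatives."""
    return sum(sq_dist(p, reps[lab]) for p, lab in zip(points, labels))


def agglomerative(points: list[Point]) -> list[list[list[int]]]:
    """Naive single-link agglomerative clustering (assumed linkage).

    Returns the nested sequence of partitions, from singletons to one cluster,
    with each cluster given as a list of object indices.
    """
    clusters = [[i] for i in range(len(points))]  # every object in its own cluster
    history = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: distance between the closest members of two clusters.
                d = min(sq_dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the nearest pair
        del clusters[b]  # b > a, so index a is unaffected
        history.append([list(c) for c in clusters])
    return history


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    reps, labels = kmeans(pts, 2)
    print(sse(pts, reps, labels))  # 1.0
    print(agglomerative(pts)[2])   # [[0, 1], [2, 3]]
```

On these four toy points, both procedures recover the same two groups; the agglomerative run additionally exposes every intermediate partition, which is what distinguishes a hierarchical clustering from a single partition.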
Automatic Subspace Clustering of High Dimensional Data for D
Author: unknown
Date: 2004-12-13

