RSS
热门关键字:  数据挖掘  人工智能  数据仓库  搜索引擎  数据挖掘导论

A Decomposition-Based Simulated Annealing Technique for Data

来源: 作者:unkonwn 时间:2004-12-10 点击:

1 Introduction
Terabyte online databases, consisting
records, are becoming common as the
of millions of
price of online
storage decreases. As a result, efficient data clustering
plays an increasingly significant role in enhancing the
performance of database systems. The objective of
clustering related data items on storage devices is to
minimize the cost of query processing. Essentially,
data clustering attempts to place related data, which
are required consecutively on some query access path,
physically close to each other on the storage to minimize the number of disk accesses.
The problem of data clustering can be formulated as
a graph partitioning problem (GPP). A database can be
represented as a graph G = (N, A) in which
l
b
b
l
l
N denotes the set of nodes representing the data
items;
A denotes the set of edges which connect related
data items if they are referenced consecutively in a


query;
a weight Wj > 1 is assigned to each arc a~ G A; the
weight represents the relative significance (e. g., the
frequency) of coreferencing the related data items
connected by the edge;
a partitioning divides N approximately evenly into
v disjoint subgraphs N1, N2, . . . . N., where v is a
given parameter; we assume that each subgraph
represents a segment (e.g., page, track, cylinder,
etc. ) with a fixed capacity, and the size of a data
item is also fixed;
a cost function C is defined as ~ Wj where the
summation ranges over those arcs aj G A which
connect nodes in two different subgraphs N* and Nt,
(1 S i,i’ <v, and i # i’).
We note that the nodes can be the objects of
00DBs (object-oriented databases) or tuples of RDBs
(relational databases); the edges represent the aggregate
(is-part-of) relationships or the primary-key-foreign-key
(joining) links between the two data items represented
by the two end nodes of each edge. The goal of data

数据挖掘研究院


clustering is to find a partitioning of the graph G such
that the cost function C = ~ Wj is minimized. 数据挖掘实验室

  数据挖掘研究院

  数据挖掘研究院

资料全文下载

数据挖掘研究院

最新评论共有 0 位网友发表了评论
发表评论
评论内容:不能超过250字,需审核,请自觉遵守互联网相关政策法规。
匿名?