
Data sparsity: OLAP by example

Author: unknown  Date: 2004-12-11

 

That's it! The Best Foot Forward company, thanks to its rigorous management, is now very well established, and its choice of shoes is absolutely marvelous. More than 12,000 references are available, and more than 500 shops have opened across the country. Each of them sends its daily sales figures to the home office. At the end of a year, one therefore obtains 12000*500*365 cells, that is to say more than 2 billion potential cells for the 'Number' variable.

However, no outlet offers every item in the catalogue to its customers. On average, approximately 600 references are available in each store, and each store is open only 250 days per year. Therefore, only 600*500*250 cells are actually filled, or, to put it another way, 75 million cells (roughly 29 times fewer than the potential size).
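The arithmetic above can be checked with a short Python sketch. The variable names are illustrative assumptions and do not come from any OLAP product; the figures are the ones quoted in the text.

```python
# Figures quoted in the text; the variable names are illustrative only.
references = 12_000      # items in the catalogue
outlets = 500            # shops
days_per_year = 365      # calendar days

refs_per_outlet = 600    # average references actually stocked per shop
open_days = 250          # days each shop is open per year

# Every (reference, outlet, day) combination is a potential cell.
potential_cells = references * outlets * days_per_year

# Only stocked references on opening days can hold a value.
filled_cells = refs_per_outlet * outlets * open_days

density = filled_cells / potential_cells
sparsity_ratio = potential_cells / filled_cells

print(f"potential cells: {potential_cells:,}")   # 2,190,000,000
print(f"filled cells:    {filled_cells:,}")      # 75,000,000
print(f"density:         {density:.1%}")         # 3.4%
print(f"ratio:           {sparsity_ratio:.1f}x") # 29.2x
```

Fewer than 4% of the potential cells can ever hold a value, which is why telling the engine about sparsity pays off.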

This means that data are not randomly distributed across the variable defined in our OLAP database. For example, the 'Paris Bastille' outlet never sells shoes with reference "23154". For every day of the year, the 'Number' measure will therefore be empty for this outlet and this reference.

Not all dimensions of a variable, therefore, have the same importance. To optimize disk space management and access to data, a good approach is to tell the OLAP engine what the characteristics of a variable are.

With Oracle Express or Essbase, we just have to specify which dimensions are dense and which are sparse. In our example, the time dimension is dense; reference and outlet are sparse. The system then optimizes data management automatically. For example, Oracle Express creates a composite dimension containing each significant reference-outlet pair. If this dimension is instead maintained manually, it is called a conjoint dimension.

For the OLAP engine, this measure now has only two dimensions: time and the reference-outlet composite. For the user and the developer, this technical consideration is transparent.
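The idea of a composite dimension can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how Oracle Express or Essbase are actually implemented: the class `SparseMeasure`, its methods, and the reference "10001" are all hypothetical names introduced here. Storage is dense along time but is allocated only for the reference-outlet pairs that really occur.

```python
class SparseMeasure:
    """Toy measure: dense over time, sparse over (reference, outlet).

    Hypothetical sketch; not the API of any real OLAP engine.
    """

    def __init__(self, n_days):
        self.n_days = n_days
        self.composite = {}   # (reference, outlet) -> row index
        self.rows = []        # one dense list of n_days cells per pair

    def store(self, reference, outlet, day, value):
        key = (reference, outlet)
        row = self.composite.get(key)
        if row is None:
            # First value for this pair: add it to the composite dimension
            # and allocate a dense row over the time dimension.
            row = len(self.rows)
            self.composite[key] = row
            self.rows.append([None] * self.n_days)
        self.rows[row][day] = value

    def lookup(self, reference, outlet, day):
        row = self.composite.get((reference, outlet))
        if row is None:
            return None       # pair never occurs: no storage was allocated
        return self.rows[row][day]

    def allocated_cells(self):
        return len(self.rows) * self.n_days


number = SparseMeasure(n_days=365)
number.store("10001", "Paris Bastille", day=0, value=3)

print(number.lookup("10001", "Paris Bastille", 0))  # 3
print(number.lookup("23154", "Paris Bastille", 0))  # None
print(number.allocated_cells())                     # 365
```

Callers still address a cell by reference, outlet and day, so the composite stays hidden from them, while internally the measure is indexed by only two things: the pair and the day.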
