RSS
热门关键字:  数据挖掘  人工智能  数据仓库  搜索引擎  数据挖掘导论

The properties of the Sales table : OLAP by example

来源: 作者:unkonwn 时间:2004-12-11 点击:

 

In this first stage, we are going to try, step by step, to build a classic database with tables made of columns. We will see then that, in spite of the undeniable interest of relational databases, this approach is not always the best one.

The "Best Foot Forward" company wants to build a database to track the progress of its shoe sales, by style and by month. So far, so good. You can imagine a sales table like this:

数据挖掘研究院

数据挖掘研究院

Month 数据挖掘研究院

Style 数据挖掘研究院

Quantity

数据挖掘研究院

Total Value TE 数据挖掘研究院

January 2000

Ski boot

5 数据挖掘实验室

1 700 F 数据挖掘研究院

January 2000 数据挖掘研究院

Gumboot 数据挖掘研究院

300

65 000 F

数据挖掘研究院

.......... 数据挖掘实验室

 

 

  数据挖掘研究院

数据挖掘研究院

In reality, the different types of shoes are held in a Style table, and we only include the corresponding key field in the Sales table. 数据挖掘研究院

To analyze the data, one might, for example, put the months as rows and the styles in columns. Thus we would generate two tables Quantity and Total Value Tax Exclusive on which we can create the simulations and charts that we want. We have two simple reports : one giving the number of shoes per month and per style, the other giving the value according to the same criteria.

Footwear sales are running along nicely (OK, we′ve seen better), our company grows and now has multiple outlets. We must therefore develop our sales table:

Month 数据挖掘研究院

Style 数据挖掘研究院

Outlet 数据挖掘研究院

Quantity

Total Value TE

April 2000 数据挖掘实验室

Gumboot 数据挖掘研究院

Lyon

10 数据挖掘实验室

1 500 F

April 2000 数据挖掘研究院

Sneaker

数据挖掘研究院

Paris Bastille

850 数据挖掘研究院

260 000 F

数据挖掘实验室

..........

数据挖掘研究院

  数据挖掘研究院

  数据挖掘实验室

  数据挖掘研究院

  数据挖掘研究院

数据挖掘研究院

Here as well, a relational database will make use of an outlet table, with a name, a key field and other characteristics such as the address.

The analysis is already proving more complex, as a piece of paper (or a spreadsheet) only has two dimensions. If we want to study the performance of an outlet we must select it from here. But if we are subsequently interested in what happened in February in all the Outlets we have to start again and make another selection.

At this point, we already have six possible reports : 数据挖掘实验室

  • For each outlet or for all the outlets, the two preceding reports,
  • For each style or for the total, two reports with months in line (rows) and outlets in column,
  • For each month or for the year, the reports with outlets in line and style in column.

These six reports could be twelve if we wish to swap lines and columns. 数据挖掘研究院

We now realize that our company also wants to study the sales splits according to other criteria such as Gender (Men′s, Ladies′, Children′s), Size or perhaps Colour. 数据挖掘实验室

The first problem here is that everybody needs to use 1280 X 1024 resolution on their monitors to be able to see the sales table. As that has no intrinsic interest, we shall merely imagine it. This table has become very big, both in terms of columns and of rows. To analyze this data we must create totals of groups, resulting in response times that are not always acceptable - so we have to manage the indexes, etc.

数据挖掘研究院

Star

数据挖掘研究院

The relational database structure that we have just constructed is sometimes referred to as a star schema - a reference to the appearance of the corresponding conceptual model: The centre of the star is the sales table, whilst the branches are the tables for outlet, style, colour etc. Do you see the four branches of the star in the conceptual and physical data model below ?

Star 数据挖掘研究院

数据挖掘研究院

MCD

数据挖掘研究院

MPD 数据挖掘研究院

最新评论共有 0 位网友发表了评论
发表评论
评论内容:不能超过250字,需审核,请自觉遵守互联网相关政策法规。
匿名?