In this first stage, we are going to try, step by step, to build a classic database with tables made of columns. We will see then that, in spite of the undeniable interest of relational databases, this approach is not always the best one.
|
|
The "Best Foot Forward" company wants to build a database to track the progress of its shoe sales, by style and by month. So far, so good. You can imagine a sales table like this: |
|
Month 数据挖掘研究院 |
Style 数据挖掘研究院 |
Quantity |
Total Value TE 数据挖掘研究院 |
|
January 2000 |
Ski boot |
5 数据挖掘实验室 |
1 700 F 数据挖掘研究院 |
|
January 2000 数据挖掘研究院 |
Gumboot 数据挖掘研究院 |
300 |
65 000 F |
|
.......... 数据挖掘实验室 |
|
|
In reality, the different types of shoes are held in a Style table, and we only include the corresponding key field in the Sales table. 数据挖掘研究院
To analyze the data, one might, for example, put the months as rows and the styles in columns. Thus we would generate two tables Quantity and Total Value Tax Exclusive on which we can create the simulations and charts that we want. We have two simple reports : one giving the number of shoes per month and per style, the other giving the value according to the same criteria.
Footwear sales are running along nicely (OK, we′ve seen better), our company grows and now has multiple outlets. We must therefore develop our sales table:
|
Month 数据挖掘研究院 |
Style 数据挖掘研究院 |
Outlet 数据挖掘研究院 |
Quantity |
Total Value TE |
|
April 2000 数据挖掘实验室 |
Gumboot 数据挖掘研究院 |
Lyon |
10 数据挖掘实验室 |
1 500 F |
|
April 2000 数据挖掘研究院 |
Sneaker |
Paris Bastille |
850 数据挖掘研究院 |
260 000 F |
|
.......... |
Here as well, a relational database will make use of an outlet table, with a name, a key field and other characteristics such as the address.
The analysis is already proving more complex, as a piece of paper (or a spreadsheet) only has two dimensions. If we want to study the performance of an outlet we must select it from here. But if we are subsequently interested in what happened in February in all the Outlets we have to start again and make another selection.
At this point, we already have six possible reports : 数据挖掘实验室
- For each outlet or for all the outlets, the two preceding reports,
- For each style or for the total, two reports with months in line (rows) and outlets in column,
- For each month or for the year, the reports with outlets in line and style in column.
These six reports could be twelve if we wish to swap lines and columns. 数据挖掘研究院
We now realize that our company also wants to study the sales splits according to other criteria such as Gender (Men′s, Ladies′, Children′s), Size or perhaps Colour. 数据挖掘实验室
The first problem here is that everybody needs to use 1280 X 1024 resolution on their monitors to be able to see the sales table. As that has no intrinsic interest, we shall merely imagine it. This table has become very big, both in terms of columns and of rows. To analyze this data we must create totals of groups, resulting in response times that are not always acceptable - so we have to manage the indexes, etc.
|
|
The relational database structure that we have just constructed is sometimes referred to as a star schema - a reference to the appearance of the corresponding conceptual model: The centre of the star is the sales table, whilst the branches are the tables for outlet, style, colour etc. Do you see the four branches of the star in the conceptual and physical data model below ? |
|
|

