The Star Schema Data Warehouse Design

Spectrum™ Technology Platform supports the creation and maintenance of data warehouses that use a star schema design. In a star schema, data is stored as either facts, which are specific descriptions of an event, or dimensional attributes, which are descriptions of the facts in the fact table. Facts change regularly and dimensions change slowly or never.

The following illustration shows the design of a star schema:



This illustration shows the main characteristics of a star schema: a fact table, dimension tables, and joins.

Fact Table

Fact tables are the central tables in the star schema of your data warehouse. Fact tables usually contain numeric or quantitative information (called measures) that describe a specific event. For example, if you have a data warehouse that you use to generate a report on company revenue, you would have dollar_sales, and dollar_cost as columns within your fact table, as shown in the illustration above. Typically, facts are continuously valued and additive. "Continuously valued" means that the fact is a numeric measurement that has a value every time it is measured. "Additive" means that the fact can be summarized through addition.

Fact tables also contain a set of columns that form a concatenated, or composite key. Each column of the concatenated key is a foreign key drawn from a dimension table's primary key. For example, in the above illustration the fact table contains a column product_key which associates the fact with a specific product in the product_dimension table.

The level of detail in a fact table is called the grain. Every row in the fact table must be recorded to the same level of detail. In the diagram above, the measurements in the fact table are daily totals of sales in dollars, sales in units, and cost in dollars of each product sold. The grain is daily. Each record in the fact table represents the total sales of a specific product in a retail store on one day. Each new combination of product, store, or day generates a different record in the fact table.

Fact tables are populated with data extracted from a data source. The data source can be an OLTP system or a data warehouse. Spectrum™ Technology Platform takes a snapshot of the source data on a regular schedule and moves the data to the data warehouse, usually at the same time every day, week or month.

A star schema can have multiple fact tables. Use a schema with multiple fact tables to separate sets of measurements that share a common subset of dimension tables, or to track measurements with different grains.

Dimension Table

Dimension tables store data that describe the information in the fact table. For example, if sales_total differed one month from the next you would look to the dimensions to tell you why. The same dimension table can be used with different fact tables.

Dimension tables have attributes and a single part primary key that joins the dimension table to the fact table. Attributes are the columns in the dimension table. The single part primary key allows you to quickly browse a single dimension table. Browsing a dimension table can help determine the best way to query the fact table.

Time dimension tables are necessary for accurate time-based calculations because you sometimes cannot easily extract the necessary date data from the records. For example, the following records are in a sales database. Note that there are time gaps between records. For example, there is no record for the day 1/4/2012.

Date Product Amount
1/3/2012 Red Shirt $10.00
1/5/2012 Red Shirt $5.00
1/7/2012 Red Shirt $15.00

If you query these records and calculate the average sales per day, the answer would be $10.00 ($30 / 3 records). However, this is incorrect because the three records actually span a period of five days. If you have a time dimension table with a record for each day, you could join that table with the above table to get this:

Date Product Amount
1/3/2012 Red Shirt $10.00
1/4/2012    
1/5/2012 Red Shirt $5.00
1/6/2012    
1/7/2012 Red Shirt $15.00

Calculating the average sales per day using these records, you would get the correct answer: $6.00 ($30 / 5 days).

In addition, you could account for arbitrary time attributes such as holidays, weekends, and quarters in your calculation. For example, if 1/6/2012 happened to be a holiday and you were only interested in average sales per workday then the answer would be $7.50.

Joins

Joins define relationships between a fact table and dimension tables in the star schema. The primary key in the dimension table is the foreign key in the fact table. The fact table must contain a primary key value from each dimension table. The reference from the foreign key to the primary key is the mechanism for verifying values between the two tables. Join relationships of this type ensure the referential integrity of a data warehouse. Referential integrity must be maintained to ensure valid query results.

Each record in a dimension table can describe many records in the fact table, making the join cardinality of dimension tables to fact tables one-to-many.

In the illustration above, product_key is the primary key in the product_dimension table and the foreign key in the sales_fact table. This join represents the relationship between the company’s products and its sales.