[BOOK REVIEW] The Data Warehouse Toolkit by Ralph Kimball/Retail Case Study

5 min readMar 23, 2022

On how to design dimensional modeling in retail industry based on a case study.

Case study

Working at headquarter of a large grocery chain with around 100 grocery stores cross 5 states. Each store has grocery, frozen foods, dairy, meat, produce, bakery, floral, and healthy/beauty aids departments. Each store has 60,000 stock keeping units (SKUs) a.k.a products on shelves.

At the grocery store, concern was put on the logistics of ordering,
stocking, and selling products while maximizing proﬁt. The proﬁt ultimately comes from charging as much as possible for each product, lowering costs for product acquisition and overhead, and at the same time attracting as many customers as possible in a highly competitive environment. Some of the most signiﬁcant management decisions have to do with pricing and promotions. (Promotions in a grocery store include temporary price reductions, ads
in newspapers and newspaper inserts, displays in the grocery store, and coupons). A rather common strategy for instance put toilet paper on sales for 50% to get a sale jump by a factor of 10, but the towels probably are being sold at a loss. As a result, the visibility of all forms of promotion is an important part of analyzing the operations of a grocery store.

Data is collected from cash registers as customers purchase products as in figure below.

Pre-design dimensional model

Regardless of the industry, we strongly encourage the four-step process for tackling dimensional model designs.

Step 1: Select Business Process

In this case study, we want to understand better customer purchases as captured by POS system. Hence, the business process we are modeling is POS retail sales transactions.

This data enables the business users to analyze which products are selling in which stores in which states on which days under what promotional conditions in which transactions.

Step 2: Declare the Grain

Tackling data at its lowest atomic grain makes sense for many reasons. Atomic
data is highly dimensional. The more detailed and atomic the fact measurement, the more things you know for sure. All those things you know for sure translate into dimensions. In this regard, atomic data is a perfect match for the dimensional approach.
Although aggregated data plays an important role for performance
tuning, it is not a substitute for giving users access to the lowest level details; users
can easily summarize atomic data, but it’s impossible to create details from summary data.

The grain here is an individual product purchased on a POS transaction.

Step 3: Identify the Dimensions

A careful grain statement determines the primary dimensionality of the
fact table. You then add more dimensions to the fact table if these additional dimensions naturally take on only one value under each combination of the primary dimensions. If the additional dimension violates the grain by causing additional fact rows to be generated, the dimension needs to be disqualiﬁed or the grain statement needs to be revisited.

After identifying the grain, the following dimensions include: product, store, state, date. promotion, cashier, and method of payment.

Step 4: Identify the Facts

Example of measured facts in retail sales schema (page 77)

In this step, we need to determine which should be put into fact table. No need to include non-additive facts.

In this case study, the gross proﬁt calculation is straightforward, but storing it means it’s computed consistently in the ETL process, eliminating the possibility of user calculation errors.

Trick: Assuming that gross revenues are $4 billion per year and that the average price of an item on a customer ticket is $2.00, you can calculate that there are approximately 2 billion transaction line items per year.

Design dimensional model

Date Dimension

Even 20 years’ worth of days is only approximately 7,300 rows, which is a relatively small dimension table.

Should avoid using cryptic such as Y/N, 1/0 and instead use Holiday and Non-holiday.

Product Dimension

The product dimension is almost always sourced from the operational product master ﬁle. Most retailers administer their product master ﬁle at headquarters and download a subset to each store’s POS system at frequent intervals. It is headquarters’ responsibility to deﬁne the appropriate product master record (and unique SKU number) for each new product.
The merchandise hierarchy is an important group of attributes. Typically, individual SKUs roll up to brands, brands roll up to categories, and categories roll up to departments. Each of these is a many-to-one relationship.

Example of product dimension sample rows (page 84)

Data involved in calculations should be in fact tables and data involved in constraints, groups and labels should be in dimension tables

Rule of thumbs:

A reasonable product dimension table can have 50 or more descriptive attributes.
Each attribute is a rich source for constraining and constructing row header labels.

Store dimension

Promotion Dimension

You must avoid null keys in the fact table. A proper design includes
a row in the corresponding dimension table to identify that the dimension is not applicable to the measurement.