Data warehouse (part 8): Data Warehouse and ETL in practice

Hang Nguyen
3 min readJun 14, 2022

Please refer to the data set here and its variables’ description in Kaggle.

ERD design for data warehouse

ERD design was conducted based on the given dataset above.

All dimension tables were first created, then the fact table was designed based on these dimension tables. The principle of creating dimension table is that all categorical variables that has several options, for example Gender variable that can be Male, Female, and Mixed will be split into separate dimension tables. The rest numerical variables such as Fee and Quantity will be loaded into fact table. There are 7 dimension tables in total, with 1 fact table following centralized data warehouse practice as follows.

  • Type_DIM: contains the pet type and their maturity size
  • State_DIM: contains state name
  • Person_DIM: contains ID of rescuers
  • Breed_DIM: contains information of pet breed
  • Color_DIM: contains information of pet color
  • Med_DIM: contains information of medical condition
  • Gender_DIM: contains information of pet gender
  • PetDesc_DIM: contains information of pet in general (name, description…)
  • Pet_FACT: contains every bit of information of pet, with references to other DIM tables.

--

--

Hang Nguyen
Hang Nguyen

Written by Hang Nguyen

A Data Engineer with a passion for technology, literature, and philosophy.