Data Warehouse (part 1): The Basics

Hang Nguyen
May 30, 2022

The term “Data Warehouse” is sometimes confused with “Database”, but in fact a data warehouse is built on top of databases. In other words, the data warehouse is the platform that brings data together for analysis, while a database typically serves an individual application.

A data warehouse is needed because it:

  • enables data-driven decisions: past, present, future and unknown data are combined to support the decision-making process.
  • follows the one-stop shopping principle: one place contains all the necessary goods (in this case, data).

Rules for designing a data warehouse (Bill Inmon, 1990):

  • Integrated: data from different sources is brought into the data warehouse and made consistent (for example, common names, formats and codes).
  • Subject-oriented: regardless of which system the data comes from, it is organized around business subjects (such as customers, products or sales) rather than around the source applications.
  • Time-variant: the warehouse holds historical data as well, not just current data.
  • Non-volatile: once loaded, data is not overwritten or deleted; new data is added alongside it, so history does not disappear even when a source system changes or removes its records.
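The four rules can be illustrated with a minimal Python sketch. It rests on loud assumptions: the two source systems (a "CRM" and a "shop"), their field names, the `CustomerSnapshot` schema and the helper names are all hypothetical, and an in-memory list stands in for real warehouse storage. Integration maps both sources onto one schema, the table is organized around a single subject (customers), every row carries a snapshot date, and the table only offers append and read operations.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a loaded row cannot be modified afterwards
class CustomerSnapshot:
    """One row of a subject-oriented 'customer' table (hypothetical schema)."""
    customer_id: str
    name: str
    country: str
    snapshot_date: date  # time-variant: each row records the date it describes
    source: str          # which source system the row was integrated from


class CustomerWarehouseTable:
    """Non-volatile store: exposes only append and read methods, no update/delete."""

    def __init__(self) -> None:
        self._rows: list[CustomerSnapshot] = []

    def append(self, rows: list[CustomerSnapshot]) -> None:
        self._rows.extend(rows)

    def as_of(self, day: date) -> list[CustomerSnapshot]:
        """Latest snapshot of each customer on or before `day` (a historical query)."""
        latest: dict[str, CustomerSnapshot] = {}
        for row in self._rows:
            if row.snapshot_date > day:
                continue
            current = latest.get(row.customer_id)
            if current is None or row.snapshot_date >= current.snapshot_date:
                latest[row.customer_id] = row
        return list(latest.values())


def integrate(crm_rows: list[dict], shop_rows: list[dict], day: date) -> list[CustomerSnapshot]:
    """Integration: map two hypothetical sources with different field names and
    formats onto one common customer schema (upper-case country codes, string IDs)."""
    rows = [
        CustomerSnapshot(r["cust_id"], r["full_name"], r["country"].upper(), day, "crm")
        for r in crm_rows
    ]
    rows += [
        CustomerSnapshot(f"S{r['customerId']}", r["name"], r["countryCode"].upper(), day, "shop")
        for r in shop_rows
    ]
    return rows


if __name__ == "__main__":
    crm = [{"cust_id": "C1", "full_name": "Ann Lee", "country": "VN"}]
    shop = [{"customerId": 2, "name": "Bob Tran", "countryCode": "vn"}]
    table = CustomerWarehouseTable()
    table.append(integrate(crm, shop, date(2022, 5, 1)))
    crm[0]["country"] = "SG"  # the CRM overwrites its current value...
    table.append(integrate(crm, shop, date(2022, 5, 30)))  # ...the warehouse keeps both
    print([(r.customer_id, r.country) for r in table.as_of(date(2022, 5, 15))])
    # [('C1', 'VN'), ('S2', 'VN')]
```

The point of the usage example is the contrast: the source system only knows the customer's current country, while the warehouse can still answer "what did we know on May 15?". A real warehouse would do this with database tables and a load pipeline rather than Python lists.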

Data lake

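A data warehouse is also often confused with a data lake. The two differ in terms of the “3 Vs” of big data (volume, velocity and variety), notably:

  • Velocity: retrieving data for analysis is faster in a data warehouse, because data in a data lake is mostly raw, while data in a data warehouse has already been cleansed and is ready for the analysis phase.
  • Variety: a data lake can hold all kinds of data formats (structured, semi-structured and unstructured), while a data warehouse mainly holds structured data.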
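To make the velocity and variety points concrete, here is a small Python sketch. It assumes a hypothetical "lake" of three raw files (CSV, JSON and free text) and a hypothetical `Order` schema; the file names and function names are illustrative only. The lake keeps every file exactly as it arrived, so each query has to parse and cleanse the data again (often called schema-on-read). The warehouse cleanses once at load time (schema-on-write), and later queries read clean, typed rows directly.

```python
import csv
import io
import json
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical data lake: raw files kept exactly as they arrived, in any format.
DATA_LAKE = [
    ("orders_2022-05-01.csv", "order_id,amount,day\n1, 10.5 ,2022-05-01\n2,n/a,2022-05-01\n"),
    ("orders_2022-05-02.json", '[{"order_id": 3, "amount": "7", "day": "2022-05-02"}]'),
    ("support_call.txt", "Customer asked about a refund for order 3."),
]


@dataclass(frozen=True)
class Order:
    """Cleansed, typed row as it would be stored in the warehouse (hypothetical schema)."""
    order_id: int
    amount: float
    day: date


def parse_orders(name: str, content: str) -> list[dict]:
    """Read one raw file into dictionaries; files that are not order data yield nothing."""
    if name.endswith(".csv"):
        return list(csv.DictReader(io.StringIO(content)))
    if name.endswith(".json"):
        return json.loads(content)
    return []  # unstructured text stays in the lake but is not an order record


def cleanse(raw: dict) -> Optional[Order]:
    """Trim and type-convert one record; return None if it is unusable."""
    try:
        return Order(
            order_id=int(raw["order_id"]),
            amount=float(str(raw["amount"]).strip()),
            day=date.fromisoformat(str(raw["day"]).strip()),
        )
    except (KeyError, TypeError, ValueError):
        return None


def total_from_lake(lake: list[tuple[str, str]]) -> float:
    """Schema-on-read: every query parses and cleanses the raw files again."""
    orders = (cleanse(r) for name, content in lake for r in parse_orders(name, content))
    return sum(o.amount for o in orders if o is not None)


def load_warehouse(lake: list[tuple[str, str]]) -> list[Order]:
    """Schema-on-write: cleanse once at load time (a very small ETL step)."""
    cleaned = (cleanse(r) for name, content in lake for r in parse_orders(name, content))
    return [o for o in cleaned if o is not None]


def total_from_warehouse(warehouse: list[Order]) -> float:
    """Queries on the warehouse work directly on clean, typed rows."""
    return sum(o.amount for o in warehouse)


if __name__ == "__main__":
    warehouse = load_warehouse(DATA_LAKE)
    print(total_from_lake(DATA_LAKE), total_from_warehouse(warehouse))  # 17.5 17.5
```

Both totals agree (the `n/a` row is dropped and the support-call text is ignored); the difference is where the cleansing cost is paid. The lake query repeats the parsing on every call, while the warehouse pays it once when loading, which is why retrieval from the warehouse is faster.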
