Data Warehouse (part 1): The Basics
May 30, 2022
The term “Data Warehouse” is sometimes confused with “Database”, but a data warehouse is in fact built on top of one or more databases. In other words, the database is the platform, while the data warehouse is the application built on it.
A data warehouse is needed for two reasons:
- Making data-driven decisions: data about the past, the present, the future and the unknown are combined to support the decision-making process.
- Following the one-stop shopping principle: one place contains all the necessary goods (in this case, data).
Rules for designing a data warehouse (Bill Inmon, 1990):
- Integrated: a data warehouse is an integrated environment, meaning that data from different sources is brought together in the data warehouse.
- Subject-oriented: regardless of which system the data comes from, it should be organized by subject (for example customers or sales), not by source system.
- Time-variant: it also contains historical data, not just current data.
- Non-volatile: once data has been loaded, it is not overwritten or deleted by day-to-day operations, so it stays stable between loads.
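The four rules above can be sketched in a few lines of Python using the standard sqlite3 module. This is only a minimal illustration, not how a real warehouse is built. The two source systems ("crm" and "billing"), their field names, the customer_history table and the load helper are all hypothetical. Non-volatile is read here as "rows are only appended, never updated or deleted", which is an assumption of this sketch.

```python
import sqlite3

# Hypothetical extracts from two source systems. Each system uses its own
# field names for the same subject (customers).
crm_rows = [{"cust_id": 1, "full_name": "Ada Lopez", "city": "Lisbon"}]
billing_rows = [{"customer_no": 1, "name": "Ada Lopez", "town": "Porto"}]

conn = sqlite3.connect(":memory:")

# Subject-oriented: one table per subject, not one table per source system.
# Time-variant: every row records the date on which it was loaded.
conn.execute(
    """
    CREATE TABLE customer_history (
        customer_id INTEGER,
        name        TEXT,
        city        TEXT,
        source      TEXT,
        load_date   TEXT
    )
    """
)


def load(rows, source, load_date, id_key, name_key, city_key):
    """Append rows from one source (hypothetical helper).

    Non-volatile (as assumed in this sketch): only INSERT is used,
    so rows that are already loaded are never updated or deleted.
    """
    conn.executemany(
        "INSERT INTO customer_history VALUES (?, ?, ?, ?, ?)",
        [(r[id_key], r[name_key], r[city_key], source, load_date) for r in rows],
    )


# Integrated: both sources land in the same table with a common layout.
load(crm_rows, "crm", "2022-05-01", "cust_id", "full_name", "city")
load(billing_rows, "billing", "2022-05-30", "customer_no", "name", "town")

# The older record is still there next to the newer one.
history = conn.execute(
    "SELECT city, source, load_date FROM customer_history "
    "WHERE customer_id = 1 ORDER BY load_date"
).fetchall()
print(history)
```

Because the second load appends instead of overwriting, the query returns both the older and the newer city for the same customer, which is what makes questions about the past answerable.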
Data lake
A data warehouse is also often confused with a data lake. The two differ in terms of the 3 Vs:
- Velocity: a data warehouse is faster to retrieve data from, since data in a data lake is mainly raw, while data in a data warehouse has already been cleansed and is ready for analysis.
- Variety: a data lake can contain all kinds of data formats.