Data warehouse (part 3): ETL or ELT? A touch of practice as well
As briefly mentioned in part 1 and part 2, we are all probably familiar with the term “ETL”. In this part, let’s dive deeper into the concepts behind ETL and ELT, along with a bit of ETL practice in Visual Studio 2019.
As shown in part 1, the nature of traditional RDBMS/cubes leads to a middle layer between the data source systems/applications and the data warehouse, named ETL. Raw data from the sources is extracted, transformed, and then loaded into the data warehouse, so that no transformation happens inside the data warehouse itself. With big data, however, doing ETL into a data lake is suboptimal because of the data lake’s nature. Instead of ETL, we therefore have ELT (Extract, Load and Transform): data is first extracted from the source, loaded into the data lake (or a data warehouse-lake hybrid), and only then transformed to be ready for analysis.
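Before moving to the SSIS practice, the ETL-vs-ELT contrast can be sketched in a few lines of Python, with an in-memory SQLite database standing in for the warehouse. All table and column names here are made up for illustration; the point is only *where* the transformation step runs: in the pipeline (ETL) or inside the warehouse after loading raw data (ELT).

```python
import sqlite3

# Hypothetical raw source rows: (customer, amount in cents)
source_rows = [("alice", 1250), ("bob", 990), ("alice", 310)]

warehouse = sqlite3.connect(":memory:")

# --- ETL: transform in the pipeline, load only clean data ---
warehouse.execute("CREATE TABLE sales_etl (customer TEXT, amount_usd REAL)")
transformed = [(c, cents / 100.0) for c, cents in source_rows]  # transform first
warehouse.executemany("INSERT INTO sales_etl VALUES (?, ?)", transformed)

# --- ELT: load the raw data as-is, transform inside the warehouse ---
warehouse.execute("CREATE TABLE sales_raw (customer TEXT, amount_cents INTEGER)")
warehouse.executemany("INSERT INTO sales_raw VALUES (?, ?)", source_rows)
warehouse.execute(
    "CREATE TABLE sales_elt AS "
    "SELECT customer, amount_cents / 100.0 AS amount_usd FROM sales_raw"
)

etl_total = warehouse.execute("SELECT SUM(amount_usd) FROM sales_etl").fetchone()[0]
elt_total = warehouse.execute("SELECT SUM(amount_usd) FROM sales_elt").fetchone()[0]
print(etl_total, elt_total)  # both paths end with the same analytical result
```

Note that in the ELT path the raw table survives, which is exactly what makes the approach attractive for data lakes: new transformations can be derived from the raw data later without re-extracting from the source.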
Challenges with traditional ETL:
- Significant business analysis before storing data
- Significant data modelling before storing data
Two variations of ETL
Initial load
- Normally a one-time-only run
- Happens right before the data warehouse goes live
- Loads all relevant data necessary for BI and analytics (not all possible source data)
- Must be redone if the data warehouse “blows up”
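The bullets above can be sketched as a minimal, re-runnable initial load in Python, again with in-memory SQLite databases standing in for the source system and the warehouse. The table names (`orders`, `fact_orders`) are purely illustrative:

```python
import sqlite3

# Hypothetical source system with some existing data
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, total REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

# Empty warehouse, right before go-live
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_orders (id INTEGER, total REAL)")

def initial_load():
    # Wipe the target first so the whole load can be redone from
    # scratch if the warehouse ever "blows up".
    warehouse.execute("DELETE FROM fact_orders")
    # Pull only the relevant data for BI/analytics, not every source table
    rows = source.execute("SELECT id, total FROM orders").fetchall()
    warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?)", rows)

initial_load()
count = warehouse.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]
print(count)  # 2 rows loaded
```

Because the load deletes before inserting, running it twice leaves the warehouse in the same state, which is the property you want when a rebuild is needed.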