Member-only story

Data warehouse (part 3): ETL or ELT? A touch of practice as well

Hang Nguyen
5 min readJun 1, 2022

--

As briefly mentioned in part 1 and part 2, we are all probably familiar with the term “ETL”. In this part, let’s dive deeper into ETL and ELT with concepts and a bit of ETL practice in Visual Studio 2019.

As shown in part 1, the nature of traditional RDBMS/cubes leads to a middle section between data source systems/applications and data warehouse, named ETL. Meaning that the raw data from the source should be extracted, transformed then loaded to the data warehouse so that inside datawarehouse, we do not transform any data at all. However with big data, it’s unoptimal to do ETL to the data lake because of data lake’s nature. Due to this, instead of ETL, we have ELT (Extract, Load and Transform), meaning that data is first extracted from the source, loaded into the data lake/data warehouse-lake hybrid and then got transformed then be ready for analysis.

Challenges with traditional ETL:

  • Significant business analysis before storing data
  • Significant data modelling before storing data

Two variations of ETL

Initial load

  • Normally one time-only
  • Right before the data warehouse goes live
  • all relevant data necessary for BI and analytics (not all possible source data)
  • redo if data warehouse “blows up”

Increment

--

--

Hang Nguyen
Hang Nguyen

Written by Hang Nguyen

Just sharing (data) knowledge

No responses yet