[BOOK REVIEW] The Data Warehouse Toolkit by Ralph Kimball/Kimball’s DW/BI Architecture
There are 4 components to consider in DW/BI environment:
Operational Source Systems
A.k.a where to get data, where all transactions are operated.
Extract, Transformation, and Load System
Extract: Crawling raw data
Transformation: Dealing with raw data including cleansing data (correcting misspellings, resolving domain conflicts, dealing with missing elements, or parsing into standard formats), combining data from multiple sources, and de-duplicating data. After this process, we have nice-looking, meaningful datasets ready to be used.
Load: physical structuring and loading of data into the presentation area’s target dimensional models.
It is acceptable to create a normalized database to support the ETL processes; however, this is not the end goal. The normalized structures must be off -limits to user queries because they defeat the twin goals of understandability and performance.
Data Presentation area
The DW/BI presentation area is where data is organized, stored, and made available for direct querying by users, report writers, and other analytical BI applications.
It has concluded that dimensional modeling is the most viable technique for delivering data to DW/BI users.
It is completely unacceptable to store only summary data in dimensional models while the atomic data is locked up in normalized models
Conclusion:
Data in the queryable presentation area of the DW/BI system must be dimensional, atomic (complemented by performance-enhancing aggregates), business process-centric, and adhere to the enterprise data warehouse bus architecture. The data must not be structured according to individual departments’ interpretation of the data.
Business Intelligance applications
By definition, all BI applications query the data in the DW/BI presentation area. Querying, obviously, is the whole point of using data for improved decision making.