February 9, 2020
Olusola Palacios
Module 1- Data Warehouse Design
Module
1 covers several topics under Data Warehouse Design. Data warehouse is quite
different from standard operational database and technology that collects data
from one or more sources for comparison and analysis which are critical for
informed business decision making and allowing transactional databases to
process transactions. Module 1 also covers basic concepts in developing Data
Warehouse using Star Schema.
Developing
Data Warehouse is important in designing a warehouse, Star Schema caught my
attention for its simplicity. Data Warehouse Star Schema is a simple and
popular data warehouse design and dimensional model. It’s designed by dividing
data into facts and dimensions. It is an OLAP (Online Analytical Processing) a system that can store aggregated, historical data and stored in
multi-dimensional schemas. In creating a Star Schema, it is important to note
what information the dimension and fact tables hold.
Dimensions
& Facts
Dimensions: These are tables that
contain columns and attributes that are used to describe business processes.
Dimension tables have unique primary key columns that are unique and basically
used to associate with the fact table.
Other few things to consider:
- Granularity: Dimension table carries its own grain or
granularity. This is the lowest level of information or detail in the table.
- Non-Key Elements: Non-Key elements appear in dimension
tables.
- Time and Date: Multiple time and date dimensions may
exist.
- Creating one to many relationships: The rows in
dimension table creates a one to many relationship with the fact
table.
- Records: The number of records in the dimensions table
are usually smaller than the number of facts.
- Dimensions are usually the actors or attributes related
to them.
- Dimensions are usually denormalized.
- Not located in the center of the schema.
Facts: These are measurable data of
specific events and are numeric in value. The fact table carries the foreign
keys to dimensional data and other measurable data.
- Fact tables are located at the center of a star
schema
- Fact tables are often denormalized
- Fact tables contain two columns, the foreign keys column, and the measure columns.
Why is it called a Star Schema?
It is called a star schema because the tables are situated similarly to a star as represented in Fig. 1.1. The tables are also organized in a way that allows the joining of dimension tables.
Fig. 1.1
- Simplicity
– Easy to read, use and understand.
- Performance
– Queries run faster since schemas have small tables and clear paths.
- Scalability
– Schemas are extensible to adjust to changes such as adding dimensions,
attributes, and changes.
Star Schemas have their disadvantages too. It does not
enforce many to many relationships and it is highly denormalized which may
affect data integrity.
References:
Geeksforgeeks (n.d). Star Schema in Data Warehouse modeling. Retrieved from https://www.geeksforgeeks.org/star-schema-in-data-warehouse-modeling/
Microsoft (2019). Understand star schema and the importance for Power BI. Retrieved from https://docs.microsoft.com/en-us/power-bi/guidance/star-schema
Informatica (n.d). What is Data Warehousing. Retrieved from https://www.informatica.com/services-and-training/glossary-of-terms/data-warehousing-definition.html
S. Vithal (2019). Data Warehouse Star Schema Model and Design. Retrieved from https://dwgeek.com/star-schema-model-data-darehouse.html/
No comments:
Post a Comment