Sunday, February 9, 2020

Data Warehouse Design - Star Schema


February 9, 2020
Olusola Palacios
Module 1- Data Warehouse Design

Module 1 covers several topics under Data Warehouse Design. Data warehouse is quite different from standard operational database and technology that collects data from one or more sources for comparison and analysis which are critical for informed business decision making and allowing transactional databases to process transactions. Module 1 also covers basic concepts in developing Data Warehouse using Star Schema.

Developing Data Warehouse is important in designing a warehouse, Star Schema caught my attention for its simplicity. Data Warehouse Star Schema is a simple and popular data warehouse design and dimensional model. It’s designed by dividing data into facts and dimensions. It is an OLAP (Online Analytical Processing) a system that can store aggregated, historical data and stored in multi-dimensional schemas. In creating a Star Schema, it is important to note what information the dimension and fact tables hold.

Dimensions & Facts

Dimensions: These are tables that contain columns and attributes that are used to describe business processes. Dimension tables have unique primary key columns that are unique and basically used to associate with the fact table.  

Other few things to consider:

  • Granularity: Dimension table carries its own grain or granularity. This is the lowest level of information or detail in the table.  
  • Non-Key Elements: Non-Key elements appear in dimension tables.
  • Time and Date: Multiple time and date dimensions may exist.
  • Creating one to many relationships: The rows in dimension table creates a one to many relationship with the fact table.  
  • Records: The number of records in the dimensions table are usually smaller than the number of facts.
  • Dimensions are usually the actors or attributes related to them.
  • Dimensions are usually denormalized.
  • Not located in the center of the schema.

Facts: These are measurable data of specific events and are numeric in value. The fact table carries the foreign keys to dimensional data and other measurable data. 

  • Fact tables are located at the center of a star schema 
  • Fact tables are often denormalized
  • Fact tables contain two columns, the foreign keys column, and the measure columns.

Why is it called a Star Schema?

It is called a star schema because the tables are situated similarly to a star as represented in Fig. 1.1. The tables are also organized in a way that allows the joining of dimension tables. 

Fig. 1.1

 Advantages of Star Schema

  • Simplicity – Easy to read, use and understand.
  • Performance – Queries run faster since schemas have small tables and clear paths.
  • Scalability – Schemas are extensible to adjust to changes such as adding dimensions, attributes, and changes. 
Star Schemas have their disadvantages too. It does not enforce many to many relationships and it is highly denormalized which may affect data integrity.

References: 

Geeksforgeeks (n.d). Star Schema in Data Warehouse modeling. Retrieved from https://www.geeksforgeeks.org/star-schema-in-data-warehouse-modeling/ 

Microsoft (2019). Understand star schema and the importance for Power BI. Retrieved from https://docs.microsoft.com/en-us/power-bi/guidance/star-schema

Informatica (n.d). What is Data Warehousing. Retrieved from https://www.informatica.com/services-and-training/glossary-of-terms/data-warehousing-definition.html

S. Vithal (2019). Data Warehouse Star Schema Model and Design. Retrieved from https://dwgeek.com/star-schema-model-data-darehouse.html/




No comments:

Post a Comment

Final Blog – Reflection of Spring 2020 MIS 587

MIS 587, in my opinion, has several components to it such as statistics, analytics, business intelligence and processes, and economics,...