Online Business Intelligence Spring 2020: February 2020

Saturday, February 22, 2020

Module 3 - Network Visualization and Social Network Analysis.

Olusola Palacios

February 22, 2020

In module 3, we covered Network Visualization and Social Network Analysis. To understand networks, which are also known as graphs, we first need to understand the concept of visualization. What exactly is visualization? In module 3, we learned that the word visualization means “to form a mental image of a concept, idea or object.” The purpose of Network Visualization is to explore, communicate and understand.

A network is a collection of nodes or vertices, connected by edges or links.

Source: Analytics Vidhya

The diagram above shows nodes and lines. The circular points represent individual nodes in a network and each line represents an edge or a link.
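As a rough sketch, the same idea can be written down in Python: a network is just a list of edges, and from that list we can recover the nodes and each node's neighbours. The node names and the helper `build_adjacency` are made up for illustration and are not from the module.

```python
from collections import defaultdict

# Hypothetical example data: each tuple is one edge (link) between two nodes.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]


def build_adjacency(edge_list):
    """Return an undirected adjacency map: node -> set of neighbouring nodes."""
    adjacency = defaultdict(set)
    for u, v in edge_list:
        adjacency[u].add(v)  # u is linked to v
        adjacency[v].add(u)  # and v is linked to u (no direction)
    return dict(adjacency)


adjacency = build_adjacency(edges)
nodes = sorted(adjacency)

print("Nodes:", nodes)
print("Neighbours of C:", sorted(adjacency["C"]))
```

Storing the network as an adjacency map makes it cheap to ask "who is this node connected to?", which is the question most network measures start from.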

In social networking, people can be represented by nodes and their relationships by links or edges. A person can be linked to another if, for example, they are friends on Facebook or are co-workers or colleagues on LinkedIn. Analyzing social media networks is known as Social Network Analysis (SNA). The concept of SNA can also be applied to domains other than social media. For example, in healthcare, patients can be linked to their service providers, and in doing so, patterns of healthcare provider behavior can be identified. The same concept can be applied in other industries, such as the financial industry.

Directed and Undirected Networks

Directed and undirected networks can be explained by analyzing connections in social media. A friendship in social media establishes a link; however, it has no direction. For example, if Person A is friends with Person B, there is a link, but it has no direction. If Person A directly interacts with Person B by liking a post or chatting, then they have established a link with direction. This link can be either unidirectional or bidirectional: Person A may communicate with Person B, but Person B does not have to respond. For this reason, the directionality of a network is determined by its links, not its nodes.
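The difference can be sketched in a few lines of Python. This is only an illustration under assumed data: the people, the friendships and the messages are invented, and the helper names `are_friends` and `is_reciprocated` are hypothetical.

```python
# Undirected links: a friendship has no direction, so the order of the pair
# does not matter. A frozenset treats ("A", "B") and ("B", "A") as the same link.
friendships = {frozenset(("A", "B")), frozenset(("B", "C"))}

# Directed links: (sender, receiver). Here A messaged B, B replied to A,
# and A messaged C without getting a reply.
messages = {("A", "B"), ("B", "A"), ("A", "C")}


def are_friends(x, y):
    """Undirected check: true regardless of the order of x and y."""
    return frozenset((x, y)) in friendships


def is_reciprocated(sender, receiver):
    """Directed check: true only if a link exists in both directions."""
    return (sender, receiver) in messages and (receiver, sender) in messages


print(are_friends("B", "A"))      # friendship ignores direction
print(is_reciprocated("A", "B"))  # both directions exist
print(is_reciprocated("A", "C"))  # only A -> C exists
```

Note that the same set of people appears in both networks; only the way the links are stored decides whether the network is directed.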

Network Visualization

There are different kinds of network visualization layouts. It is important to understand the following:

Networks can be visualized in many different ways.
A network diagram is drawn by connecting nodes with edges.
Network visualizations help communicate ideas about connectivity.

In order to determine or measure what or who is important or central to a network, the key measurements include betweenness, eigenvector centrality, degree, closeness, reciprocity, and influence.
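Two of these measures, degree and closeness, are simple enough to sketch by hand. The example below assumes a small, connected, undirected network with invented node names; it uses the common normalized definitions (degree divided by n - 1, and n - 1 divided by the sum of shortest-path distances), which may differ slightly from the definitions used in a particular tool.

```python
from collections import deque

# Hypothetical undirected network: node -> set of neighbours.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}


def degree_centrality(g):
    """Share of the other nodes that each node is directly linked to."""
    n = len(g)
    return {node: len(neighbours) / (n - 1) for node, neighbours in g.items()}


def closeness_centrality(g):
    """(n - 1) / sum of shortest-path distances; assumes a connected graph."""
    n = len(g)
    result = {}
    for source in g:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbour in g[current]:
                if neighbour not in dist:
                    dist[neighbour] = dist[current] + 1
                    queue.append(neighbour)
        total = sum(dist.values())
        result[source] = (n - 1) / total if total else 0.0
    return result


degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
most_central = max(degree, key=degree.get)
print("Most connected node:", most_central)
```

In this toy network node C scores highest on both measures because it is directly linked to every other node.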

Different Types of Network Visualization Layouts

1. Force-directed layout – minimizes node collision

2. Clustering layout – identifies heavily connected parts

3. Circular layout – arranges nodes in a circle

4. Geographic layout – maps nodes to different locations

5. Hierarchical layout – identifies the hierarchical relationships between nodes
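Of the layouts listed above, the circular one is the easiest to compute by hand, so here is a minimal sketch of it. The function name `circular_layout` and the node names are assumptions for illustration; real visualization tools offer their own layout routines.

```python
import math


def circular_layout(nodes, radius=1.0):
    """Place nodes at equal angles around a circle centred on the origin."""
    n = len(nodes)
    positions = {}
    for i, node in enumerate(nodes):
        angle = 2 * math.pi * i / n  # equal spacing between neighbours
        positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


positions = circular_layout(["A", "B", "C", "D"])
for node, (x, y) in positions.items():
    print(f"{node}: x={x:.2f}, y={y:.2f}")
```

Each node gets an (x, y) coordinate that a plotting tool could use to draw the node, with edges drawn as straight lines between the coordinates.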

Benefits of Visualization

1. Less time spent integrating information

2. Better insights and understanding of data

3. Effective communication

4. Better understanding

Data Visualization is used in different industries such as Anti-Financial Crime, Cybersecurity, Intelligence, IT Operations Management, Enterprise Architecture, and Life Science. Network Analysis and Data Visualization are problem-solving tools for both small and large data collections.

References:

Himelboim, I. (2007). Social Network Analysis (Social Media). Retrieved from https://onlinelibrary.wiley.com/doi/full/10.1002/9781118901731.iecrm0236

Linkurious (2017). Graph Visualization: Why it Matters. Retrieved from https://linkurio.us/blog/why-graph-visualization-matters/

Zoss, A., Maltese, A., Uzzo, S., & Börner, K. (n.d.). 5 Network Visualization Literacy. Retrieved from https://cns.iu.edu/docs/publications/2018-NetSci-Zoss.pdf

Sunday, February 16, 2020

Module 2 – Web Analysis (Web Metrics and Google Analytics)

Olusola Palacios

February 16, 2020

Web Analytics is the “measurement, collection, analysis and reporting of internet data for the purpose of understanding and optimizing web usage”. Web Analytics measures visitor activity, analyzes data to reveal the behavioral patterns of users, and serves as a tool that supports business and market research. Below is an illustration of Web Analysis. It’s all about metrics: measure, optimize, report and analyze.

Web Analytics consists of 5Ws, namely:

1. Who (location, demographics)

2. When (page views, events, time & date / seasonality)

3. Where (location, network)

4. Why (events, clickpath, pages viewed)

5. What (device type, screen size, network)

What are some important metrics/Key Performance Indicators (KPI) to track?

1. Website Traffic: Traffic is fundamental to a website’s success, and it is easily tracked using analytics tools such as Google Analytics. Traffic helps determine whether a website, and in fact a business, is growing; flat or steadily declining traffic indicates that the marketing or the business isn’t doing well. Using Google Analytics, one can determine whether the traffic comes from new or repeat visitors, and where and when the traffic is higher or lower. This helps determine where a business should target its efforts for continuous growth.

2. Traffic Sources: There are different kinds of web traffic sources: (1) Direct, (2) Organic, (3) Referral, and (4) Campaign. All of these are covered in Module 2. It is important to know where the traffic is being generated. For example, organic search traffic helps determine whether a website is ranked high in search engines. Knowing which sources visitors come from helps a business decide where to focus in order to drive more traffic.

3. Bounce Rate: Bounce rate was explained in Module 2. It tells us the proportion of visitors who arrive on a particular page and leave the website without viewing any other page. This is an important metric because it indicates how well the website is doing. A lower bounce rate means more people are staying on the website to accomplish the end goal, whether that is to purchase or to complete other tasks.

4. Conversion Rate: This is the “proportion of visits that result in goal achievement”. It is highly important to track the conversion rate, which is calculated as conversions divided by unique visitors. Conversion rates can have a substantial impact on profits, so the website should be continually improved for conversions.
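To make the last two metrics concrete, here is a small sketch of how they could be computed. The numbers are invented, the function names are hypothetical, and the formulas follow the usual conventions: bounce rate as single-page sessions over all sessions, and conversion rate as conversions over unique visitors, in line with the quoted definition of the proportion of visits that achieve a goal. Google Analytics reports these figures directly, and its exact denominators may differ.

```python
def bounce_rate(single_page_sessions, total_sessions):
    """Share of sessions that ended after viewing only one page."""
    if total_sessions == 0:
        return 0.0
    return single_page_sessions / total_sessions


def conversion_rate(conversions, unique_visitors):
    """Share of visitors who completed the goal (for example, a purchase)."""
    if unique_visitors == 0:
        return 0.0
    return conversions / unique_visitors


# Hypothetical monthly figures for a small website.
sessions = 1000
single_page = 400
visitors = 800
purchases = 20

print(f"Bounce rate: {bounce_rate(single_page, sessions):.1%}")
print(f"Conversion rate: {conversion_rate(purchases, visitors):.1%}")
```

Both metrics are ratios, so they stay comparable from month to month even when overall traffic rises or falls.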

There are many other web analytics metrics, such as number of visits, exit rates, pages visited, top pages, etc. All of these can be measured easily using Google Analytics.

Why are metrics/KPIs important?

1. KPIs allow users to extract meaning from data at a glance.

2. KPIs allow users to create a snapshot to monitor performance over time.

3. KPIs and metrics indicate the overall health of a business’s marketing.

4. KPIs highlight potential problems and may help users find a better solution.

Importance of Data Analytics:

1. Historical and Real-Time monitoring of users.

2. Easier marketing.

3. Identifying pages with specific bounce rates.

4. Determining future demands.

Conclusion

It is important to monitor the metrics and KPIs of a business website in order to create better marketing strategies and support business growth.

References:

Quarton, S. (2015). 7 Key Metrics to track the Success of Your Website. Torque. Retrieved from https://torquemag.io/2015/03/7-key-website-metrics-track/

Ram, S. (n.d.). Introduction to Web Analytics. Class Notes. Lecture 9_V18, University of Arizona.

Ryan, D. (2014). Understanding Web Analytics and Key Performance Indicators. Kogan Page. Retrieved from https://www.koganpage.com/article/understanding-web-analytics-and-key-performance-indicators

Sharma, V. (2017). Importance of web analytics. Klient Solutech. Retrieved from http://www.klientsolutech.com/importance-of-web-analytics/

Sunday, February 9, 2020

Data Warehouse Design - Star Schema

February 9, 2020

Olusola Palacios

Module 1 - Data Warehouse Design

Module 1 covers several topics under Data Warehouse Design. A data warehouse is quite different from a standard operational database. It is a technology that collects data from one or more sources for comparison and analysis, which are critical for informed business decision making, while leaving transactional databases free to process transactions. Module 1 also covers basic concepts in developing a data warehouse using a Star Schema.

Choosing a schema is an important part of designing a data warehouse, and the Star Schema caught my attention for its simplicity. The Star Schema is a simple and popular data warehouse design and dimensional model. It’s designed by dividing data into facts and dimensions. It is used in OLAP (Online Analytical Processing) systems, which store aggregated, historical data in multi-dimensional schemas. In creating a Star Schema, it is important to note what information the dimension and fact tables hold.

Dimensions & Facts

Dimensions: These are tables that contain columns and attributes that are used to describe business processes. Dimension tables have unique primary key columns that are used to associate them with the fact table.

A few other things to consider:

Granularity: Each dimension table carries its own grain, or granularity. This is the lowest level of information or detail in the table.
Non-Key Elements: Non-key elements appear in dimension tables.
Time and Date: Multiple time and date dimensions may exist.
One-to-many relationships: Each row in a dimension table has a one-to-many relationship with rows in the fact table.
Records: The number of records in a dimension table is usually smaller than the number of records in the fact table.
Dimensions are usually the actors in a business process or the attributes related to them.
Dimensions are usually denormalized.
Dimension tables are not located at the center of the schema.

Facts: These are measurable data about specific events and are numeric in value. The fact table carries the foreign keys to the dimension tables along with the measures.

Fact tables are located at the center of a star schema.
Fact tables are usually normalized, while the surrounding dimension tables are denormalized.
Fact tables contain two kinds of columns: foreign key columns and measure columns.
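The dimension and fact tables described above can be sketched with Python's built-in `sqlite3` module. This is only a toy example: the table names (`dim_product`, `dim_date`, `fact_sales`), the columns and the rows are all invented for illustration, and a real warehouse would use a dedicated database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

conn.executescript("""
-- Dimension tables: one primary key each, plus descriptive attributes.
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT,
    month     TEXT
);
-- Fact table: foreign keys to the dimensions, plus numeric measures.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    quantity    INTEGER,
    revenue     REAL
);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_date VALUES (20200201, '2020-02-01', '2020-02'),
                            (20200202, '2020-02-02', '2020-02');
INSERT INTO fact_sales VALUES (1, 20200201, 2, 2000.0),
                              (2, 20200201, 1, 300.0),
                              (1, 20200202, 1, 1000.0);
""")

# A typical star-schema query: join the fact table to each dimension,
# then aggregate the measures by dimension attributes.
rows = conn.execute("""
    SELECT p.category, d.month, SUM(f.quantity), SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date    AS d ON d.date_key    = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category
""").fetchall()

for row in rows:
    print(row)
```

Every join goes through the fact table, which is what places it at the center of the star, with one dimension table at the end of each point.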

Why is it called a Star Schema?

It is called a star schema because the tables are arranged in the shape of a star, as represented in Fig. 1.1. The tables are also organized in a way that allows the fact table to be joined to each dimension table.

Fig. 1.1

Advantages of Star Schema

Simplicity – Easy to read, use and understand.
Performance – Queries run faster since the schema has few tables and clear join paths.
Scalability – Schemas are extensible and can adjust to changes such as new dimensions and attributes.

Star schemas have their disadvantages too. They do not handle many-to-many relationships well, and they are highly denormalized, which may affect data integrity.

References:

GeeksforGeeks (n.d.). Star Schema in Data Warehouse modeling. Retrieved from https://www.geeksforgeeks.org/star-schema-in-data-warehouse-modeling/

Microsoft (2019). Understand star schema and the importance for Power BI. Retrieved from https://docs.microsoft.com/en-us/power-bi/guidance/star-schema

Informatica (n.d.). What is Data Warehousing. Retrieved from https://www.informatica.com/services-and-training/glossary-of-terms/data-warehousing-definition.html

Vithal, S. (2019). Data Warehouse Star Schema Model and Design. Retrieved from https://dwgeek.com/star-schema-model-data-darehouse.html/
