Star vs. Snowflake Schema

Introduction

The star and snowflake schema are logical storage designs commonly found in data marts and data warehouse architecture. While common database types use ER (Entity-Relationship) diagrams, the logical structure of warehouses uses dimensional models to conceptualize the storage system.

Read on to find out the differences, characteristics, and flaws of the star and snowflake schemas.

Star vs. Snowflake Schema

Star Schema vs. Snowflake Schema: The Main Difference

The two main elements of the dimensional model of the star and snowflake schema are:

1. Facts table. A table with the most considerable amount of data, also known as a cube.

2. Dimension tables. The derived data structure provides answers to ad hoc queries or dimensions, often called lookup tables.

Connecting chosen dimensions on a facts table forms the schema. Both the star and snowflake schemas make use of the dimensionality of data to model the storage system.

The main differences between the two schemas are:

Star SchemaSnowflake Schema
ElementsFact table Dimension tablesFact table Dimension tables Subdimension tables
StructureStar-shapedSnowflake shaped
DimensionsOne table per dimensionMultiple tables for each dimension
Model DirectionTop-downBottom-up
Storage spaceUses more storageUses less space
NormalizationDenormalized dimension tablesNormalized dimension tables
Query PerformanceFast, fewer JOINs needed because of fewer foreign keysSlow, more JOINs required because of more foreign keys
Query ComplexitySimple and easier to understandComplicated and more challenging to understand
Data RedundancyHighLow
Use caseDimension tables with several rows, typical with data martsDimension tables with multiple rows found with data warehouses

Due to the complexity of the snowflake schema and the lower performances, the star schema is the preferred option whenever possible. One typical way to get around the problems in the snowflake schema is to decompose the dedicated storage into multiple smaller entities with a star schema.

What Is a Star Schema?

A star schema is a logical structure for the development of data marts and simpler data warehouses. The simple model consists of dimension tables connected to a facts table in the center.

Star Schema explained with diagram

The facts table typically consists of:

  • Quantifiable numerical data, such as values or counts.
  • References to the dimensions through foreign keys.

The lookup tables represent descriptive information directly connected to the facts table.

For example, to model the sales of an ecommerce business, the facts table for purchases might contain the total price of the purchase. On the other hand, dimensional tables have descriptive information about the items, customer data, the time or location of purchase.

Example of an ecommerce structure using star schema

The star schema for the analysis of purchases in the example has four dimensions. The facts table connects to the dimensional tables through the concept of foreign and primary keys. Apart from the numerical data, the facts table therefore also consists of foreign keys to define relations between tables.

Characteristics of a Star Schema

The main characteristics of the star schema are:

  • Simplified and fast queries. Fewer JOIN operations due to denormalization make information more readily available.
  • Simple relationships. The schema works great with one-to-one or one-to-many relationships.
  • Singular dimensionality. One table describes each dimension.
  • OLAP friendly. OLAP systems widely use star schema to design data cubes.

Drawbacks of a Star Schema

The disadvantages of using the star schema are:

  • Redundancy. The dimensional tables are one-dimensional, and data redundancy is present.
  • Low integrity. Due to denormalization, updating information is a complex task.
  • Limited queries. The set of questions is limited, which also narrows down the analytical power.

What Is a Snowflake Schema?

The snowflake schema has a branched-out logical structure used in large data warehouses. From the center to the edges, entity information goes from general to more specific.

Apart from the dimensional model's common elements, the snowflake schema further decomposes dimensional tables into subdimensions.

Snowflake schema explained with diagram

The ecommerce sales analysis model from the previous example further branches ("snowflakes") into smaller categories and subcategories of interest.

Example of an ecommerce structure using snowflake schema

The four dimensions decompose into subdimensions. The lookup tables further normalize through a series of connected objects.

Characteristics of a Snowflake Schema

The main features of the snowflake schema include:

  • Small storage. The snowflake schema does not require as much storage space.
  • High granularity. Dividing tables into subdimensions allows analysis at various depths of interest. Adding new subdimensions is a simple process as well.
  • Integrity. Due to normalization, the schema has a higher level of data integrity and low redundancies.

Drawbacks of a Snowflake Schema

The weaknesses of the snowflake schema are:

  • Complexity. The database model is complex, and so are the executed queries. Multiple multidimensional tables make the design complicated to work with overall.
  • Slow processing. Many lookup tables require multiple JOIN operations, which slows down information retrieval.
  • Hard to maintain. A high level of granularity makes the schema hard to manage and maintain.

Conclusion

The star and snowflake schema are both logical designs of database storage systems. Both schemas show up in big data systems, where the amount of information is massive and hard to maintain.

For more information about big data platforms, check out our article about the Hadoop platform.

Was this article helpful?
YesNo
Milica Dancuk
Milica Dancuk is a technical writer at phoenixNAP who is passionate about programming. Her background in Electrical Engineering and Computing combined with her teaching experience give her the ability to easily explain complex technical concepts through her content.
Next you should read
Hadoop vs Spark – Detailed Comparison
June 4, 2020

This article explains how Hadoop and Spark are different in multiple categories. The tools are both free, but...
Read more
Apache Hadoop Architecture Explained (with Diagrams)
May 25, 2020

Apache Hadoop is the go-to framework for storing and processing big data. This article...
Read more
OLTP vs. OLAP: A Comprehensive Comparison
June 8, 2021

With massive amounts of data, different processing techniques are used depending on whether you...
Read more
What Is NoSQL Database? – NoSQL Explained
June 17, 2020

The article provides a detailed explanation of what a NoSQL databases is and how it differs from...
Read more