The Schema's - Datawarehousing tables arrangement

The Schema's:

The concept of Schema talks about how various Dimension and Fact tables can be arranged to create a Datawarehouse.

In designing data models for data warehouses / data marts, the most commonly used schema types are Star Schema, Snowflake Schema and  Fact Constellation Schema. Apart from these there is something called as Hybrid Schema.

Let's talk about each type:

Star Schema: The star schema architecture is the simplest data warehouse schema. It is called a star schema because the diagram resembles a star, with Fact table at the center with radiating Dimension tables. Each dimension is represented as a single table. The primary key in each dimension table is related to a foreign key in the fact table.
 A star schema can be simple or complex. A simple star consists of one fact table; a complex star can have more than one fact table.

Below diagram, represents a Star Schema.


                                                                                Fig, 1(a)


Fig, 1(b)

In the figure above, Sales is the Fact table and others are Dimension table. Also notice the primary and Foreign key relationships.

Snowflake schema: The snowflake schema resembles a star schema, with the modification that the dimensional tables can normalized into multiple tables, each representing a level in the dimensional hierarchy. In a snowflake schema, some or all of the Dimension tables can be normalized. This is called as Snowflaking.
Snowflake schema is used to improve the performance of certain queries. .
A star schema stores all attributes for a dimension into one (denormalized) table, which indirectly requires more space as compared to the normalized snowflake schema.  Snowflake schema normalizes the particular dimension table by moving some of the attributes with low cardinality (few distinct values) into separate dimension tables. This normalized table id then connected to the Dimension table by foreign key. Snowflake schema should not be used only with the thought of reducing the disk space, because it can adversely impact query performance in certain cases.
When can a snowflake schema be useful:
  • Tables which will be rarely used, where most dimension member records have a NULL value for the attribute, can be moved to a sub-dimension.
  • Attributes of a specific hierarchy and which are queried independently. Like: Date - Year, month, Day and Location - Country and State.
  • The normalization of dimension tables directly increases the number of dimension tables or sub-dimension tables present in the Datawarehouse which again require more primary key foreign key relationship, increasing the complexity of the datawarehouse. Also reduce the query performance because of more number of tables. 
  • The query of snowflake schema is more complex than query of star schema due to multiple joins from dimension table to sub-dimension tables.
  • Snowflake schema helps in saving space by normalizing dimension tables.
  • By creating aggregate table(s) and joining it (them) to the required dimension table(s) improves performance by reducing the execution time in some cases.
Below diagram represents the Snowflake Schema:


Fig, 2(a)

Fact Constellation Schema: Fact Constellation schema also called as Galaxy Schema, is mainly identified by the presence of multiple Fact tables and some of the the Dimension tables are shared across the Fact Tables.

Fig, 3(a)
Usually Fact tables are separated or multiple Fact tables are created when we require the aggregate over few Fact or Dimension.
As you can see in the above Diagram, Fig, 3(a), we have 2 Fact Tables and multiple Dimension table. In this scenario we have 2 Fact table, because one Fact is used to store the calculated(aggregate) value at Sales level and other at Shipping level. Also as can be easily notices location and item Dimension tables are shared across both Fact tables.
Another example of Fact Constellation below:
Fig, 3(b)

Hybrid Schema: Hybrid schema is built by utilizing or combining the 3 different type of schema described above. Datawarehousing is a very complex science, which usually require a company to utilize the advantages and remove the disadvantages of other schema which gives birth to Hybrid Schema. 

No comments:

Post a Comment