Star schema - WikiHQ

thumb|right|

In computing, the star schema or star model is the simplest style of data mart schema and is the approach most widely used to develop data warehouses and dimensional data marts. The star schema consists of one or more fact tables referencing any number of dimension tables. The star schema is an important special case of the snowflake schema, and is more effective for handling simpler queries.

The star schema gets its name from the physical model's resemblance to a star shape with a fact table at its center and the dimension tables surrounding it representing the star's points.

Model

The star schema separates business process data into facts, which hold the measurable, quantitative data about a business, and dimensions which are descriptive attributes related to fact data. Examples of fact data include sales price, sale quantity, time, distance, speed and weight measurements. Related dimension attribute examples include product models, product colors, product sizes, geographic locations, and salesperson names.

A star schema that has many dimensions is sometimes called a centipede schema. Having dimensions of only a few attributes, while simpler to maintain, results in queries with many table joins and makes the star schema less easy to use.

Fact tables

Fact tables record measurements or metrics for a specific event.

Fact tables generally consist of numeric values, and foreign keys to dimensional data where descriptive information is kept.

Star schemas align well with multidimensional/OLAP models commonly used in decision support.

Typical tables

Dimensional modeling distinguishes a central fact table and surrounding dimension tables. Common fact table types in star schemas are transaction, periodic snapshot, and accumulating snapshot; frequently used conformed dimensions include date/time, product, customer, organization, and geography.

Query performance considerations

Analytic queries over a star schema usually join one large fact table with a handful of relatively small dimensions; many DBMSs implement ‘‘star-join’’ optimizations for this pattern. Performance characteristics of such workloads are commonly studied using the Star Schema Benchmark (SSB).

Example

300px|thumb|right|Star schema used by example query

Consider a database of sales, perhaps from a store chain, classified by date, store and product. The image of the schema to the right is a star schema version of the sample schema provided in the snowflake schema article.

<code>Fact_Sales</code> is the fact table and there are three dimension tables <code>Dim_Date</code>, <code>Dim_Store</code> and <code>Dim_Product</code>.

Each dimension table has a primary key on its <code>Id</code> column, relating to one of the columns (viewed as rows in the example schema) of the <code>Fact_Sales</code> table's three-column (compound) primary key (<code>Date_Id</code>, <code>Store_Id</code>, <code>Product_Id</code>). The non-primary key <code>Units_Sold</code> column of the fact table in this example represents a measure or metric that can be used in calculations and analysis. The non-primary key columns of the dimension tables represent additional attributes of the dimensions (such as the <code>Year</code> of the <code>Dim_Date</code> dimension).

For example, the following query answers how many TV sets have been sold, for each brand and country, in 1997:

SELECT

P.Brand,

S.Country AS Countries,

SUM(F.Units_Sold)

FROM Fact_Sales F

INNER JOIN Dim_Date D ON (F.Date_Id = D.Id)

INNER JOIN Dim_Store S ON (F.Store_Id = S.Id)

INNER JOIN Dim_Product P ON (F.Product_Id = P.Id)

WHERE D.Year = 1997 AND P.Product_Category = 'tv'

GROUP BY

P.Brand,

S.Country

</syntaxhighlight>

References

External links

Stars: A Pattern Language for Query Optimized Schema