- Dimension table
-
In data warehousing, a dimension table is one of the set of companion tables to a fact table.
The fact table contains business facts or measures and foreign keys which refer to candidate keys (normally primary keys) in the dimension tables.
Contrary to fact tables, the dimension tables contain descriptive attributes (or fields) which are typically textual fields or discrete numbers that behave like text. These attributes are designed to serve two critical purposes: query constraining/filtering and query result set labeling.
Dimension attributes are supposed to be:
- Verbose - labels consisting of full words,
- Descriptive,
- Complete - no missing values,
- Discretely valued - only one value per row in dimensional table,
- Quality assured - no misspelling, no impossible values.
Dimension table rows are uniquely identified by a single key field. It is recommended that the key field is a simple integer for the reason that key value is meaningless and is only used to be join fields between the fact and dimension tables.
The usage of surrogate dimension keys brings several advantages among:
- Performance - join processing is much more efficient if a single field surrogate key is used,
- Buffer from operational key management practices - prevents form situations when removed data rows might reappear when their natural keys might be reused or reassigned after a long period of dormancy,
- Mapping to integrate disparate sources,
- Handle unknown or not applicable connections,
- Track changes in dimension attribute values.
Usage of surrogate keys also brings an additional costs due the burden put on the ETL system. Still, pipeline processing can be improved and ETL tools have built-in improved surrogate key processing.
The goal of dimension table is to create standardized conformed dimensions which can be shared across the enterprise's data warehouse environment and joining to multiple fact table representing various business processes.
Conformed dimensions are highly important to enterprise nature of DW/BI system for following reasons:
- Consistency - every fact table is filtered consistently, result query answer are labeled consistently,
- Integration - queries are able to drill different processes fact tables separately for each individual fact table and then join the results on common dimension attributes,
- Reduced development time to market - the common dimensions are available without recreating the wheel over again.
Over time, the attributes of a given row in a dimension table may change. For example, the shipping address for a company may change. Kimball refers to this phenomenon as Slowly Changing Dimensions. Strategies for dealing with this kind of change are divided into three categories:
- Type One - Simply overwrite the old value(s).
- Type Two - Add a new row containing the new value(s), and distinguish between the rows using Tuple-versioning techniques.
- Type Three - Add a new attribute to the existing row.
See also
References
- Kimball, Ralph. The Data Warehouse Lifecycle Toolkit Scond Edition. Winely Publishing Inc., 2008, p.241-246.
- Kimball, Ralph et al. (1998); The Data Warehouse Lifecycle Toolkit, p17. Pub. Wiley. ISBN 0-471-25547-5.
- Kimball, Ralph (1996); The Data Warehouse Toolkit, p100. Pub. Wiley. ISBN 0-471-15337-0.
Data warehouse Creating the data warehouseConcepts- Database
- Dimension
- Dimensional modeling
- Fact
- OLAP
- Star schema
- Aggregate
Variants- Anchor Modeling
- Column-oriented DBMS
- Data Vault Modeling
- HOLAP
- MOLAP
- ROLAP
- Operational data store
Elements- Data dictionary/Metadata
- Data mart
- Sixth normal form
- Surrogate key
FactDimension- Dimension table
- Degenerate
- Slowly changing
FillingUsing the data warehouseConceptsLanguagesTools- Business intelligence tools
- Reporting software
- Spreadsheet
RelatedPeopleProducts- Comparison of OLAP Servers
- Data warehousing products and their producers
Categories:- Data warehousing
Wikimedia Foundation. 2010.