Dimensional modeling

Dimensional modeling

Dimensional modeling (DM) is the name of a set of techniques and concepts used in data warehouse design. It is considered to be different from entity-relationship modeling (ER). Dimensional Modeling does not necessarily involve a relational database. The same modeling approach, at the logical level, can be used for any physical form, such as multidimensional database or even flat files. According to data warehousing consultant Ralph Kimball,[1] DM is a design technique for databases intended to support end-user queries in a data warehouse. It is oriented around understandability and performance. According to him, although transaction-oriented ER is very useful for the transaction capture, it should be avoided for end-user delivery.

Dimensional modeling always uses the concepts of facts (measures), and dimensions (context). Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts. For example, sales amount is a fact; timestamp, product, register#, store#, etc. are elements of dimensions. Dimensional models are built by business process area, e.g. store sales, inventory, claims, etc. Because the different business process areas share some but not all dimensions, efficiency in design, operation, and consistency, is achieved using conformed dimensions, i.e. using one copy of the shared dimension across subject areas. The term "conformed dimensions" was originated by Ralph Kimball.

Contents

Dimensional modeling process:[2][3]

The dimensional model is built on a star-like schema, with dimensions surrounding the fact table. To build the schema, the following design model is used:

  1. Choose the business process
  2. Declare the Grain
  3. Identify the dimensions
  4. Identify the Fact
CHOOSE THE BUSINESS PROCESS

The process of dimensional modeling builds on a 4-step design method that helps to ensure the usability of the dimensional model and the use of the data warehouse. The basics in the design build on the actual business process which the data warehouse should cover. Therefore the first step in the model is to describe the business process which the model builds on. This could for instance be a sales situation in a retail store. To describe the business process, one can choose to do this in plain text or use basic Business Process Modeling Notation (BPMN) or other design guides like the Unified Modeling Language (UML).

DECLARING THE GRAIN

After describing the Business Process, the next step in the design is to declare the grain of the model. The grain of the model is the exact description of what the dimensional model should be focusing on. This could for instance be “An individual line item on a customer slip from a retail store”. To clarify what the grain means, you should pick the central process and describe it with one sentence. Furthermore the grain (sentence) is what you are going to build your dimensions and fact table from. You might find it necessary to go back to this step to alter the grain due to new information gained on what your model is supposed to be able to deliver.

IDENTIFY THE DIMENSIONS

The third step in the design process is to define the dimensions of the model. The dimensions must be defined within the grain from the second step of the 4-step process. Dimensions are the foundation of the fact table, and is where the data for the fact table is collected. Typically dimensions are nouns like date, store, inventory etc. These dimensions are where all the data is stored. For example, the date dimension could contain data such as year, month and weekday.

IDENTIFY THE FACTS

After defining the dimensions, the next step in the process is to make keys for the fact table. This step is to identify the numeric facts that will populate each fact table row. This step is closely related to the business users of the system, since this is where they get access to data stored in the data warehouse. Therefore most of the fact table rows are numerical, additive figures such as quantity or cost per unit, etc.

Dimension Normalization

Dimensional normalization or snowflaking removes redundant attributes, which are known in the normal flatten de-normalized dimensions. Dimensions are strictly joined together in sub dimensions.

Snowflaking has an influence on the data structure that differs from many philosophies of data warehouses.[3] Single data (fact) table surrounded by multiple descriptive (dimension) tables

Developers often don't normalize dimensions due to several facts:[4]

  1. Normalization makes the data structure more complex
  2. Performance can be slower, due to the many joins between tables
  3. The space savings are minimal
  4. The use of bitmap indexes can't be done
  5. Query Performance, 3NF databases suffer from performance problems when aggregating or retrieving many dimensional values that may require analysis. If you are only going to do operational reports then you may be able to get by with 3NF because your operational user will be looking for very fine grain data.

There are some arguments on why normalization can be useful.[3] It can be an advantage when part of hierarchy is common to more than one dimension. For example, a geographic dimension may be reusable because both the customer and supplier dimensions use it.

Benefits of dimensional modeling [5]

Benefits of the dimensional modeling are following:

  • Understandability - Compared to the normalized model, the dimensional model is easier to understand and more intuitive. In dimensional models, information is grouped into coherent business categories or dimensions, making it easier to read and interpret. Simplicity also allows software to navigate databases efficiently. In normalized models, data is divided into many discrete entities and even a simple business process might result in dozens of tables joined together in a complex way.
  • Query performance - Dimensional models are more denormalized and optimized for data querying, while normalized models seek to eliminate data redundancies and are optimized for transaction loading and updating. The predictable framework of a dimensional model allows the database to make strong assumptions about the data that aid in performance. Each dimension is an equivalent entry point into the fact table, and this symmetrical structure allows effective handling of complex queries. Query optimization for star join databases is simple, predictable, and controllable.
  • Extensibility - Dimensional models are extensible and easily accommodate unexpected new data. Existing tables can be changed in place either by simply adding new data rows into the table or executing SQL alter table commands. No queries or other applications that sit on top of the Warehouse need to be reprogrammed to accommodate changes. Old queries and applications continue to run without yielding different results. But in normalized models each modification should be considered carefully, because of the complex dependencies between database tables.

Literature

References

  1. ^ Kimball 1997.
  2. ^ Ralph Kimball, Margy Ross, Warren Thornthwaite, and Joy Mundy (January 10, 2008). The Data Warehouse Lifecycle Toolkit : Expert Methods for Designing, Developing, and Deploying Data Warehouses (Second ed.). Wiley. ISBN 978-0-470-14977-5. 
  3. ^ a b c Matteo Golfarelli, Stefano Rizzi (May 26, 2009). Data Warehouse Design: Modern Principles and Methodologies. McGraw-Hill Osborne Media. ISBN 978-0-071-61039-1. 
  4. ^ Ralph Kimball, Margy Ross (April 26, 2002). The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second ed.). Wiley. ISBN 0-471-20024-7. 
  5. ^ Ralph Kimball, Margy Ross,Warren Thornthwaite, Joy Mundy, Bob Becker (January, 2008). The data warehouse lifecycle toolkit, Second Edition (Second ed.). Wiley. ISBN 978-0-470-14977-9. 

Wikimedia Foundation. 2010.

Игры ⚽ Нужно решить контрольную?

Look at other dictionaries:

  • Dimensional Fact Model — Data Warehouses (DWs) are databases used by decision makers to analyze the status and the development of an organization. DWs are based on large amounts of data integrated from heterogeneous sources into multidimensional databases, and they are… …   Wikipedia

  • Dimensional analysis — In physics and all science, dimensional analysis is a tool to find or check relations among physical quantities by using their dimensions. The dimension of a physical quantity is the combination of the basic physical dimensions (usually mass,… …   Wikipedia

  • Modeling language — A modeling language is any artificial language that can be used to express information or knowledge or systems in a structure that is defined by a consistent set of rules. The rules are used for interpretation of the meaning of components in the… …   Wikipedia

  • modeling or modelling —    A sculpture technique in which a three dimensional form is manipulated in a soft material such as clay either modeling clay (Plasticine is a well known brand) or ceramic clay or wax. The term also refers to the effect of light on a three… …   Glossary of Art Terms

  • modeling — /mod l ing/, n. 1. the act, art, or profession of a person who models. 2. the process of producing sculptured form with some plastic material, as clay. 3. the technique of rendering the illusion of volume on a two dimensional surface by shading.… …   Universalium

  • modeling — noun a) The art of sculpting models from clay etc. to create a representation of something. b) The representation of depth in a two dimensional image …   Wiktionary

  • Data Vault Modeling — is a database modeling method that is designed to provide historical storage of data coming in from multiple operational systems. It is also a method of looking at historical data that, apart from the modeling aspect, deals with issues such as… …   Wikipedia

  • Solid modeling — The geometry in solid modeling is fully described in 3‑D space; objects can be viewed from any angle. Modeled and ray traced in Cobalt Solid modeling (or modelling) is a consistent set of principles for mathematical and computer modeling of three …   Wikipedia

  • Homology modeling — Homology modeling, also known as comparative modeling of protein refers to constructing an atomic resolution model of the target protein from its amino acid sequence and an experimental three dimensional structure of a related homologous protein… …   Wikipedia

  • 3D modeling — This article is about computer modeling within an artistic medium. For scientific usage, see Computer simulation. 3D computer graphics …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”