Data warehouse architectures
The technical architecture of a data warehouse is broadly similar to that of other systems, but it has some special characteristics. Data warehouse architectures range between two extremes: the single-layer architecture and the N-layer architecture. They differ in the number of intermediate layers (middleware) between the operational systems and the analytical tools. The architectures described here are high-level: each part mentioned is a complete system in its own right, not a component of a system.
Single-layer architecture
The simplest architecture is the single-layer architecture. There is no physical data warehouse or data mart between the operational data and the analytical tools. The middleware in this type of system should be considered a virtual data warehouse: it consists of a software layer only and stores no data of its own. The single-layer model is lightweight because it minimises redundancy and therefore the amount of data stored. It does not, however, separate analytical from operational processing, since every analysis runs directly against the operational data[1].
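As a minimal sketch of the idea, a virtual data warehouse can be pictured as a database view over an operational table. The example below uses Python's standard sqlite3 module; the orders table, its columns and the sales_by_region view are hypothetical names chosen for illustration, not part of any particular product.

```python
import sqlite3

# Hypothetical operational database; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("north", 100.0), ("south", 50.0), ("north", 25.0)],
)

# The "virtual data warehouse": a view that stores no data of its own.
conn.execute(
    "CREATE VIEW sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
)


def sales_by_region(connection):
    """Run the analytical query; it is evaluated against the live operational rows."""
    rows = connection.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"
    ).fetchall()
    return dict(rows)


before = sales_by_region(conn)

# A new operational write is visible to the analysis immediately,
# because there is no separate copy of the data.
conn.execute("INSERT INTO orders (region, amount) VALUES ('south', 10.0)")
after = sales_by_region(conn)

print(before)
print(after)
```

The second result reflects the new order without any load step, which illustrates both the low redundancy and the lack of separation: the analytical query competes with operational writes for the same database.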
Two-layer architecture
The two-layer model consists of operational (and external) data in the source layer, with a data warehouse layer on top. An ETL (extract, transform, load) system sits between the source layer and the data warehouse layer. The analytical tools in this architecture work on the data loaded into the data warehouse, or possibly into data marts. Because the data is stored redundantly, the warehouse is a more stable source of information: heavy load or failure in the operational systems has no effect on the analytical tools, and vice versa. The data warehouse layer also makes it possible to structure the data to fit the multidimensional model used by analytical tools, which in turn makes them faster. Such an architecture does, however, take more resources to build and maintain.
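The flow described above can be sketched as three small functions, one per ETL step, copying data from a source database into a separate warehouse database. This is an illustrative sketch using Python's standard sqlite3 module; the orders and fact_sales tables and the cleaning rules are assumptions made for the example.

```python
import sqlite3


def extract(source):
    """Read raw rows from the (hypothetical) operational orders table."""
    return source.execute(
        "SELECT id, region, amount_cents FROM orders"
    ).fetchall()


def transform(rows):
    """Normalise region names and convert cents to currency units."""
    return [
        (order_id, region.strip().lower(), cents / 100.0)
        for order_id, region, cents in rows
    ]


def load(warehouse, rows):
    """Write the cleaned rows into the warehouse's own fact table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO fact_sales VALUES (?, ?, ?)", rows
    )
    warehouse.commit()


# Source layer: an operational system with inconsistent region spelling.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount_cents INTEGER)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, " North", 10000), (2, "south ", 5000), (3, "NORTH", 2500)],
)

# Data warehouse layer: a physically separate database.
warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(source)))

# Simulate an outage of the operational system.
source.close()

# The analysis still works, because it reads the warehouse's own copy.
totals = dict(
    warehouse.execute(
        "SELECT region, SUM(amount) FROM fact_sales GROUP BY region ORDER BY region"
    )
)
print(totals)
```

Closing the source connection before querying stands in for an operational failure: the totals are still available because the warehouse holds its own copy, at the cost of storing the data twice and having to rerun the ETL job to pick up new orders.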
Three-layer architecture
The three-layer architecture consists of the source layer (containing multiple source systems), the reconciled layer and the data warehouse layer (containing both data warehouses and data marts). The reconciled layer sits between the source data and the data warehouse. It is populated from the source systems by one ETL process, and the data stored in it is passed on to the data warehouse by a second ETL process. In the reconciled layer the data from the different source systems has already been cleaned and integrated into a common, standardised form. The ETL process that feeds the data warehouse therefore receives only integrated data that needs little further transformation. This architecture is especially useful for very large, enterprise-wide systems[1]. One disadvantage is the additional storage space taken up by the redundant reconciled layer. The extra step also moves the analytical tools a little further away from real-time data.
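The two ETL steps can be illustrated with a small sketch in which the first step cleans and integrates two source systems into a reconciled layer, and the second step only aggregates the reconciled rows. The source systems (a CRM and an ERP), their field names and the cleaning rules are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Source layer: two hypothetical systems with inconsistent conventions.
crm_rows = [
    {"cust": "ACME ", "country": "dk", "revenue_eur": 120.0},
]
erp_rows = [
    {"customer_name": "acme", "land": "DK", "revenue_cents": 8000},
    {"customer_name": "Globex", "land": "SE", "revenue_cents": 5000},
]


def reconcile(crm, erp):
    """First ETL step: clean both sources and integrate them into one common form."""
    reconciled = []
    for row in crm:
        reconciled.append({
            "customer": row["cust"].strip().lower(),
            "country": row["country"].upper(),
            "revenue": row["revenue_eur"],
            "source": "crm",
        })
    for row in erp:
        reconciled.append({
            "customer": row["customer_name"].strip().lower(),
            "country": row["land"].upper(),
            "revenue": row["revenue_cents"] / 100.0,
            "source": "erp",
        })
    return reconciled


def load_warehouse(reconciled):
    """Second ETL step: aggregate only; the rows are already standardised."""
    totals = defaultdict(float)
    for row in reconciled:
        totals[(row["customer"], row["country"])] += row["revenue"]
    return dict(totals)


reconciled_layer = reconcile(crm_rows, erp_rows)  # stored redundantly
warehouse = load_warehouse(reconciled_layer)
print(warehouse)
```

All source-specific knowledge lives in reconcile, so load_warehouse stays simple and does not change when a new source system is added. The reconciled_layer list is the extra redundant copy that the architecture pays for.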
References
Categories: Data warehousing, Business intelligence
Wikimedia Foundation. 2010.