Pages

Men

rh

7/11/2013

Comparing Data Warehouse Design Methodologies for Microsoft SQL Server

Problem

In my last couple of tips, I talked about the importance of a Business Intelligence solution, why it is becoming priority for executives, what a typical Business Intelligence system architecture looks like, etc. In this tip, I going to talk in detail about how a data warehouse is different from operational data store and the different design methodologies for a data warehouse.

Solution

This tip is going to cover Data Warehouses (DW, sometime also called an Enterprise Data Warehouse or EDW), how it differs from Operational Data Store (ODS) and different Data Warehouse design methodologies.

Enterprise Data Warehouse (EDW or DW) Vs. Operational Data Store (ODS)

The purpose of the Data Warehouse in the overall Business Intelligence Architecture is to integrate corporate data from different heterogeneous data sources in order to facilitate historical and trend analysis reporting. It acts as a central repository and contains the "single version of truth" for the organization that has been carefully constructed from data stored in disparate internal and external operational databases\systems. For better performance, mostly data in data warehouse will be in de-normalized form which can be categorized in either star or snowflake schemas (more on this in the next tip).

The purpose of the Operation Data Store (ODS) is to integrate corporate data from different heterogeneous data sources in order to facilitate real time or near real time operational reporting. Often data in the ODS will be in structured similar to the source systems, although during integration it can involve data cleansing, de-duplication and can apply business rules to ensure data integrity. An ODS is mainly intended to integrate data quite frequently at the lowest granular level for operational reporting in a close to real time data integration scenario. Normally, an ODS will not be optimized for historical and trend analysis on huge set of data.
Let's summarize the differences between an ODS and DW:
  • An ODS is meant for operational reporting and supports current or near real-time reporting requirements whereas a DW is meant for historical and trend analysis reporting on a large volume of data
  • An ODS is targeted for low granular queries whereas a DW is used for complex queries against summary-level or on aggregated data
  • An ODS provides information for operational, tactical decisions about current or near real-time data acquisition whereas a DW delivers feedback for strategic decisions leading to overall system improvements
  • In an ODS the frequency of data load could be hourly or daily whereas in an DW the frequency of data loads could be daily, weekly, monthly or quarterly

Data Warehouse Design Methodologies

There are two different methodologies normally followed when designing a Data Warehouse solution and based on the requirements of your project you can choose which one suits your particular scenario. These methodologies are a result of research from Bill Inmon and Ralph Kimball.

Bill Inmon - Top-down Data Warehouse Design Approach

Bill Inmon is sometimes also referred to as the "father of data warehousing"; his design methodology is based on a top-down approach and defines data warehouse in these terms
  • Subject oriented - The data in a data warehouse is categorized on the basis of the subject area and hence it is “subject oriented”.
  • Integrated - Data gets integrated from different disparate data sources and hence universal naming conventions, measurements, classifications and so on used in the data warehouse.  The data warehouse provides an enterprise consolidated view of data and therefore it is designated as an integrated solution.
  • Non-volatile - Once the data is integrated\loaded into the data warehouse it can only be read. Users cannot make changes to the data and this practice makes the data non-volatile.
  • Time Variant - Finally data is stored for long periods of time quantified in years and has a date and timestamp and therefore it is described as “time variant”.
Bill Inmon saw a need to integrate data from different OLTP systems into a centralized repository (called a data warehouse) with a so called top-down approach. Bill Inmon envisions a data warehouse at center of the "Corporate Information Factory" (CIF), which provides a logical framework for delivering business intelligence (BI), business analytics and business management capabilities.

Enterprise Data Warehouse (EDW or DW) Vs. Operational Data Store (ODS)

This top-down design provides a highly consistent dimensional view of data across data marts as all data marts are loaded from the centralized repository (Data Warehouse). The top-down design has also proven to be flexible to support business changes as it looks at the organization as whole, not at each function or business process of the organization. Generating a new dimensional data marts against the data stored in the data warehouse is a relatively simple task. 

Though there are some challenges for the top-down approach, for example it represents a very large project with a very broad scope and hence the up-front cost for implementing a data warehouse using the top-down methodology is significant. Further, the duration of time from the start of project to the point that end users start experience initial benefits of the solution can be substantial. Also, the top-down methodology can be inflexible and unresponsive to changing departmental or business process needs (a concern for today's dynamically changing environment) during the implementation phase.

Ralph Kimball - Bottom-up Data Warehouse Design Approach

Ralph Kimball is a renowned author on the subject of data warehousing. His design methodology is called dimensional modeling or the Kimball methodology. This methodology focuses on a bottom-up approach, emphasizing the value of the data warehouse to the users as quickly as possible. In his vision, a data warehouse is the copy of the transactional data specifically structured for analytical querying and reporting in order to support the decision support system. 

As per his methodology, data marts are first created to provide reporting and analytical capabilities for specific business\functional processes and later on these data marts can eventually be unioned together to create a comprehensive enterprise data warehouse. The bottom-up approach focuses on each business process at one point of time so the return on investment could be as quick as first data mart gets created. Though if not carefully planned, you might lack the big picture of the enterprise data warehouse by missing some dimensions or by creating redundant dimensions, etc. when you are too focused on an individual business process.

Data Warehouse Design Methodologies


Ralph Kimball's bottom-up approach proposes to create a business matrix which should contain all the common elements (that are used by data marts such as conformed\shared dimension, measures, etc.) defined for the enterprise as whole. With this, the user can design and develop solutions which supports doing analysis across the business processes for cross selling. You can learn more about the matrix here.

Source collected from mssqltips.com

No comments :

Post a Comment