Problem
In my last couple of tips, I talked about the importance of a Business Intelligence solution, why it is becoming priority for
executives, what a typical Business Intelligence system architecture looks like, etc. In this tip, I going to talk in detail
about how a data warehouse is different from operational data store and the different design methodologies for a data warehouse.
Solution
This tip is going to cover Data Warehouses (DW,
sometime also called
an Enterprise Data Warehouse or EDW), how it differs from Operational
Data Store (ODS) and different Data Warehouse design methodologies.
Enterprise Data Warehouse (EDW or DW) Vs. Operational Data Store (ODS)
The purpose of the Data Warehouse in the overall Business
Intelligence Architecture is to integrate corporate data from different
heterogeneous data sources in order to facilitate historical and trend
analysis reporting. It acts as a central repository and contains the
"single version of truth" for the organization that has been carefully
constructed from data stored in disparate internal and external
operational databases\systems. For better performance, mostly data in
data warehouse will be in de-normalized form which can be categorized in
either star or snowflake schemas (more on this in the next tip).
The purpose of the Operation Data Store (ODS) is to integrate
corporate data from different heterogeneous data sources in order to
facilitate real time or near real time operational reporting. Often data
in
the ODS will be in structured similar to the source systems, although
during integration it can involve data cleansing, de-duplication and can
apply business rules to ensure data integrity. An ODS is mainly
intended to integrate data quite frequently at
the lowest granular level for operational reporting in a close to real
time data integration scenario. Normally,
an ODS will not be optimized for historical and trend analysis on huge
set of data.
Let's summarize the differences between an ODS and DW:
- An ODS is meant for operational reporting and supports current or near real-time reporting requirements whereas a DW is meant for historical and trend analysis reporting on a large volume of data
- An ODS is targeted for low granular queries whereas a DW is used for complex queries against summary-level or on aggregated data
- An ODS provides information for operational, tactical decisions about current or near real-time data acquisition whereas a DW delivers feedback for strategic decisions leading to overall system improvements
- In an ODS the frequency of data load could be hourly or daily whereas in an DW the frequency of data loads could be daily, weekly, monthly or quarterly
Data Warehouse Design Methodologies
There are two different methodologies normally followed when designing a Data Warehouse solution and based on
the requirements of your project you can choose which one suits your particular scenario. These methodologies are
a result of research from Bill
Inmon and Ralph Kimball.
Bill Inmon - Top-down Data Warehouse Design Approach
Bill Inmon is sometimes also referred to as the "father of data warehousing"; his design methodology is based on
a top-down approach and defines data warehouse in these terms
- Subject oriented - The data in a data warehouse is categorized on the basis of the subject area and hence it is “subject oriented”.
- Integrated - Data gets integrated from different disparate data sources and hence universal naming conventions, measurements, classifications and so on used in the data warehouse. The data warehouse provides an enterprise consolidated view of data and therefore it is designated as an integrated solution.
- Non-volatile - Once the data is integrated\loaded into the data warehouse it can only be read. Users cannot make changes to the data and this practice makes the data non-volatile.
- Time Variant - Finally data is stored for long periods of time quantified in years and has a date and timestamp and therefore it is described as “time variant”.
Bill Inmon saw a need to integrate data from different OLTP systems
into a centralized repository (called
a data warehouse) with a so called top-down approach. Bill Inmon
envisions a data warehouse at center of the "Corporate Information
Factory" (CIF), which provides a logical framework for delivering
business intelligence (BI), business analytics and business management
capabilities.
This top-down design provides a highly consistent dimensional view of
data across data marts as all data marts are loaded from the
centralized repository (Data Warehouse).
The top-down design has also proven to be flexible to support business
changes as it looks
at the organization as whole, not at each function or business process
of the
organization. Generating a new dimensional data marts against the data
stored in
the data warehouse is a relatively simple task.
Though there are some
challenges
for the top-down approach, for example it represents a very large
project with a very broad scope and hence the up-front cost for
implementing a data warehouse using the top-down methodology is
significant.
Further, the duration of time from the start of project to the point
that end users start experience initial benefits of the solution can be
substantial. Also, the top-down methodology can be inflexible and
unresponsive to changing departmental or business process needs (a
concern for today's dynamically changing environment) during the
implementation phase.
Ralph Kimball - Bottom-up Data Warehouse Design Approach
Ralph Kimball is a renowned author on the subject of data
warehousing. His design methodology is called dimensional modeling or
the Kimball methodology. This methodology focuses on a bottom-up
approach, emphasizing the value of the data warehouse to the users as
quickly as possible. In his vision, a data warehouse is the copy of the
transactional data specifically structured for analytical querying and
reporting in order to support
the decision support system.
As per his methodology, data marts are
first
created to provide reporting and analytical capabilities for specific
business\functional processes and later on these data marts can
eventually be
unioned together to create a comprehensive enterprise data warehouse.
The bottom-up approach focuses on each business process at one point of
time
so the return on investment could be as quick as first data mart gets
created. Though if not carefully planned, you might lack the big picture
of
the enterprise data warehouse by missing some dimensions or by creating
redundant dimensions, etc. when you are too focused on an individual
business process.
Ralph Kimball's bottom-up approach proposes to create a business
matrix which should contain all the common elements (that are used by
data marts such as conformed\shared dimension, measures, etc.) defined
for the enterprise as whole. With this, the user can design and develop
solutions which supports doing analysis across the business processes
for cross selling.
Source Collected from MSSQLTIPS.COm
Hello,
ReplyDeleteAs we know that, the dimension table data will not change frequently. But after some time the dimension data also be changes, during that time we need to capture the changes and update into the target dimension table, this is known as SCD.Thanks for sharing valuable information.