Data Warehouse Solutions
Data
warehouse solutions are typically implemented to provide a central
source of reliable data within an organisation. They normally consist
of four basic components:
1. Data Warehouse
2. ETL Tool
3. Data Cleansing Tool
4. Reporting/Analysis Tool
Often pulling together data from multiple data sources,
a data warehouse aims to provide one version of the truth. All too
often, the same data will be held on multiple databases, each with its
own inaccuracies. Your finance department invariably will produce a
report showing x sales last month, your marketing team will produce a
similar report showing a different figure for sales last month, and
manufacturing will show yet another figure. Before you
implement a data warehouse solution, you must be clear on the goals of
your data warehouse. What is your data warehouse trying to achieve?
Once you have these clear goals written down you can begin to peice
together the component parts of a warehouse.
Data Warehouse DesignDimensional
models are based on a series of inter-related Facts and Dimension
tables. These tables are often referred to as star schemas due to the
occurance of a single fact table with a series dimension tables joined
to it in the form of a star. Additional models include snowflake
schemas which is a more detailled version of your star schema,
consisting of multiple fact tables. Fact
tables typically store measurement data from a particular business
process, e.g. Sales. Facts are normally numeric and additive. This is
important as fact tables can be created based on aggregations, which is
the sum of your additive measures. Dimension tables contain the textural information for the business, for example, customer name, product names, etc. Once these models are identified, a data warehouse can be built using a variety of data warehousing tools.
The Data Warehouse Load Process
All data that populates a data warehouse is normally sourced from
operational or transactional source systems. These are fed into a data
staging area using data warehousing software (ETL software), in a
process known as the extract process. Once
in the data staging area, the data is then cleansed, combined with
other data, and possibly, missing data filled in. This data is then
loaded into the data presentation area and loaded into a dimensional
model. The data warehouse is then populated; normally using an
ETL (Extract, Transform and Load) tool, but can also be done using hand
written code. Data cleansing tools are used in the data staging area
before the resultant data is moved into the presentation area. Once
the data is in the presentation area, reporting tools can be used
either directly, or via a metalayer, to display the data in your
report. An example of a metalayer is a Universe from Business Objects. Additionally,
OLAP cubes can be created from data held in the presentation area for
reporting. These are highly aggregated 'mini-warehouses' often created
to answer specific questions realted to a particular area of a
business. Data warehouses are normally updated overnight due to
the volumes they need to process. However, it is feasible to make them
real time warehouses. To summarise, data warehouse solutions
normally consist of a database created using a relational model of
facts and dimensions, populated using an ETL tool from one or more
operational data sources, cleansed using a data cleansing tool, and
loaded into a presentation layer. Reporting and analysis tools complete
the data warehouse solution by allowing reports and dashboards to
display the information to users.
Data Warehouse Solutions
Business Intelligence Solutions

|