Datawarehouse - The Architecture - Explained !!!

In this post we will see about the architecture of Datawarehouse.

Before moving ahead - What is ARCHITECTURE?

In simple words we can say - What are things required to build ANYTHING, and what is that ANYTHING be used for.

So in terms of Datawarehousing - Architecture is - What all steps are required to build datawarehouse and what are datawarehousing used for.

As seen in above image, To build datawarehouse we require certain details and after the datawarehouse is loaded we can use it for various use.

So the basic items in the above image i.e, to build a datawarehouse are mentioned below:
1. Source: This is a basic level of information which is required to create your datawarehouse.
The source is a place where you have stored your data prior to build a datawarehouse. Usually companies in the initial days of the business, stores the data in tables or the files.

In the example of the boutique which we saw in earlier posts, we discussed about storing the data in excel spreadsheet etc.

So various types of source which we can have are: Excel sheet, flat files, relational database, live database etc.

2. ETL (Extract - Transform - Load) - This is the step where we extract the data from source and transform the data in consistent form and load into Datawarehouse tables.

Note: We have seen the meaning of consistent in earlier post.

3. Datawarehouse - This is the storage part. As discussed in earlier post, datawarehouse in combination of multiple tables linked to each other through primary keys and foreign keys. For more details see earlier post.

So Datawarehouse store the data in systematic manner so that we can utilize for future business requirements.

4. Analysis/Queries - The data stored in the datawarehouse is used for analysis so that the business strategies can be made. The historical data stored in the datawarehouse serves the purpose for providing quality analysis and reporting.

Also the reports are generated using the data present in the datawarehouse. There are various reporting tools which are available in market which helps easily create the reports and present in a better format.

5. Presentation/Reports - Using the Reporting tools/Analysis tools/Data Mining Tools etc we can use the data in the Datawarehouse for generating reports, doing analysis and many other stuff. 

6. Metadata - Metadata is called as data about the data. Every thing apart from actual data is metadata. 

Consider a EMPLOYEE oracle table with 1000 records.

Those 1000 records are the data.

Metadata will be: 

 a. Its an ORACLE table,
 b. Table size is 1 GB,
 c. Table have EMPLOYEE ID, EMP_NAME, EMP_AGE, EMP_LOCATION etc as columns etc.

So as said all the data apart from actual data is METADATA. 

Metadata exist for all the section in a Datawarehouse.

This is about the introduction to the data warehouse.

Let's talk about the Architecture of Datawarehouse in layer wise form:

Check the image below:

Each component is discussed individually below:
Data Source Layer
This indicates the data source that feed data into the data warehouse. The data source can be of any format – flat file, relational database, other types of database, Excel file etc.
Some examples are listed below:
·Operations -- such as sales data, HR data, product data, inventory data, marketing data, systems data.
·Web server logs with user browsing data.
·Internal market research data.
·Third-party data, such as census data, demographics data, or survey data.
All these data sources together form the Data Source Layer.
Data Extraction Layer
The Data is extracted from the data source into the data warehouse system. There is likely some minimal data cleansing, but there is unlikely any major data transformation.
Staging Area
This is where data sits prior to being scrubbed and transformed into a data warehouse / data mart. Having one common area makes it easier for subsequent data processing / integration.
ETL Layer
The data is transformed in this layer. This is where data gains its "intelligence", as logic is applied to transform the data from a transactional nature to an analytical nature. This layer is also where data cleansing happens.
Data Storage Layer(Target)
This is where the transformed and cleansed data sit. Based on scope and functionality, 3 types of entities can be found here: data warehouse, data mart, and operational data store (ODS). In any given system, you may have just one of the three, two of the three, or all three types.
Data Logic Layer
This is where business rules are stored. Business rules stored here do not affect the underlying data transformation rules, but does affect what the report looks like.
Data Presentation Layer
This refers to the information that reaches the users. This can be in a form of a tabular / graphical report in a browser, an emailed report that gets automatically generated and sent everyday, or an alert that warns users of exceptions, among others.
Metadata Layer
This is where information about the data stored in the data warehouse system is stored. A logical data model would be an example of something that's in the metadata layer.
System Operations Layer
This layer includes information on how the data warehouse system operates, such as ETL job status, system performance, and user access history.

No comments:

Post a Comment