In the previous post we have seen about the Architecture of Datawarehouse at a high level. In this post the Architecture is explained at a detail level.
Fig. (c)
Check out the image above, the image is differenciated in different layers. Let's talk about each layer in detail. Some of the layer in the Architecture are very easy to understand as the names are self-explanatory.
1. Data Source Layer: This is the layer where the raw data is present, treated as Source for the datawarehouse.
This indicates the data source that feed data into the data
warehouse. The data source can be of any format
– flat file, relational database, other types of database, Excel file etc.
Consider a company which is operation since 5 years, and after 5 years they have decided to create a datawarehouse. For last 5 years they must be storing the data in some form, may be the excels, may be some tables etc. Those excel files and tables will become the source for loading the datawarehouse.
So various types of source which we can have are:
a. Excel sheet, flat files,
b. Relational database,
c. Live database,
d. ERP Systems,
d. Legacy Systems
2. Data Extraction layer: This is the layer where the data from the source is extracted and loaded into the next layer, called staging area(we will see more details in next section).
In this layer the data extracted from source can be modified and loaded into Staging area else it can be directly loaded into staging area without any modification.
This phase can be renamed as ETL phase only, as the data extraction and loading will be done using ETL Tools only.
3. Staging Area: It is a common place where all the data is collected before it is being summarized and loaded into the data warehouse.
This is very important stage in a Datawarehouse Architecture.
We will discuss in details in the next post about the Staging Area.
3. ETL Layer: This is the layer where we Extract the data from Staging area(if present), modify/transform/cleanse and load into datawarehouse. We use ETL tools at this layer.
4. Data Storage Layer: This is nothing but the Datawarehouse, where we store the analytical data for analysis.
5. Data Logic Layer: This is the layer where we use the data stored in the Datawarehouse to analyze and generate reports and do many other things. We can write the query, use the reporting tools, use data mining tools as per business needs.
Like many ETL tools, there are lot of reporting tools available in the industry, which have now replaced the old reporting technologies and helping organizations do better and faster reporting.
Some latest reporting tools available in market are:
1. OBIEE
2. Qikview
3. Microstrategy
4. Cognos
5. Tableau
6. Hyperion etc
6. Data Presentation Layer: This is the layer where we create the reports, analysis from the data stored in the datawarehouse. There are various Reporting/Analysis/Mining tools available which helps to do various operations on the data.
7. Source Operation Layer: This layer describes that once we extract the data from source layer we perform various operations on that data.
8. Metadata Layer: Metadata is nothing but the data about the data. So all the layers described above have the Metadata.
This completes the explanation of the Architecture of Data Warehouse.
In the next post we will talk about the Staging Area in detail.
b. Relational database,
c. Live database,
d. ERP Systems,
d. Legacy Systems
e. Web server logs with user browsing data.
f. Internal market research data.
g. Third-party data, such as census data, demographics data,
or survey data.
2. Data Extraction layer: This is the layer where the data from the source is extracted and loaded into the next layer, called staging area(we will see more details in next section).
In this layer the data extracted from source can be modified and loaded into Staging area else it can be directly loaded into staging area without any modification.
This phase can be renamed as ETL phase only, as the data extraction and loading will be done using ETL Tools only.
3. Staging Area: It is a common place where all the data is collected before it is being summarized and loaded into the data warehouse.
This is very important stage in a Datawarehouse Architecture.
We will discuss in details in the next post about the Staging Area.
3. ETL Layer: This is the layer where we Extract the data from Staging area(if present), modify/transform/cleanse and load into datawarehouse. We use ETL tools at this layer.
Earlier this work was done using SQL or Mainframes process which used to take long time to complete and was also difficult to work on.
With the invent of ETL tools we have various tools available in market which makes this task simpler.
Different type of ETL tools available in market are(but not limited to):
1. Informatica,
2. datastage,
3. Abinitio,
4. Pentaho etc
These tools are in lot of demand in market, as these tools help in faster, better and easier processing of the data.
These tools are in lot of demand in market, as these tools help in faster, better and easier processing of the data.
4. Data Storage Layer: This is nothing but the Datawarehouse, where we store the analytical data for analysis.
5. Data Logic Layer: This is the layer where we use the data stored in the Datawarehouse to analyze and generate reports and do many other things. We can write the query, use the reporting tools, use data mining tools as per business needs.
Like many ETL tools, there are lot of reporting tools available in the industry, which have now replaced the old reporting technologies and helping organizations do better and faster reporting.
Some latest reporting tools available in market are:
1. OBIEE
2. Qikview
3. Microstrategy
4. Cognos
5. Tableau
6. Hyperion etc
6. Data Presentation Layer: This is the layer where we create the reports, analysis from the data stored in the datawarehouse. There are various Reporting/Analysis/Mining tools available which helps to do various operations on the data.
7. Source Operation Layer: This layer describes that once we extract the data from source layer we perform various operations on that data.
8. Metadata Layer: Metadata is nothing but the data about the data. So all the layers described above have the Metadata.
This completes the explanation of the Architecture of Data Warehouse.
In the next post we will talk about the Staging Area in detail.