Chapter 2: DEVELOPING DATA WAREHOUSE

 CHAPTER 1

https://datawarehousestudywithchanda1179.blogspot.com/2022/12/chapter-1-introduction.html

CHAPTER 3


CHAPTER 2 

TOPIC COVER:  Developing data warehouse.

CONTENTS 

(i) Data warehouse components

(ii) Steps and crucial decisions for design and construction of data warehouse

(iii) Three-Tier data ware Architecture


(i) Data warehouse Components

  • It consists mainly four parts: -
  1. Data Sources
  2. ETL Processing / Staging area
  3. Dataware house / Presentation area
  4. Data Access / End user


Let's understand in detail: -

1. Source system: - 

  • It is system that contain data to provide a data warehouse.
  • This may include operational database and other internal or external systems. 
口  OPERATIONAL DATABASE: - 

  • It refers to a system that is used to process the day-to-day transactions of an organization.
  • It includes hardware & software must be up to date in order to run smoothly.
EXAMPLE: - AIRPORT OPERATIONAL SYSTEMS (AOS)

  • These systems control an airport's day to day operations.
  • Agent uses them to check-in & process baggage & also maintain the system.

FINANCIAL: -

  • It refers to essential pieces of information that help to provide clear picture of financial well-being such as: - BALANCE SHEET, CAHS FLOW etc.

MRP: -

  • It stands for material Requirements planning.
  • It designs to plan manufacturing production.
  • It identifies necessary materials, estimate quantities, determine when materials will be required to meet production.

CUSTOMER SERVICES: -

  • The process of capturing & analyzing data from customers.
  • Data comes from all points in customer relationship like: - Messages, Purchases, Survey feedback, returns & demographics.

HUMAN RESOURCES: -

  • It approaches to managing employees at work.
  • It includes people analytics, workforce analytics or talent analytics. 
  • It enables more improve workforce well-being & organization performance. 

OTHER'S: -
  • It is non- operational database.

ㅁSUPPORT DBMS: - 
  • Collection in organized manner for storage, updating, accessing & recovering a data.
  • Day to day processing & transaction of data is done in database.
ㅁEXTERNAL SOURCES: -
  • It defined as any data not contained in company's operating system.
  • It can be data that company has but is not in operating system.
EXAMPLE: -
                      BUDGETING DATA   may keep in spreadsheet in P.C. & not be include in company's accounting system. 

2. STAGING AREA: -
  • The data comes from different sources.
  • Staging area used to clean, transform, combine, household, archive and prepare source data for use in data warehouse.
  • Data need not be based on relational terminology.
  • Sometime data managers are comfortable with normalized set of data.
  • Staging does not provide querying or presentation services.
TRANSFORMATION: -
  • Process of converting data from one format to another.
  • From the format of source system into the required format of destination system.
EXAMPLE: - Unsorted files to Sorted files.

3. PRESENTATION AREA: - 
  • It represents the data which is stored in data warehouse.
META DATA: -
  • Its mean data about data.
  • Meta data acts as table of contents for data in data warehouse. ex: - INDEX of BOOK.
  • It created & maintained as documents.
  • It tracks of all changes that happen in source system.


4. END-USER ACCESS: -
  • Main purpose is to provide information to the business manager for strategic decision making.
DATA-MINING: -
  • Mining is generally considered as the process of extracting useful data from large set of data.
SUMMARY OF DATA WAREHOUSE COMPONENTS: - 

Show in diagram



          
(ii) STEPS AND CRUCIAL DECISIONS FOR DESIGN AND CONSTRUCTION OF DATA WAREHOUSE

回 STEPS FOR DESIGN AND CONSTRUCTION OF DATA WAREHOUSE 

  • A Data warehouse designing is a complex task as it required efficient skills to manage business technology & entire programs.

1. BUSINESS SKILLS: -

    (i) Business skills are needed to enhance business productivity since it quickly and efficiently gathers information, which accurate describe the organization.
     (ii) It is necessary to provide solution of all the questions. (Given below)
  • How system works.
  • How data managed & stored.
  • Process of transferring data from source to data warehouse.
  • How data ware is maintain & update using refresh tools.
  • Translate business requirement in queries.
Designer need business skills before designing the data warehouse, so that every main function is cover in data warehouse. 

2. TECHNICAL SKILLS: -
  • It finds trend & explore tools.
  • It helps in describing facts from historic information & making decision using quantitative information.
3. MANAGEMENT SKILLS: -
  • It helps in connecting with end-users.
  • Data warehouse design of data effective & timely way.
FOUR Different views considered during data warehouse design: -

1. Top-Down View
2. Data-source View
3.Data warehouse View
4. Business query View

1. TOP-Down View: -
  • It selects needed information for design data warehouse such as business needs.
2. DATA Source View: -
  • It displays the manipulated & updated information by operational system.
  • The information can be viewed at various levels of details such as (i) Individual data tables, (ii) Integrated data tables.
  • Traditional data modelling techniques applied such as ER-Relationship.
3. DATA Warehouse View: -
  • It shows fact tables & dimension tables.
  • It represents the detail about historical information present in data warehouse.
4. Business Query View: -
  • Data displayed by query entered by end-user in data ware is created here.
ロSTEPS TO DESIGN: -


Two major software development methods used: -
  1. WATERFALL MODEL
  2. SPIRAL MODEL
1. WATERFALL MODEL: -
  • It performs each step in systematic manner, complete & move on next step.
  • Approach is long term projects.
2. SPIRAL MODEL: - 
  • It is rapid & incremental model.
  • It used during short & rapid release.
Steps to perform while designing data warehouse: -

1. Business Process: -
  • It consists of objects, collection of objects.
  • Depend on business customer.
  • An appropriate Data mart building chosen. 
2. Facts Tables: -
  • Data represent in fact tables is chosen.
  • Atomic level of data such as daily transactions are shown in fact tables.
3. Dimensions: -
  • The Dimensions which will be applied to fact tables records are chosen.
  • Dimensions such as Product, Customer, Location etc. are chosen.
4. Measures: -
  • Measures such as total sales, total products are chosen.
  • The fact tables are populated using measures.
5. D.W.H. Implementation: -
  • It consists huge volume of data and response time demanded is in order of seconds.
  • Dataware system support highly efficient data cube computation techniques.
  • Techniques: -(i) Bitmap index (ii) Join Index 
  • Data warehouse is difficult task, goal should be cleared defined & achievable.
  • Goals refers to budget & time, data sources, department types should be cleared.
6. D.W.H. Deployment: -
  • D.W.H. is designing next step to follow data warehouse deployment.
  • The installation, training, induction, upgrade & maintenance tasks are considered.
7. D.W.H. Administration: -
  • DWH are controlling the range of dimensions, DWH size, resources budget, queries etc.
  • Manage the DWH such as data sources, data refreshing, recovery planning, access controls, security measures, organize database etc.
ロ Crucial decisions in designing a Data warehouse: -
  • Design & implement DWH is very challenging & difficult one, there is lot of focus & implementation.
  • Designers ask for Top management.
  • Where from you start?  what should be bought up first? which management needs are most important? does design depend on details implement? how do you scope the project down to something manageable?
ALL THESE QUESTIONS ARE CLOSE TO CRUCIAL DECISION.
  • Designers want to do simple & achievable.
  • Through design challenges, to achieve goal easily need methods to follow: -
  1. Choosing the subject matter.
  2. Deciding what s fact table represents.
  3. Identifying & conforming the dimensions.
  4. Choosing the facts.
  5. Storing pre-calculations in the fact table.
  6. Rounding out the dimension table.
  7. Choosing durations of database.
  8. Need to track slowly challenging dimensions.
  9. Deciding query priorities & query modes.
  • ALL the above steps are required before the data warehouse is implemented.
  • Approach should be simple to complex.
DATA WAREHOUSE Designer follow these steps: -

1. Choosing subject matter of particular data mart: -
  • The most cost-effective highly demanded subject should be taken-up first.
  •  'HOT SUBJECTS' should be given high priority. 
2. Deciding exactly what a fact table record represents: -
  • Fact table is large central table in dimensional design that has a multipart key.
  • Once fact table representation is decided, a co-horrent discussion of what the dimensions of data marts fact table are take place.
3. Identifying and confirming the dimensions: -
  • Dimensions are drivers of data marts.
  • Dimensions are source of raw headers in users report, user carry vocabulary to the users.
  • EXAMPLE: - 

VARIOUS TECHNOLOGY CONSIDERATIONS: -
  • No. of technology issue considered.
(i) Hardware Platforms: - Balanced Approaches
(ii) DBMS for Support Data warehouse: - environment & relationship
(iii) Communication & Network: - communicate or network  
(iv) System Management / Operating System: - Analysis of business 
(v) Software tools for building: - Access tools 


(iii) THREE-TIER DATA WAREHOUSE ARCHITECTURE: -
  • It is data representation method that integrate all data from extract sources.
  •  It is data representation method that integrate all data from extract sources.
  • It is used in order to build a data warehouse by include the required data schema model. OLAP server type, front-end tools for reporting/ analysis. which contain three- tire's that produce linked with another. 
  • Data Warehouse have 3 tiers. 
  1. Bottom-tier
  2. Middle tier
  3. Top-tier 



DATA SOURCES: -
  • All the data selected to any business organization.
  • The data can be in any of these formats: - Plain text file, relational database, excel file and other types of data that act as data sources. 

(i) Production data: -
  • Represent sales data, HR data & Product data.
(ii) Internal data: -
  • Represents data of departed or an organization such as employee data.
(iii) External data: -
  • Represent data from outside the organization or third-party data such as demographics or survey data.
(iv) Archieved data: -
  • Represent logs of the web server along with users browsing data.
EXAMPLE OF DATA SOURCES: - Sales, Transmission, Employee etc.


ロ OPERATIONAL DATABASES: -
  • It used to process the day-to-day transactions of organization.
  • It includes software & hardware must update in order to run smoothly.
1. BOTTOM-TIER: -
  • It refers to the storage data for data processing where data comes before being transformed into data that entered in data warehouse.
  • Transporting data from various source to data warehouse using back-end tools& utilities.


(i) Extraction: -
  • It means the act of taking something out.
  • example: - Extraction of salt from sea water.
  • It refers to the process of collecting or retrieving data from variety of sources.
(ii) Clean: -
  • It is major part of ETL process.
  • It refers to data cleansing deals with detecting & removing errors & inconsistencies from data in order to improve quality of data.
(iv) Data Transformation: -
  • It refers to qualitative data transfer.
  • It used to converted to match destination system.

EXAMPLE: -
                      D, A, B, E, C = A,B,C,D,E

(v) Load: -
  • It refers to the process of packing up your data & moving it to destinated data warehouse.has
  • It is transitory phase where move data source to destination for future process.
(vi) Refresh: -
  • It refers to update data.
BOTTOM-TIER CONTAINS: -










ETL: -
  • It stand for Extract Transform Load.
  • It is process of taking large volume of data from source, modifying & restructuring it for reporting & analytics purpose & load into data warehouse.
(i) Monitoring & Administration: -
  • Data  refreshment
  • Data synchronous
  • Recovery
  • Managing data
(ii) Meta data Repository: -
  • Meta data means data about data.
  • It contain structure data name & definition source of extract data.
(iii) DATA WAREHOUSE: -
  • It is central repository where information is coming from ETL process to analyse data to make more informed decisions.
  • Data contain subject-oriented based.
(iv) DATA MARTS: -
  • It is small slice of data warehouse.
  • It is focused on particular subject of business like: - department, finance, sales, marketing etc.
Two types of data marts: - 1. Dependent 2. independent

2. MIDDLE-TIER: -
  • It contains O.L.A.P. Server.
  • It refers to where transformed data and cleaned data is stored.
(i) O.L.A.P. SERVER: -
  • Stands for online analytical processing that refers to set of software tools used for data analysis in order to make business decisions.
  • Main purpose of O.L.A.P. is these are intermediate servers which stand in between relational back-end server & user front-end tools.
  • It is software for performing multi-dimensional analyse at high speed on large amount of data.
  • O.L.A.P. is implement using two models: - 1. R.O.L.A.P. MODEL 2. M.O.L.A.P. MODEL.
1. R.O.L.A.P. Model: - 
  • It stands for Relational O.L.A.P. based on relational DBMS.
  • Present data in Relational table.
2. M.O.L.A.P. Model: - 
  • It stands for multi-dimensional O.L.A.P. based on multi-dimensional d.b.m.s.
  • Represent data in data cubes.
3. TOP-TIER: -
  • It is front-end client layer.
  • It is also known as information delivery.
  • Provide information that reached to end users.
  • The information can be in any form such as- tables, charts, graphs, or histograms.
  • It contains: - 1. Query/ Report. 2. Analysis 3. Data Marts 
1. Query/ Report: - 

       QUERY: -
  • S.Q.L. used in customer mailing list.
  • It gives you an answer to a simple question, perform calculations, combine data from different tables, add, change or delete data from database.
  • Design for save time.
EXAMPLE: - Big-query is offered by google.
        
       REPORT: - 
  • Process of collecting and arranging data from a varied no. of sources.    
  1. Production report tools
  2. Report writers
2. Analysis tools: -
  • Prepare character/ meaningful data is based on analysis.
  • It is used for better decision-making.
3. Data mining: -
  • It refers to process of finding relevant & useful information large amount of data.
  • It takes full- knowledge of D.W.H.
  • Discover hidden piece of information useful patterns.
SCHEMA'S OF DATA WAREHOUSE 

Schema: -
  • It means how the data in data warehouse.
  • It has a logical data description of the entire data-ware: - collection of database objects, table, view, indexes & synonyms.
  • It also required to maintain the schema.
  • It consists of two types of tables.







1. FACT TABLE: -
  • It has generally store of the key.
  • It is part of database as well attribute.
  • It contains factual & quantitative data.
2. DIMENSIONAL TABLE: -
  • It stores all the subject area in details.
  • It is the one of the sets of tables.
  • It contains column, column is primary key, that helps to uniquely identify every dimension row or record.
TYPES OF SCHEMAS IN DATA WAREHOUSE










1. STAR SCHEMA: -
  • It is combination of fact table & dimensional table.
  • It's fact keys are normalize but dimensional keys are not normalized.



















2. SNOWFLAKE SCHEMA: -
  • It call as normalize.
  • It is multi-dimensional data model.
  • It is an extension of star schema, where dimension tables are  broken down into sub-dimensions.
  • To manage size of BI.












3. FACT CONSTELLATION: -
  • It is a familiar of star.
  • It is complicated design & implementation.
  • It consists of more than one star schema at a time.
  • It also known as a galaxy schema.





Comments