With the use of databases over the past decades, large volumes of data have been accumulated. To integrate and manage the data effectively and systematically, data warehouses have emerged. In addition, OLAP and data mining, which use the data warehouse, have become important research topics. OLAP allows users to easily analyze the data in the data warehouse in order to acquire information necessary for decision making. Data mining extracts unknown useful knowledge from the data warehouse. Data warehousing is a collection of decision making techniques aimed at enabling the knowledge worker to make better and faster decisions.
Data warehousing techniques can be classified into three categories: data warehouses, OLAP, and data mining. Research issues in the first are data cleaning, data warehouse refreshment, physical and logical design of a data warehouse, and metadata management. Research issues in OLAP are multidimensional data models, OLAP query languages, query processing, and system architectures: ROLAP (Relational OLAP) using relational databases, MOLAP (Multidimensional OLAP) using multidimensional indexes, and HOLAP (Hybrid OLAP) combining ROLAP and MOLAP. Data mining involves various techniques such as association rules, classification, clustering, and similarity search.
The objective of data warehousing is to analyze data from diverse sources to support decision making. In pursuing this goal, we face two challenges:
• Poor system performance. A data warehouse usually contains a large volume of data. It is not an easy job to retrieve data quickly from the data warehouse for analysis purposes. For this reason, the data warehouse design uses a special technique called a star schema.
• Difficulties in extracting, transferring, transforming, and loading (ETTL) data from diverse sources into a data warehouse. Data must be cleansed before being used. ETTL has frequently been cited as the reason many data warehousing projects fail. You will know the pain if you have ever tried to analyze SAP R/3 data without using SAP BW.
SAP R/3 is an ERP (Enterprise Resource Planning) system that many large companies around the world use to manage their business transactions. Before the introduction of SAP BW in 1997, ETTL of SAP R/3 data into a data warehouse seemed an unthinkable task. This situation explains the urgency with which SAP R/3 customers sought a data warehousing solution. The result was SAP BW from SAP, the developer of SAP R/3.
Here we will discuss the basic concept of data warehousing. We will also discuss what SAP BW (Business Information Warehouse) is, explain why we need it, examine its architecture, and define Business Content.
First, we use sales analysis as an example to introduce the basic concept of data warehousing.
1.1 Sales Analysis—A Business Scenario
Suppose that you are a sales manager who is responsible for planning and implementing sales strategy. Your tasks include the following:
• Monitoring and forecasting sales demands and pricing trends
• Managing sales objectives and coordinating the sales force and distributors
• Reviewing the sales activities of each representative, office, and region
In the real world, you might have years of data and millions of records.
To succeed in the face of fierce market competition, you need to have a complete and up-to-date picture of your business and your business environment. The challenge lies in making the best use of data in decision support. In decision support, you need to perform many kinds of analysis.
This type of online analytical processing (OLAP) consumes a lot of computer resources because of the size of data. It cannot be carried out on an online transaction processing (OLTP) system, such as a sales management system. Instead, we need a dedicated system, which is the data warehouse.
1.2 Basic Concept of Data Warehousing
A data warehouse is a system with its own database. It draws data from diverse sources and is designed to support query and analysis. To facilitate data retrieval for analytical processing, we use a special database design technique called a star schema.
1.2.1 Star Schema
The concept of a star schema is not new; indeed, it has been used in industry for years. For the data in the previous section, we can create a star schema like that shown in Figure 1.1.
Figure 1.1 : Star schema
The star schema derives its name from its graphical representation—that is, it looks like a star. A fact table appears in the middle of the graphic, along with several surrounding dimension tables. The central fact table is usually very large, measured in gigabytes. It is the table from which we retrieve the interesting data. The size of the dimension tables amounts to only 1 to 5 percent of the size of the fact table. Common dimensions are unit and time, which are not shown in Figure 1.1. Foreign keys tie the fact table to the dimension tables. Keep in mind that dimension tables are not required to be normalized and that they can contain redundant data.
As indicated in Table 1.3, the sales organization changes over time. The dimension to which it belongs, the sales rep dimension, is called a slowly changing dimension.
The following steps explain how a star schema works to calculate the total quantity sold in the Midwest region:
1. From the sales rep dimension, select all sales rep IDs in the Midwest region.
2. From the fact table, select the quantities sold by the sales rep IDs from Step 1 and sum them.
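The two steps above can be sketched in Python with the standard sqlite3 module. This is only an illustration under stated assumptions: the table names (sales_rep_dim, sales_fact), the column names, and the sample rows are hypothetical and are not taken from Figure 1.1 or from BW.

```python
import sqlite3

# Hypothetical miniature star schema: one dimension table and one fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_rep_dim (
    sales_rep_id TEXT PRIMARY KEY,
    sales_region TEXT
);
CREATE TABLE sales_fact (
    sales_rep_id  TEXT REFERENCES sales_rep_dim(sales_rep_id),
    quantity_sold INTEGER
);
""")
conn.executemany(
    "INSERT INTO sales_rep_dim VALUES (?, ?)",
    [("SREP01", "EAST"), ("SREP02", "MIDWEST"), ("SREP03", "MIDWEST")],
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?)",
    [("SREP01", 10), ("SREP02", 5), ("SREP03", 7), ("SREP02", 3)],
)


def total_quantity_for_region(conn, region):
    """Return the total quantity sold by all sales reps in one region."""
    # Step 1: from the dimension table, select the sales rep IDs in the region.
    rep_ids = [
        row[0]
        for row in conn.execute(
            "SELECT sales_rep_id FROM sales_rep_dim WHERE sales_region = ?",
            (region,),
        )
    ]
    if not rep_ids:
        return 0
    # Step 2: from the fact table, sum the quantity sold by those IDs.
    placeholders = ", ".join("?" for _ in rep_ids)
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(quantity_sold), 0) FROM sales_fact "
        f"WHERE sales_rep_id IN ({placeholders})",
        rep_ids,
    ).fetchone()
    return total


midwest_total = total_quantity_for_region(conn, "MIDWEST")
```

In practice a database would usually perform both steps in a single join between the fact table and the dimension table; the steps are kept separate here only to mirror the description.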
1.2.2 ETTL—Extracting, Transferring, Transforming, and Loading Data
Besides the difference in designing the database, building a data warehouse involves a critical task that does not arise in building an OLTP system: to extract, transfer, transform, and load (ETTL) data from diverse data sources into the data warehouse (Figure 1.2).
Figure 1.2 : ETTL process
In data extraction, we move data out of source systems, such as an SAP R/3 system. The challenge during this step is to identify the right data. A good knowledge of the source systems is absolutely necessary to accomplish this task.
In data transfer, we move a large amount of data regularly from different source systems to the data warehouse. Here the challenges are to plan a realistic schedule and to have reliable and fast networks.
In data transformation, we format data so that it can be represented consistently in the data warehouse. For example, we might need to convert an entity with multiple names (such as AT&T, ATT, or Bell) into an entity with a single name (such as AT&T). The original data might reside in different databases using different data types, or in different file formats in different file systems. Some are case sensitive; others may be case insensitive.
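As a rough illustration of this kind of transformation, the following Python sketch maps several spellings of an entity to a single name. The lookup table and the function name are hypothetical; a real project would maintain such mappings as part of its transformation rules.

```python
# Hypothetical mapping from known name variants (lowercased) to one canonical name.
CANONICAL_NAMES = {
    "at&t": "AT&T",
    "att": "AT&T",
    "bell": "AT&T",
}


def canonical_name(raw_name):
    """Return the canonical entity name, or the trimmed input if it is unknown."""
    # Trim whitespace and ignore case, because source systems may differ
    # in case sensitivity and padding.
    key = raw_name.strip().lower()
    return CANONICAL_NAMES.get(key, raw_name.strip())


cleaned = [canonical_name(name) for name in ["AT&T", " att ", "Bell", "Acme"]]
```

Names that are not in the mapping pass through unchanged, so unknown entities are not silently merged.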
In data loading, we load data into the fact tables correctly and quickly. The challenge at this step is to develop a robust error-handling procedure.
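One possible shape for such an error-handling procedure is sketched below in Python with sqlite3: each row is loaded in its own transaction, and rows that fail validation or violate a constraint are collected with the reason instead of aborting the whole load. The table definition, the function name, and the sample rows are assumptions made for illustration only.

```python
import sqlite3


def load_fact_rows(conn, rows):
    """Insert valid rows into sales_fact; return rejected rows with reasons."""
    rejected = []
    for row in rows:
        try:
            rep_id, quantity = row
            quantity = int(quantity)  # raises ValueError or TypeError on bad input
            with conn:  # commits on success, rolls back if the insert fails
                conn.execute(
                    "INSERT INTO sales_fact VALUES (?, ?)", (rep_id, quantity)
                )
        except (ValueError, TypeError, sqlite3.Error) as exc:
            rejected.append((row, str(exc)))
    return rejected


# Hypothetical fact table with constraints that catch bad data at load time.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact ("
    " sales_rep_id TEXT NOT NULL,"
    " quantity_sold INTEGER NOT NULL CHECK (quantity_sold >= 0))"
)

incoming = [
    ("SREP01", "10"),   # valid after conversion to an integer
    ("SREP02", "n/a"),  # rejected: quantity is not a number
    (None, 4),          # rejected: missing sales rep ID
    ("SREP03", -2),     # rejected: negative quantity
    ("SREP04", 6),      # valid
]
rejected = load_fact_rows(conn, incoming)
loaded_count, loaded_total = conn.execute(
    "SELECT COUNT(*), SUM(quantity_sold) FROM sales_fact"
).fetchone()
```

Loading row by row favors robustness over speed. A production load would more likely validate and insert in batches, which trades some error isolation for throughput.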
ETTL is a complex and time-consuming task. Any error can jeopardize data quality, which directly affects business decision making. Because of this fact and for other reasons, most data warehousing projects experience difficulties finishing on time or on budget.
To get a feeling for the challenges involved in ETTL, let's study SAP R/3 as an example. SAP R/3 is a leading ERP (Enterprise Resource Planning) system. According to SAP, the SAP R/3 developer, as of October 2000, some 30,000 SAP R/3 systems with 10 million users were installed worldwide. SAP R/3 includes several modules, such as SD (sales and distribution), MM (materials management), PP (production planning), FI (financial accounting), and HR (human resources). Basically, you can use SAP R/3 to run your entire business.
SAP R/3's rich business functionality leads to a complex database design. In fact, this system has approximately 10,000 database tables. In addition to the complexity of the relations among these tables, the tables and their columns sometimes don't even have explicit English descriptions. For many years, using the SAP R/3 data for business decision support had been a constant problem.
Recognizing this problem, SAP decided to develop a data warehousing solution to help its customers. The result is SAP Business Information Warehouse, or BW. Since the announcement of its launch in June 1997, BW has drawn intense interest. According to SAP, as of October 2000, more than 1,000 SAP BW systems were installed worldwide.
Here we will discuss how SAP BW implements the star schema and tackles the ETTL challenges.
1.3 BW — SAP Data Warehousing Solution
BW is an end-to-end data warehousing solution that uses preexisting SAP technologies. BW is built on the Basis 3-tier architecture and coded in the ABAP (Advanced Business Application Programming) language. It uses ALE (Application Link Enabling) and BAPI (Business Application Programming Interface) to link BW with SAP systems and non-SAP systems.
1.3.1 BW Architecture
Figure 1.3 shows the BW architecture at the highest level. This architecture has three layers.
Figure 1.3 : BW architecture
1. The top layer is the reporting environment. It can be BW Business Explorer (BEx) or a third-party reporting tool. BEx consists of two components:
o BEx Analyzer
o BEx Browser
BEx Analyzer is Microsoft Excel with a BW add-in. Its easy-to-use graphical interface allows users to create queries without coding SQL statements. BEx Browser works much like an information center, allowing users to organize and access all kinds of information. Third-party reporting tools connect with BW OLAP Processor through ODBO (OLE DB for OLAP).
2. The middle layer, BW Server, carries out three tasks:
o Administering the BW system
o Storing data
o Retrieving data according to users' requests
We will detail BW Server's components next.
3. The bottom layer consists of source systems, which can be R/3 systems, BW systems, flat files, and other systems. If the source systems are SAP systems, an SAP component called Plug-In must be installed in the source systems. The Plug-In contains extractors. An extractor is a set of ABAP programs, database tables, and other objects that BW uses to extract data from the SAP systems. BW connects with SAP systems (R/3 or BW) and flat files via ALE; it connects with non-SAP systems via BAPI.
The middle-layer BW Server consists of the following components:
o Administrator Workbench, including BW Scheduler and BW Monitor
o Metadata Repository and Metadata Manager
o Staging Engine
o PSA (Persistent Staging Area)
o ODS (Operational Data Store) Objects
o Data Manager
o OLAP Processor
o BDS (Business Document Services)
o User Roles
Administrator Workbench maintains metadata and all BW objects. It has two components:
• BW Scheduler for scheduling jobs to load data
• BW Monitor for monitoring the status of data loads
Metadata Repository contains information about the data warehouse, that is, metadata, or data about data. It holds two types of metadata: business-related (for example, definitions and descriptions used for reporting) and technical (for example, structure and mapping rules used for data extraction and transformation). We use Metadata Manager to maintain Metadata Repository.
Staging Engine implements data mapping and transformation. Triggered by BW Scheduler, it sends requests to a source system for data loading. The source system then selects and transfers data into BW.
PSA (Persistent Staging Area) stores data in its original format as it is imported from the source system. PSA allows quality checks before the data are loaded into their destinations, such as ODS Objects or InfoCubes.
ODS (Operational Data Store) Objects allow us to build a multilayer structure for operational data reporting. They are not based on the star schema and are used primarily for detail reporting, rather than for dimensional analysis.
InfoCubes are the fact tables and their associated dimension tables in a star schema.
Data Manager maintains data in ODS Objects and InfoCubes and tells the OLAP Processor what data are available for reporting.
OLAP Processor is the analytical processing engine. It retrieves data from the database, and it analyzes and presents those data according to users' requests.
BDS (Business Document Services) stores documents. The documents can appear in various formats, such as Microsoft Word, Excel, PowerPoint, PDF, and HTML. BEx Analyzer saves query results, or MS Excel files, as workbooks in the BDS.
User Roles are a concept used in SAP authorization management. BW organizes BDS documents according to User Roles. Only users assigned to a particular User Role can access the documents associated with that User Role.
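The access rule described above can be pictured with a small Python sketch. It is not BW code and uses no SAP API; the user names, role names, and document names are all hypothetical.

```python
# Hypothetical role assignments and role-to-document mapping (not BW objects).
USER_ROLES = {
    "alice": {"SALES_MANAGER"},
    "bob": {"SALES_REP", "CONTROLLER"},
}
ROLE_DOCUMENTS = {
    "SALES_MANAGER": {"regional_sales.xls", "top_customers.xls"},
    "SALES_REP": {"my_orders.xls"},
    "CONTROLLER": {"cost_centers.xls"},
}


def accessible_documents(user):
    """Return every document associated with any role assigned to the user."""
    documents = set()
    for role in USER_ROLES.get(user, set()):
        documents |= ROLE_DOCUMENTS.get(role, set())
    return documents


def can_access(user, document):
    """A user can open a document only through one of the user's roles."""
    return document in accessible_documents(user)
```

For example, can_access("alice", "top_customers.xls") is true, while can_access("alice", "cost_centers.xls") is false because alice does not hold the CONTROLLER role.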
1.3.2 BW Business Content
One of BW's strongest selling points is its Business Content. Business Content contains standard reports and other associated objects. For example, BW provides you, the sales manager, with the following standard reports:
• Quotation success rates per sales area
• Quotation tracking per sales area
• General quotation information per sales area
• Monthly incoming orders and revenue
• Sales values
• Billing documents
• Order, delivery, and sales quantities
• Fulfillment rates
• Credit memos
• Proportion of returns to incoming orders
• Returns per customer
• Quantity and values of returns
• Product analysis
• Product profitability analysis
• Delivery delays per sales area
• Average delivery processing times
Analyses and Comparisons
• Sales/cost analysis
• Top customers
• Distribution channel analysis
• Product profitability analysis
• Weekly deliveries
• Monthly deliveries
• Incoming orders analysis
• Sales figures comparison
• Returns per customer
• Product analysis
• Monthly incoming orders and revenue
Administrative and Management Functions
• Cost center: plan/actual/variance
• Cost center: responsible for orders, projects, and networks
• Order reports
• WBS Element: plan/actual/variance
• Cost center: hit list of actual variances
• Cost center: actual costs per quarter
• Cost center: capacity-related headcount
1.3.3 SAP BW over R/3
R/3 was designed as an OLTP system, not as an analytical and reporting system. Depending on your needs, however, you may be able to get by with a reporting instance.
You can run as many reports as you need from R/3 and Web-enable them, but consider these factors:
1. Performance: Heavy reporting alongside regular OLTP transactions can put a heavy load on both the R/3 system and the database (CPU, memory, disks, etc.). Just look at the load on your system during a month end, quarter end, or year end, and then imagine that load occurring even more frequently.
2. Data analysis: BW uses data warehouse and OLAP concepts for storing and analyzing data, whereas R/3 was designed for transaction processing. With a lot of work you can get the same analysis out of R/3, but it would most likely be easier in BW.
Major benefits of BW include:
1. Offloading ad hoc and long-running queries from the production R/3 system to the BW system should improve overall performance on R/3.
2. Another key performance benefit of BW is its database design, which is intended specifically for query processing, not for data updating and OLTP. Within BW, the data structures are designed differently and are much better suited for reporting than R/3 data structures. For example, BW uses a star schema design, which includes fact and dimension tables with bitmap indexes. Other important factors include built-in support for aggregates, database partitioning, and more efficient ABAP code that uses tRFC processing rather than IDocs.
3. Better front-end reporting within BW. Although the BW Excel front end has its problems, it provides more flexibility and analysis capability than the R/3 reporting screens.
4. BW can pull data from other SAP or non-SAP sources into a consolidated cube.
In summary, BW provides much better performance and stronger data analysis capabilities than R/3.
1.3.4 BW in mySAP.com
BW is evolving rapidly. Knowing its future helps us plan BW projects and their scopes. Here, we give a brief overview of BW's position in mySAP.com.
mySAP.com is SAP's e-business platform, which aims to enable collaboration among businesses using Internet technology. It consists of three components:
• mySAP Technology
• mySAP Services
• mySAP Hosted Solutions
As shown in Figure 1.4, mySAP Technology includes a portal infrastructure for user-centric collaboration, a Web Application Server for providing Web services, and an exchange infrastructure for process-centric collaboration. The portal infrastructure has a component called mySAP Business Intelligence; it is the same BW but is located in the mySAP.com platform. Using mySAP Technology, SAP develops e-business solutions, such as mySAP Supply Chain Management (mySAP SCM), mySAP Customer Relationship Management (mySAP CRM), and mySAP Product Lifecycle Management (mySAP PLM).
Figure 1.4 : mySAP Technology and mySAP Solutions
mySAP Services are the services and support that SAP offers to its customers. They range from business analysis, technology implementation, and training to system support. mySAP Hosted Solutions are the outsourcing services from SAP. With these solutions, customers do not need to maintain physical machines and networks.