Data-Oriented Design

This text uses the Martin (1990) version of Information Engineering to illustrate data-oriented design. The result of data-oriented analysis – entity-relationship diagrams, data flow diagrams, CRUD matrices, and so on – is translated into screen designs, production database designs, action diagrams, procedural structures, and security plans. Compared to other approaches, data-oriented design strongly emphasizes security, recovery, and audit controls, relating each to data and processes in the application.

In this chapter, you will learn about the concepts and terminologies for data-oriented design, analyzing data and defining system controls, and the action diagram. The action diagram shows the processing details for an application in a structured format, which can be translated into programs and modules. You will also learn about menu structure, dialogue flow, and hardware and software installation and testing.

Definition of Information Engineering Design Terms

A full list of the activities in IE design is given here; included are references to chapters in which some topics are discussed. 

1. Design security, recoverability, and audit controls 

2. Design human interface structure 

  • Develop menu structure 
  • Define screen dialogue flow 

3. Data analysis 

  • Reconfirm subject area database definition 
  • Denormalize to create physical database design 
  • Conduct distribution analysis and recommend production data distribution strategy 

4. Develop an action diagram and conduct reusability analysis 

5. Plan hardware and software installation and testing 

6. Design conversion from the old to the new method of data storage (Chapter 14) 

7. Design and plan application tests (Chapter 17) 

8. Design and plan implementation (Chapter 14)

9. Develop, schedule, and conduct training programs for users (Chapter 14)

The topics in this chapter are design of data usage, action diagrams (which are program specifications), screen dialogues, security, recovery, audit controls, and installation planning. They are discussed in the order above, which reflects both the amount of work involved and their importance to the application.

The first activity in IE design is to confirm the design of the database and determine the optimal data location. Invariably, when the details of processing are mapped to specifications, data usage changes from that originally envisioned. To confirm database design, the data is mapped to application processes in an entity/process (CRUD) matrix and the matrix is reanalyzed. (See Chapter 9 for a more complete discussion of entity/process matrices.) The entity/process matrix (see Figure 10-1) clusters data together based on processes with data creation authority. The subject area databases defined by the clusters are stored in the same database environment.
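The clustering rule above can be sketched in a few lines of Python. This is an illustration only: the entity and process names echo Figure 10-1, but the matrix fragment and the simple "group each entity under its creating process" rule are assumptions, not the full IE analysis procedure.

```python
# Hypothetical sketch of entity/process (CRUD) matrix clustering.
# Entities are grouped with the process that holds create ("C") authority;
# each resulting cluster suggests one subject area database.
CRUD_MATRIX = {
    "Create & Mail Order": {"Purchase Order": "CRUD", "PO Item": "CRUD",
                            "Vendor Item": "CRU", "Inventory Item": "R",
                            "Vendor": "R"},
    "Identify Items & Vendors": {"Vendor Item": "R", "Inventory Item": "R",
                                 "Vendor": "CRU"},
}

def cluster_by_create_authority(matrix):
    """Group each entity under the process that creates it."""
    clusters = {}
    for process, entities in matrix.items():
        for entity, ops in entities.items():
            if "C" in ops:
                clusters.setdefault(process, []).append(entity)
    return clusters

clusters = cluster_by_create_authority(CRUD_MATRIX)
```

Here "Create & Mail Order" has create authority over Purchase Order, PO Item, and Vendor Item, so those entities cluster into one subject area database.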

The second step of database design is to determine a need to denormalize the data. Recall that normalization is the process of removing anomalies that would cause unwanted data corruption. Denormalizing is the process of designing storage items of data to achieve performance efficiency (see Figure 10-2). Having normalized the data, you know where the anomalies are and can design measures to prevent the problems.
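To make the idea of denormalization concrete, the sketch below folds vendor attributes into each order row so that retrieval needs no join. The field names and values are invented for illustration and do not reproduce the book's Figure 10-2; the point is only that a denormalized row repeats data (and so reintroduces the update anomalies that must then be guarded against).

```python
# Illustrative only: vendors is the normalized store of vendor data;
# denormalize() copies vendor attributes into each order row to gain
# retrieval performance at the cost of redundant data.
vendors = {"V01": {"name": "Acme Supply", "city": "Dayton"}}
orders = [{"po_no": 1001, "vendor_id": "V01", "total": 250.00}]

def denormalize(orders, vendors):
    """Fold vendor attributes into each order row for faster retrieval."""
    rows = []
    for o in orders:
        v = vendors[o["vendor_id"]]
        rows.append({**o, "vendor_name": v["name"], "vendor_city": v["city"]})
    return rows

denorm_rows = denormalize(orders, vendors)
```

If Acme Supply's city changes, every denormalized order row carrying it must be updated; that is the anomaly the design must now prevent.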

The next activity in data analysis is to determine the location of data when choices are present. A series of objective matrices is developed and analyzed. The matrices identify processes by location, data by location, and transaction volume by location. These are used to develop potential designs for distribution of data. Both application processes and data are mapped to locations. Cells of the process/location matrix contain responsibility information, identifying locations with major and minor involvement (see Figure 10-3). If distribution is selected, this information determines which software must also be distributed.

Two data/location matrices are developed. The first data/location matrix identifies data use at each location as either update (i.e., add, change, or delete) or retrieval (see Figure 10-4a). The second defines options for data in each location (Figure 10-4b). Together these matrices identify options for distributing data. The options for distributed data are replication, vertical partitioning, subset partitioning, or federation (see Figure 10-5). Replication is the copying of the entire database to two or more locations. Vertical partitioning is the storage of all data for a subset of the tuples (or records) of a database. Subset partitioning is the storage of a partial set of attributes for the entire database. Federation is the storage of different types of data in each location, some of which might be accessible to network users. The selection of distribution type is determined by the usage of data at each location.
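Two of these options can be sketched directly. In the fragment below, the tuples, location names, and assignment rule are invented for illustration; only the contrast between the options is the point: replication gives every location a full copy, while partitioning splits the tuples across locations.

```python
# A minimal sketch of two data distribution options for one relation.
db = [
    {"order_id": 1, "product": "widgets", "qty": 10},
    {"order_id": 2, "product": "gears",   "qty": 5},
]

def replicate(db, locations):
    """Replication: a full copy of the database at every location."""
    return {loc: list(db) for loc in locations}

def partition(db, assign):
    """Partitioning: each location stores only the tuples assigned to it."""
    out = {}
    for row in db:
        out.setdefault(assign(row), []).append(row)
    return out

copies = replicate(db, ["A", "B"])
parts = partition(db, lambda r: "A" if r["product"] == "widgets" else "B")
```

A by-product assignment rule, as here, corresponds to the "partition by product" entries in Figure 10-4b.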

Processes \ Entities | Purchase Order | PO Item | Vendor Item | Inventory Item | Vendor
Create & Mail Order | CRUD | CRUD | CRU | R | R
Call Vendor & Inquire on Order | RU | RU | RU | R | R
Verify Receipts against Order | RU | RU | RU | R |
Send Invoices to Accountant | RD | RD | | |
File Order Copy by Vendor | R | | | | R
Identify Late & Problem Orders | R | R | R | R | RU
Identify Items & Vendors | | | R | R | CRU
Call Vendor to Verify Avail/Price | | | RU | | RU

FIGURE 10-1 Example of Entity/Process Matrix

Then, a transaction volume matrix is developed to identify volume of transaction traffic by location. Cells of this matrix contain an average number of transactions for each data relation/process per day (see Figure 10-6). In an active application, hourly or peak activity period estimates of volume might be provided. During matrix analysis, the data and processes are clustered to minimize transmission traffic. Then formulae are applied to the information to determine whether the traffic warrants further consideration of distribution. 
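The "formulae" applied to the volume matrix can be as simple as totaling the traffic that would cross the network under each candidate design. The sketch below is an assumed minimal form of that analysis: the daily transaction counts and the choice of candidate central site are invented, and a real study would weight updates and retrievals, message sizes, and peak periods separately.

```python
# Hedged sketch of volume analysis for one subject database: if the data
# were held only at a central site, every transaction from another
# location becomes network traffic. Counts are illustrative inventions.
daily_volume = {"A": 200, "B": 660, "D": 40, "E": 40}  # transactions/day

def remote_traffic(volume_by_location, central_site):
    """Daily transactions that must travel to the central site."""
    return sum(v for loc, v in volume_by_location.items()
               if loc != central_site)

# Siting the central copy at the busiest location minimizes traffic:
best_site = min(daily_volume, key=lambda loc: remote_traffic(daily_volume, loc))
```

Comparing `remote_traffic` across candidate sites (and against a distributed design's cross-location traffic) is the quantitative input to the distribution decision; the subjective arguments described next supply the rest.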

Finally, subjective reasons for centralizing or for distributing the application are developed. The subjective arguments ensure that political, organizational, and nonobjective issues are identified and considered. Examples of subjective motivations for centralization/distribution relating to Figures 10-4, 10-5, and 10-6 are in Table 10-1. Recommendations on what, how, and why to distribute (or centralize) data are then developed from the matrices and subjective analysis. The recommendations and reasoning are presented to user and IS managers to accept or modify.

After data are designed, the design of the human interface can begin with a definition of interface requirements. The hierarchy diagram is used to determine the structure of selections needed by the application. A menu structure is a structured diagram translating process alternatives into a hierarchy of options for the automated application (see Figure 10-7). In general, we plan one menu entry for each process hierarchy diagram entry between the top and bottom levels. One level of menus corresponds to one level in the process hierarchy diagram. At the lowest level of the process hierarchy, a process corresponds to either a program or a module. Screens at the lowest level are determined by estimating execute units. These functional screens may not be final when the menu structure is defined, because execute unit design is usually a later activity. Once the menu structure is defined, it is given to the human interface designer(s) for use during screen design (Chapter 14).
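The one-menu-level-per-hierarchy-level rule can be sketched as a simple tree walk. The hierarchy fragment below is a made-up example in the spirit of the Customer Maintenance discussion later in the chapter, not the book's actual Figure 10-7.

```python
# Illustrative sketch: derive menu entries from a process hierarchy.
# Interior hierarchy entries become menus; leaves become program entries.
hierarchy = {
    "Order Processing": {
        "Customer Maintenance": ["Add Customer", "Change Customer",
                                 "Delete Customer", "Inquire Customer"],
        "Order Entry": ["Create Order", "Cancel Order"],
    }
}

def menu_entries(tree, level=0):
    """Flatten the hierarchy into (menu level, entry) pairs."""
    entries = []
    for name, children in tree.items():
        entries.append((level, name))
        if isinstance(children, dict):
            entries.extend(menu_entries(children, level + 1))
        else:
            entries.extend((level + 1, leaf) for leaf in children)
    return entries

menus = menu_entries(hierarchy)
```

Each tuple's level maps to one level of menus, so the top entry becomes the main menu and each leaf becomes a screen tied to a program or module.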

FIGURE 10-2 Example of Denormalized Data for an Order


[Process/location matrix mapping functions (Purchasing, Marketing, Customer Service, Sales, Product Development, Research & Development, Manufacturing) against Locations A-E; each cell is marked X or \ per the legend. The individual cell markings were not recoverable.]

Legend:
X - Major Involvement
\ - Minor Involvement

FIGURE 10-3 Example of Process/Location Matrix


Data Usage by Location Matrix

Subject Data | Location A | Location B | Location C | Location D | Location E
Prospects | All-UR | All-UR | | |
Customer | All-UR | All-UR | | |
Customer Orders | All-UR | Subset-own products-UR | | All-R | All-R
Customer Order History | All-R | All-R | All-R | | All-R
Manufacturing Plans | Subset-own products-R | Subset-own products-R | | Subset-own site-UR | All-UR
Manufacturing Goods Process | Subset-own products-R | Subset-own products-R | | Subset-own site-UR | All-UR
Manufacturing Inventory | Subset-own products-R | Subset-own products-R | All-R | Subset-own site-UR | All-UR

Legend: U = Update, R = Retrieve

FIGURE 10-4a Example of Data Matrices by Location


Distribution Alternatives by Location

Subject Data | Location A | Location B | Location C | Location D | Location E
Prospects | Replicate-Central Copy | Replicate | | |
Customer | Replicate-Central Copy | Replicate | | |
Customer Orders | Central Copy | Vertical Partition by Product | | Access central copy with delay | Access central copy with delay
Customer Order History | Replicate | Central Copy | Replicate or access central copy with delay | | Access central copy with delay
Manufacturing Plans | Replicate or access central copies with delay | Replicate or access central copies with delay | | Subset-own site | Subset-own site with delayed access to D
Manufacturing Goods Process | Access D and E databases | Access D and E databases | | Subset-own site |
Manufacturing Inventory | Subset-own products-R | Subset-own products-R | All-R | Subset-own site | Subset-own site with delayed access to D

FIGURE 10-4b Example of Data Matrices by Location

The structure is then analyzed further to determine the allowable movement between the options on the menu structure. The dialogue flow diagram documents allowable movement between entries on the menu structure diagram (see Figure 10-8). On the diagram, rows correspond to screens and columns correspond to allowable movements. For instance, in the menu structure example (Figure 10-7), Customer Maintenance has four subprocesses. A dialogue flow diagram shows how Customer Maintenance is activated from the main menu (or elsewhere) and the options for movement from that level. From the Customer Maintenance menu, the options are to move to the main menu or to one of the four subprocesses. The dialogue flow diagram is used by the designers in developing program specifications, by the human interface designer(s) in defining screens, and by testers in developing interactive test dialogues.

Next, procedure design begins with analysis of the process hierarchy and process data flow diagrams developed during IE analysis (Chapter 9). Remember, in analysis we developed one process data flow diagram (PDFD) for each activity. Now each PDFD is converted into an action diagram. An action diagram shows procedural structure and processing details suitable for automated code generation. An action diagram is drawn with different types of bracket structures to show the hierarchy, relationships, and structured code components of all processes.
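A dialogue flow can be represented compactly as a transition table: each screen maps to the set of screens reachable from it. The sketch below uses the Customer Maintenance example from the text, but the exact set of screens and moves is an assumption for illustration.

```python
# Sketch of a dialogue flow diagram as a transition table. Testers can use
# the same table to generate interactive test dialogues.
DIALOGUE_FLOW = {
    "Main Menu": {"Customer Maintenance"},
    "Customer Maintenance": {"Main Menu", "Add Customer", "Change Customer",
                             "Delete Customer", "Inquire Customer"},
    "Add Customer": {"Customer Maintenance"},
}

def move_allowed(flow, current, target):
    """True when the dialogue flow permits moving current -> target."""
    return target in flow.get(current, set())
```

So a move from the Customer Maintenance menu back to the main menu or down to any of its four subprocesses is allowed, while jumping from the main menu directly to Add Customer is not.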

FIGURE 10-5 Data Distribution Alternatives


The first-cut action diagram translates the PDFD into gross procedural structures (see Figure 10-9). Then, using detailed knowledge obtained during information gathering, the details of each procedure are added to the diagram to develop program specifications (see Figure 10-10). These program specifications may then be packaged into modules that each perform one function. Data entities are added to the diagram at the level at which they are accessed (see Figure 10-11). Progressively more detail about data usage is added, down to the level of individual attributes. Arrows are attached to show reading and writing of data (see Figure 10-12). When the details are completely specified, the action diagram is mapped to procedural templates to determine the extent to which reusable modules can be used in the application, and the changes to the action diagrams required to define modules for reuse.

A procedural template is a general, fill-in-the-blanks guide for completing a frequently performed process. For instance, error processing and screen processing can be defined as reusable templates (see Figure 10-13). A data template is a partial definition of an ERD or database that is consistent within a user community. For example, the insurance industry has common data requirements for policy holders, third-party insurance carriers, and policy information; most companies have similar accounting data needs; and so on. To be a candidate for template definition, a process must perform exactly the same actions whenever it is invoked, and data must be consistent across users.
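The fill-in-the-blanks character of a procedural template can be illustrated with simple string substitution. A real template would expand into action-diagram bracket structures; the error-processing skeleton, placeholder names, and pseudo-code below are assumptions standing in for that.

```python
# Sketch of a procedural template: the error-processing skeleton is fixed
# and reusable; each use supplies only the blanks ($condition, $message).
from string import Template

ERROR_TEMPLATE = Template(
    "IF $condition THEN\n"
    "    DISPLAY '$message'\n"
    "    SET error_flag = TRUE\n"
    "ENDIF"
)

spec = ERROR_TEMPLATE.substitute(
    condition="order_qty <= 0",
    message="Quantity must be positive",
)
```

Because the skeleton performs exactly the same actions wherever it is invoked, it meets the reuse criterion stated above.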


Subject Database

Location/Function | Prospect | Customer | Customer Order | Customer History | Mftg. Plan | Mftg. WIP | Mftg. Inven.
A
Customer Service | | 100 R, 20 U | 250 R, 400 U | 5 R | 2 R | 2 R | 2 R
Sales | 50 R, 20 U | 50 R, 30 U | 150 R, 50 U | 50 R | 2 R | 2 R | 15 R
Marketing | 15 R | 5 R | 10 R | 50 R | 2 R | | 1 R
B
Customer Service | | 250 R, 50 U | 250 R, 400 U | 50 R | 250 R | 250 R | 250 R
Sales | 25 R, 20 U | 25 R, 5 U | 10 R, 100 U | 70 R | 2 R | 2 R | 15 R
Marketing | 20 R | 10 R | 10 R | 50 R | 2 R | | 5 R
Manufacturing | | | | | 50 R, 5 U | 50 R, 250 U | 500 R, 2,000 U
Manufacturing | | | | | 100 R, 15 U | 200 R, 2,500 U | 500 R, 25,000 U

Legend: U = Create, Update or Delete; R = Retrieve

FIGURE 10-6 Example of Transaction Volume Matrix

After reusability analysis, the action diagram set is finalized and used to generate code. If the application is specified manually, the action diagrams are given as program specifications to programmers, who begin coding. If the application uses a CASE tool, automatic code generation is possible. A code generator is a program that reads specifications and creates code in some target language, such as Cobol or C. If the application uses a code generator, the action diagram contains the symbols and procedural detail specific to the code generation software. If the application uses a 4GL, the action diagram might contain actual code. If manual programming uses a 3GL or lower, the action diagram contains pseudo-code consisting of structured programming constructs.

The next activity in IE design is to develop security plans, recovery procedures, and audit controls for the application. Each of these designs restricts the application to performing its activities in prescribed ways. The goal of security plans is to protect corporate IT assets against corruption, illegal or unwanted access, damage, or theft. Security plans can address physical plant, data, or application assets, all by restricting access in some way. Physical security deals with access to computers, LAN servers, PCs, disk drives, cables, and the other components of the network tying computing devices together. Data security restricts access to, and functions against, data (e.g., read, write, or read/write). Application security protects program code from access and modification by unauthorized users. Examples of security precautions are locking of equipment, requiring user passwords, and assigning a software librarian to control program changes.

TABLE 10-1 Example of Subjective Reasons for Centralization and Distribution

General Measure-Argument
Geographic distribution by function by product makes centralization difficult
D Centralized mainframe in a sixth location is not close to distributed sites, nor interested in serving their needs
d Little product overlap between sites A and B

Location A Measure-Argument
d General Manager in Location A - smallest needs
d GM wants 'what is best' for division
C Little technical expertise in the location; would increase travel expense required to support hardware/software

Location B Measure-Argument
C Customer service needs fast response to fulfill corporate objectives (90% of requests serviced within one phone call, less than three minutes)
C Most application expertise in division is located here
C IS manager, located here, wants the applications and data under his control

Location C Measure-Argument
d Actions mostly independent of other sites
d Delays in retrieval of information could be tolerated

Location D Measure-Argument
d Historically, location controls its own hardware/software
d Hardware/software not currently compatible with A, B, or C

Location E Measure-Argument
d Historically, location controls its own hardware/software
d Historically, software has been successfully developed/bought as joint activity with IS group in

Legend:
D/C = Strong argument for Distribution/Centralization
d/c = Weak argument for distribution/centralization


FIGURE 10-7 Menu Structure Example


Recovery procedures define the method of restoring prior versions of a database or application software after a problem has destroyed some or all of it. Backup is the process of making extra copies of data to ensure recoverability. Recovery is the process of restoring a previous version of data (or software) from a backup copy to active use following damage to, or loss of, the previously active copy. Disasters considered in the plan include user error, hacker change, software failure, DBMS failure, hardware failure, and location failure. The backup/recovery strategy should be designed to provide for these six types of failure. Several backup options add requirements to program design that need to be accommodated.
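The backup/recovery relationship can be sketched minimally as follows. This is an assumed toy model: a real strategy adds transaction logs, backup generations, and off-site copies; here only the principle is shown, that recovery restores the most recent independent copy after the active copy is damaged.

```python
# Minimal sketch of backup and recovery for an in-memory "database".
import copy

class BackupSet:
    def __init__(self):
        self._copies = []

    def backup(self, database):
        """Keep an independent copy of the current database state."""
        self._copies.append(copy.deepcopy(database))

    def recover(self):
        """Return the most recent backup copy for restoration."""
        if not self._copies:
            raise RuntimeError("no backup available")
        return copy.deepcopy(self._copies[-1])

store = BackupSet()
db = {"orders": [1001]}
store.backup(db)
db["orders"].clear()   # simulate damage to the active copy
db = store.recover()   # restore the prior version
```

The deep copy matters: a backup that shares structure with the active data would be corrupted along with it, which is why physical backup media are kept separate from the production database.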

FIGURE 10-8 Dialogue Flow Diagram Example


FIGURE 10-10 Action Diagram with Create Purchase Order Process Detail


Next, audit controls are designed to prove that transaction processing complies with legal, fiduciary, or stakeholder responsibilities. Audit controls usually entail the recording of day, time, person, and function for all access and modification to data in the application. In addition, special totals, transaction traces, or other special requirements might be applied to provide process audit controls.
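An audit trail record of the kind described, capturing date, time, person, and function for each data access, can be sketched as follows. The record layout and field names are an assumed minimal form, not a prescribed standard.

```python
# Sketch of an audit control: every access or change to application data
# appends one audit trail record (timestamp, user, function, entity).
from datetime import datetime

audit_log = []

def audited(user, function, entity):
    """Record one audit trail entry for an access to `entity`."""
    audit_log.append({
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "user": user,
        "function": function,  # e.g. "create", "read", "update", "delete"
        "entity": entity,
    })

audited("jdoe", "update", "Purchase Order 1001")
```

Control totals and transaction traces would be built on top of such records, for example by summing logged update counts per day and reconciling them against batch totals.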

Last, hardware installation is planned and carried out, if required for the application. Again, there is no theory or research on hardware installation, but long practice has yielded guidelines on the activities and their timing.