Data Gathering for Application Development

There are many different data dimensions, such as time and volume. Each dimension is important in defining the requirements of applications. Read this section, which discusses these data types. Bear in mind how much and what type of information should be collected.


Each phase of application development requires interaction between the developers and users to obtain information of interest at the time. Each phase seeks to answer broad questions about the application. For instance, in feasibility analysis, the questions are broad and general: What is the scope of the problem? What is the best way to automate? Can the company afford (not) to develop this application? Is the company able to support application development? 

In analysis, we seek the 'what' information about the application. For instance: What data are required? What processes should be performed, and what are the details of their performance? What screen design should be used?

In design, we develop the 'how' information relating to the application. For example: How does the application translate into the specific hardware environment selected? How does the logical data design translate into a physical database design? How do the program modules fit together?

The kind of interaction that elicits answers to questions such as these differs by information type and phase. In this section we describe the alternatives for obtaining information to be used for application development. The alternative data gathering techniques are described, then related to application types. Then, ethical considerations in data collection and user relations are discussed.


Data differs on several important dimensions: time orientation, structure, completeness, ambiguity, semantics, and volume. Each of these dimensions is important in defining requirements of applications because they give guidance to the SE about how much and what type of information should be collected. Also, different data types are related to different application types and require different requirements elicitation techniques. Inattention to data dimensions will cause errors in analysis and design that are costly to fix. Error correction cost is an increasing function of the phase of development (see Table 4-1). 

In addition to obtaining information, we also use these techniques to validate the information and our interpretation of it in the proposed application. Using validation techniques during each phase increases the likelihood that logic flaws and misinterpretations will be found early in development.

TABLE 4-1 Cost of Error Correction by Phase of Development

Phase in Which Errors Are Found    Cost Ratio to Fix the Error
Feasibility Analysis               1
Design                             3 - 6
Code/Unit Test                     10
Development Test                   14 - 40
Acceptance Test                    30 - 70
Operation                          40 - 1000
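
To make the ratios in Table 4-1 concrete, the following minimal sketch scales a baseline correction cost by each phase's ratio range. Only the ratios come from the table; the baseline of 500 cost units and the names `COST_RATIOS` and `correction_cost_range` are assumptions made for illustration.

```python
# Ratio ranges (low, high) copied from Table 4-1.
COST_RATIOS = {
    "Feasibility Analysis": (1, 1),
    "Design": (3, 6),
    "Code/Unit Test": (10, 10),
    "Development Test": (14, 40),
    "Acceptance Test": (30, 70),
    "Operation": (40, 1000),
}


def correction_cost_range(baseline_cost, phase):
    """Return the (low, high) cost of fixing an error found in `phase`.

    `baseline_cost` is the assumed cost of fixing the same error
    during feasibility analysis, where the ratio is 1.
    """
    low, high = COST_RATIOS[phase]
    return baseline_cost * low, baseline_cost * high


baseline = 500  # hypothetical cost units, not a figure from the source
for phase in COST_RATIOS:
    low, high = correction_cost_range(baseline, phase)
    print(f"{phase:22s} {low:>8,} to {high:>9,}")
```

Under these assumptions, an error that costs 500 units to fix during feasibility analysis would cost between 20,000 and 500,000 units once the application is in operation.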

Time Orientation 

Time orientation of data refers to past, present, or future requirements of a proposed application. Past data, for example, might describe how the job has changed over time, how politics have affected the task, and where the task has been located in the organization. Past information is exact, complete (if maintained), and accurate. There is little guessing or uncertainty about historical records.

Current information describes what is happening now and its relevance in determining the future. For instance, current application information relates to the operations of the company, such as the number of orders taken in a day or the amount of goods produced. Current policies, procedures, business industry requirements, legal requirements, or other constraints on the task are also of interest in application development. Current information should be documented in a form that the development team can read to increase their knowledge of the application and problem domains.

Future requirements relate to changes in the industry expected to take place. They are inexact and difficult to verify. Economic forecasts, sales trend projections, and business 'guru' prognostications are examples of future information. Future-oriented information might be used, for example, by managers in an executive information system (EIS).


Structure

Structure of information refers to the extent to which the information can be classified in some way. Structure can refer to the function, environment, or form of data or processes. Information varies from unstructured to structured, with the interpretation and definition of structure left to the individual SE. In the information structuring process, the SE gives form and definition to data.

Structure is important because the wrong application will be developed without it. For instance, knowing that the user envisions the structure of the system to be one with 'no bureaucracy,' minimal user requirements, and no frills, gives you, the SE, a good sense that only required functions and data should be developed. In the absence of structuring information, technicians have a tendency to develop applications with all 'the bells and whistles' so the users can never complain that they don't have some function.

An example of structuring of data is shown in Figures 4-1 and 4-2. When you begin collecting information about employees for a personnel application, you might get information about the employees themselves, their dependents, skills the employees might have, job history information, company position history, salary history, and performance reviews. 

The information comes to you in pieces that may not have an obvious structure, but you know that all of the data relates to an employee, so there must be relationships somewhere. In Figure 4-2, we have structured the information to show how all of it relates to an employee and to each other in a hierarchic manner. Each employee has specific one-time information that applies only to them, for instance, name, address, social security number, employee ID, and so on. In addition, each employee might have zero to any number of the other types of information, depending on how many other companies they have worked at, whether they have children, and how long they have worked at the company. The most complex part of the data structure is the relationship between position, salary, and reviews. If salary and performance reviews are disjoint, both would be related, as shown, to a given position the person held in the company (see Figure 4-2). The other option is that salary changes depend on performance reviews, in which case the hierarchy would be extended another level.

FIGURE 4-1     Unstructured Personnel Data

FIGURE 4-2   Structured Personnel Data
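
One way to express the hierarchy described above in code is sketched below. It follows the 'disjoint' option, in which salary changes and performance reviews both hang off a position. All class and field names are assumptions chosen for illustration and are not taken from Figure 4-2.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SalaryChange:
    effective_date: str
    annual_salary: float


@dataclass
class PerformanceReview:
    review_date: str
    rating: str


@dataclass
class Position:
    title: str
    start_date: str
    # Salary history and reviews are disjoint: both relate to the position.
    salaries: List[SalaryChange] = field(default_factory=list)
    reviews: List[PerformanceReview] = field(default_factory=list)


@dataclass
class Employee:
    # One-time information that applies only to this employee.
    employee_id: str
    name: str
    address: str
    # Zero-to-many information related to the employee.
    dependents: List[str] = field(default_factory=list)
    skills: List[str] = field(default_factory=list)
    job_history: List[str] = field(default_factory=list)
    positions: List[Position] = field(default_factory=list)


employee = Employee(employee_id="E-001", name="Pat Example", address="1 Sample Street")
position = Position(title="Analyst", start_date="2020-01-01")
position.salaries.append(SalaryChange("2020-01-01", 50_000.0))
position.reviews.append(PerformanceReview("2020-12-15", "meets expectations"))
employee.positions.append(position)
```

If salary changes instead depended on performance reviews, the `salaries` list would move from `Position` into `PerformanceReview`, extending the hierarchy by one level.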


Completeness

Information varies in completeness, the extent to which all desired information is present. Each application type deals with a requisite level of data completeness. Transaction processing systems deal with complete and accurate information. Group decision support systems (GDSS) and decision support systems (DSS) deal with less complete information. EIS, expert systems, and other AI applications must cope with the highest levels of incompleteness.

In applications dealing with incomplete information, the challenge to you is to decide when the information is complete enough to be useful. Sometimes this decision is made by the user; at other times it is made within the application, and there need to be rules defining 'complete enough.'
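
A rule for 'complete enough' that lives inside the application might look like the following minimal sketch. It assumes one simple rule, the fraction of required fields that are filled in, compared against a threshold. The function names, the field names, and the 0.75 threshold are hypothetical; a real application would take its rule from the users.

```python
def completeness(record, required_fields):
    """Return the fraction of required fields that hold a non-empty value."""
    if not required_fields:
        return 1.0
    present = sum(
        1 for name in required_fields
        if record.get(name) not in (None, "")
    )
    return present / len(required_fields)


def is_complete_enough(record, required_fields, threshold):
    """Apply the rule: the record is usable when completeness meets the threshold."""
    return completeness(record, required_fields) >= threshold


required = ["name", "address", "employee_id", "start_date"]
record = {"name": "Pat Example", "address": "", "employee_id": "E-001",
          "start_date": "2020-01-01"}

print(completeness(record, required))              # three of four fields present
print(is_complete_enough(record, required, 0.75))  # assumed threshold
```

The same record would fail a transaction processing rule that demands a threshold of 1.0, which mirrors the point that different application types tolerate different levels of incompleteness.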


Ambiguity

Ambiguity is a property of data such that it is vague in meaning or is subject to multiple meanings. Since ambiguity deals with meaning, it is closely related to semantics. An example of ambiguity is a query that asks about 'New York' without saying which New York is meant.


In this query, New York can mean New York State or New York City; both answers would be correct. Obvious problems will occur for a person who makes the request in one context (the state) and gets an answer for the other context (the city). Contextual cues help SEs to define the one correct interpretation of ambiguous items, but further problems arise because of multiple semantic interpretations within a single context. For that reason, semantics is discussed next.
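
The role of contextual cues can be sketched in code. In the following illustration, an application refuses to answer an ambiguous term until a context is supplied. The `INTERPRETATIONS` table, the `resolve` function, and the context labels are assumptions made for this example only.

```python
# Hypothetical lookup: each ambiguous term maps a context to one meaning.
INTERPRETATIONS = {
    "New York": {"state": "New York State", "city": "New York City"},
}


def resolve(term, context=None):
    """Return the single meaning of `term`, or raise if it remains ambiguous."""
    meanings = INTERPRETATIONS.get(term)
    if meanings is None:
        return term  # not a known ambiguous term
    if context in meanings:
        return meanings[context]
    options = ", ".join(sorted(meanings.values()))
    raise ValueError(f"'{term}' is ambiguous; it could mean: {options}")


print(resolve("New York", context="city"))

try:
    resolve("New York")
except ValueError as error:
    print(error)
```

Raising an error rather than guessing is a design choice: it forces the requester to state the context instead of silently receiving an answer for the wrong one.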


Semantics

Semantics is the study of the development and change in the meaning of words. In business applications, semantics is the meaning attached to words. Meaning is a social construction; that is, the people in the organization have a collectively shared definition of how some term, policy, or action is really interpreted.

Semantics is important in applications development and in the applications themselves. If people use the same terms, but have different meanings for the terms, misunderstandings and miscommunications are assured. If embedded in an application, semantically ambiguous data can never be processed by a program without the user being aware of which 'meaning' is in the data. Applications that have semantically mixed data then rely on the training and longevity of employees for proper interpretation of the data. If these key employees leave, the ability to correctly interpret the meaning of the data is lost. Losing the meaning of information can be expensive to the company and can result in lawsuits due to improper handling of information.

An example of semantic problems can be seen in a large insurance company. The company uses the term 'institution' to refer to its major clients for retirement funds. The problem is that 'institution' means different things to different people in the company. In one meeting, specifically convened to define 'institution,' 17 definitions surfaced. The problem with semantic differences is not that 16 of the 17 definitions are wrong. The problem is that all 17 definitions are right, depending on the context of their use. It is the SE's job to unravel the spaghetti of such definitions to get at the real meaning of terms that are not well defined at the corporate level. Unraveling the meaning of the term 'institution' took about 20 person-months over a two-year period before the user community reached consensus on a corporate definition.


Volume

Volume is the number of business events the system must cope with in some period. The volume of new or changed customers is estimated on a monthly or annual basis, whereas the volume of transactions for business operation is usually measured as volume per day or hour and as peak volume. Peak volume is the number of transactions or business events to be processed during the busiest period. The peak period might be annual and last several months, as with tax preparation. The peak might also be measured in seconds and minutes, for example, to meet a Federal Reserve Bank closing deadline.

Volume of data is a source of complexity because the amount of time required to process a single transaction can become critical to having adequate response time when processing large volumes. Interactive, on-line applications can be simple or extremely complex simply because of volume. For instance, the ABC rental application will actually process fewer than 1,000 transactions per day. Contrast this volume with a credit card validation application that might service 50,000 credit check requests per hour. Credit card validation is simple processing; servicing 50,000 transactions per hour is complex.
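
The arithmetic behind this contrast can be sketched as follows. The transaction counts come from the text; the 8-hour rental business day, the single-server default, and the function names are assumptions for illustration. The figures are averages, so a peak period would leave even less time per transaction.

```python
SECONDS_PER_HOUR = 3600


def arrival_rate_per_second(transactions, period_hours):
    """Average number of transactions arriving each second."""
    return transactions / (period_hours * SECONDS_PER_HOUR)


def time_budget_ms(rate_per_second, parallel_servers=1):
    """Average milliseconds each transaction may take if the servers keep pace."""
    return 1000 * parallel_servers / rate_per_second


credit_rate = arrival_rate_per_second(50_000, period_hours=1)
rental_rate = arrival_rate_per_second(1_000, period_hours=8)  # assumed 8-hour day

print(f"credit checks: {credit_rate:.1f}/s, {time_budget_ms(credit_rate):.0f} ms each")
print(f"rentals:       {rental_rate:.3f}/s, {time_budget_ms(rental_rate):.0f} ms each")
```

At 50,000 requests per hour, a single sequential server has roughly 72 ms on average for each credit check, whereas the rental application under the assumed 8-hour day has many seconds per transaction.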

Applications that mix on-line and batch processing, using software that requires the two types of processes to be distinct, require careful attention to the amount of time necessary to accommodate the volumes for both types of processing. For instance, the personnel application at a large oil company was designed for 20 hours of on-line processing with global access, and four hours of batch reporting. When the system went 'live,' the on-line processing worked like a charm because it had been tested, retested, and overtested. The batch portion, for which only individual program tests had been conducted, required about 18 hours because of the volume of processing. After several weeks, the users were fed up because printed reports had been defined as the means of distributing query results, and they had none. The solution required an additional expenditure of over $200,000 to redevelop all reports as pseudo-on-line tasks that could run while the interactive processes were running. Simple attention to the volume of work for batch processing would have identified this problem long before it cost $200,000 to fix.
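
A check of this kind can be made before the system goes live by multiplying per-record timings from individual program tests by the expected volumes. The sketch below is one possible approach. The job volumes and timings are hypothetical numbers, chosen only so that the total lands near the 18 hours mentioned above; only the four-hour window comes from the text.

```python
MS_PER_HOUR = 3_600_000


def estimated_batch_hours(jobs):
    """Sum the run time of jobs given as (record_count, ms_per_record) pairs."""
    total_ms = sum(records * ms_per_record for records, ms_per_record in jobs)
    return total_ms / MS_PER_HOUR


def fits_window(jobs, window_hours):
    """True when the estimated batch run finishes within the batch window."""
    return estimated_batch_hours(jobs) <= window_hours


# Hypothetical report jobs: (records to process, measured ms per record).
report_jobs = [
    (200_000, 200),
    (100_000, 250),
]

hours = estimated_batch_hours(report_jobs)
print(f"estimated batch time: {hours:.1f} hours")
print("fits 4-hour window:", fits_window(report_jobs, window_hours=4))
```

The estimate ignores job overlap, setup time, and contention, so it is a lower-effort screening check rather than a substitute for a full-volume test.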


There are seven techniques we use for data gathering during application development. They are interviews, group meetings, observation, temporary job assignment, questionnaires, review of internal and outside documents, and review of software. Each has uses for which it is best suited, and each has limitations on the amount and type of information that can be obtained from it. The strengths and weaknesses of each technique are summarized in Table 4-2, which is referenced throughout this section.

In general, you always want to validate the information received from any source through triangulation. Triangulation is obtaining the same information from multiple sources. You might ask the same question in several interviews, compare questionnaire responses to each item, or check in-house and external documents for similar information. When a discrepancy is found, you reverify it with the original and triangulated sources as much as possible. If the information is critical to the application being correctly developed, put the definitions, explanations, or other information in writing and have it approved by the users separately from the other documentation. Next, we discuss each data collection technique.
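
Triangulation as described above can be supported with a simple bookkeeping aid. The following sketch groups the answers collected for each item by value and reports only the items on which sources disagree. The function name, the tuple layout, and the sample answers are assumptions for illustration.

```python
from collections import defaultdict


def find_discrepancies(answers):
    """Group answers by item and return items with more than one distinct value.

    `answers` is an iterable of (source, item, value) tuples.
    The result maps each disputed item to {value: [sources reporting it]}.
    """
    by_item = defaultdict(lambda: defaultdict(list))
    for source, item, value in answers:
        by_item[item][value].append(source)
    return {
        item: dict(values)
        for item, values in by_item.items()
        if len(values) > 1
    }


# Hypothetical answers gathered from several sources.
answers = [
    ("interview: clerk", "orders per day", 120),
    ("interview: manager", "orders per day", 150),
    ("procedures manual", "orders per day", 120),
    ("interview: clerk", "peak month", "December"),
    ("questionnaire", "peak month", "December"),
]

for item, values in find_discrepancies(answers).items():
    print(item, values)
```

Each reported item lists which sources gave which value, so you know whom to return to when reverifying the discrepancy.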

Single Interview 

An interview is a gathering of a small number of people for a fixed period and with a specific purpose. Interviews with one or two users at a time are the most popular method of requirements elicitation. In an interview, questions are varied to obtain specific or general answers. You can get at people's feelings, motivations, and attitudes toward other departments, the management, the application, or any other entity of interest (see Table 4-2). Types of interviews are determined by the type of information desired. 

Interviews should always be conducted such that both participants feel satisfied with the results. This means that there are steps that lead to good interviews, and that inattention to one or more steps is likely to result in a poor interview. The steps are summarized in Table 4-3. Meeting at the convenience of the interviewee sets a tone of cooperation. Being prepared means both knowing who you are interviewing so you don't make any embarrassing statements and having the first few questions prepared, even if you don't know all the questions.

Source: Sue Conger,
Creative Commons License This work is licensed under a Creative Commons Attribution 3.0 License.

Last modified: Tuesday, June 8, 2021, 7:05 PM