Data Gathering Techniques for Each Application Type
Software Review
Frequently, applications are replacing older software that supports the work of user departments. Study of the existing software provides you with information about the current work procedures and the extent to which they are constrained by the software design. This, in turn, gives you information about questions to raise with the users, for instance, how much do they want work constrained by the application? If they could remove the constraints, how would they do the work?
The weaknesses of getting information from software review are that documentation might not be accurate or current, code might not be readable, and the time might be wasted if the application is being discarded.
To summarize, the methods of collecting information about applications include interviews, group meetings, observation, questionnaires, temporary job assignment, document review, and software review. For obtaining information about application requirements, interviews and JAD meetings are the most common.
DATA COLLECTION AND APPLICATION TYPE
In this section, we identify the data gathering techniques most useful for each application type. Like most aspects of application development, the techniques can be used for all application types, but because of their strengths and weaknesses, they do
not always result in the type of information that is needed most. In this section, we first match data collection techniques to the data types discussed in the first section. Then, the data types are matched to application types (from Chapter 1).
Next, we match the data collection techniques to application types based on the data types they have in common.
Data Collection Technique and Data Type
Table 4-7 summarizes the discussion in the preceding sections. By matching the data collection technique to the type of data sought, we are more likely to identify the information of interest. As the table shows, interviews and meetings are useful for eliciting all types of information, which is why they are the most frequently used techniques in application work.
Observation provides only crude numerical estimates of volumes; it is restricted to the current time frame, its ambiguity varies, and its semantics may vary as well (see Table 4-7). Because the information from an observation is unstructured, the SE needs some skill to impose a structure on it that fits the situation. The information may also be incomplete.
Questionnaires can ask structured questions about any time frame but obtain complete answers only for the questions asked (see Table 4-7). If the questions are open-ended, completeness might be quite low. Ambiguity in questionnaires should be low, but respondents might misinterpret the question semantics. Questions about volume at the department or organization level are usually inappropriate; questions about the volume of transactions, or the time to process transactions, for individual workers yield more meaningful information.
TABLE 4-7 Data Collection Techniques and Data Type
| Technique | Time | Structure | Completeness | Ambiguity | Semantics | Volume |
|---|---|---|---|---|---|---|
| Interview | All | All | All | All | Varies | All |
| Meeting | All | All | All | All | Varies | All |
| Observation | Current | Unstruct. | Incomplete | May vary | Varies | Crude measure |
| Questionnaire | All | Structured | Complete for questions asked | Low | Fixed but might be subject to interpretation | Individual volumes only |
| Temporary job assignment | Current | Unstruct. | Incomplete | Low-med. | Varies | For period of observation but may not be representative |
| Internal documents | Past-current | Unstruct. | Incomplete | Low-med. | Varies | Maybe |
| External documents | Mostly current-future | Unstruct. | Incomplete | Low-med. | Relatively fixed | N/A |
| Software review | Past-current | Structured | Complete for software | Low-med. | Fixed | Maybe |
Temporary job assignments are similar to observation in having a high degree of uncertainty associated with the information obtained (see Table 4-7). The information tends to be current, unstructured, and incomplete depending on the period of work. Ambiguity varies from low to medium depending on how well-defined and structured the work is. Semantic content might vary depending on the shared definitions in the work group.
Documents provide unstructured, incomplete information from which relevant volume information is unlikely. The time orientation differs depending on whether the documents are internal or external to the company (see Table 4-7). Internal documents are mostly oriented to the past or current situation; external documents are mostly oriented to current or future topics. The semantics of external documents on mature technologies or topics tend to be relatively fixed, while the semantics of internal documents might vary by department or division.
Software provides past, and possibly current, information that is structured because it is automated. Ambiguity should be low to medium, and semantics should be fixed, since the application embeds definitions of data and processes in code. Information on volumes may be present but should be cross-checked using other methods.
Data Type and Application Type
Application types are transaction processing (TPS), query, decision support (DSS), group decision support (GDSS), executive information (EIS), and expert systems (ES). Each of these has one or more predominant data type characteristics that identify it. Table 4-8 categorizes all of the application types by data type. Here we discuss only the data types that differentiate between application types.
TABLE 4-8 Data Type by Application Type
| Application | Time | Structure | Completeness | Ambiguity | Semantics | Volume |
|---|---|---|---|---|---|---|
| TPS | Current | Structured | Complete | Low | Fixed | Any |
| Query | Past, current | Structured | Complete | Low | Fixed | Any |
| DSS | All | Structured | Varies | Low-med. | Varies | Med.-high |
| GDSS | Current-future | Unstruct. | Incomplete | Medium-high | Varies | Low |
| EIS | Future | Unstruct. | Incomplete | Medium-high | Varies | Low-med. |
| Expert system | Current based on past | Semi-structured | Incomplete | Medium-high | May vary | Low |
TPS contain predominantly known, current,
structured, complete information (see Table 4-8).
Recall that TPS are the operational applications of a
company. To control and maintain records of current operations, you must have known, structured,
current, and complete information.
Query applications have similar characteristics
to TPS with the difference that they might concentrate on historical information in addition to current
information (see Table 4-8). Queries are questions
posed of data to find problems and solutions, and
to analyze, summarize, and report on data. To perform summaries and reports with confidence, the
data must be structured, complete, and interpreted
consistently, that is, both unambiguous and of fixed
semantics.
DSS are statistical analysis tools that allow development of information that aids the decision process. The data that characterize DSS may span all time frames, may be incomplete and ambiguous, may have variable semantics, and are of medium to high volume (see Table 4-8). DSS might be used, for instance, to analyze which of two variations on a given product might enjoy the larger market share. To perform this analysis, past sales, current sales, and sales trends in the industry might all be analyzed and tied together to develop an answer.

GDSS are meeting facilitation tools for groups of people.
GDSS tools operate in a structured manner working on data that is unstructured, current, and
future-oriented. GDSS mostly deal with data that is
incomplete and contains semantic and other ambiguities (see Table 4-8). The tools themselves are complete, unambiguous, and so forth, but the meeting
information they process is not.
EIS are future-oriented applications that allow
executives to scan the environment and identify
trends, economic changes, or other industry activity
that affect their governance of a company. EIS deal
mostly with 'messy' data that is unstructured,
incomplete, ambiguous, and contains variable
semantics (see Table 4-8). Interpretation is always a
problem with such data, which is why executives
who excel at reading the environment are highly
compensated.
Last, expert systems manage and reason through
semistructured, incomplete, ambiguous, and variable
semantic data (see Table 4-8). Experts and ESs take
random, unstructured information and impose a
structure on it. They reason through how to interpret the data to remove ambiguity and to fix the
semantics. Therefore, even though the data coming
into the application might have these fuzzy characteristics, the data processing is actually highly
structured.
TABLE 4-9 Data Collection Technique and Application Type
| Technique | TPS | Query | DSS | GDSS | EIS | ES |
|---|---|---|---|---|---|---|
| Interview | X* | X | X | X | X | X |
| Meeting | X | X | X | X | X | X |
| Observation | X | X | X | Limited | Limited | X |
| Questionnaire | X | X | X | | | |
| Temporary job assignment | X | X | X | | | |
| Internal documents | X | X | X | Limited | | |
| External documents | X | X | X | X | X | X |
| Software review | X | X | X | Limited | Limited | Limited |

*The asterisk identifies the most frequently used method.
Data Collection Technique and Application Type
Finally, having discussed the different data types, we want to know which data collection techniques are best for each application type. By combining the information in Tables 4-7 and 4-8, we develop Table 4-9, which summarizes the data collection techniques for each application type. The entry marked with an asterisk shows the principal method of data collection for each application type.
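The derivation described above can be sketched as a simple set intersection. The sketch below is illustrative only: it uses a single, simplified dimension (data structure) from Tables 4-7 and 4-8, and the dictionary names, profiles, and function are assumptions for demonstration, not part of the text.

```python
# Simplified profiles: which kinds of data structure each collection
# technique can capture (a one-dimensional slice of Table 4-7).
TECHNIQUE_STRUCTURE = {
    "interview": {"structured", "unstructured"},
    "meeting": {"structured", "unstructured"},
    "observation": {"unstructured"},
    "questionnaire": {"structured"},
    "software review": {"structured"},
}

# Predominant data structure for each application type (from Table 4-8).
APPLICATION_STRUCTURE = {
    "TPS": {"structured"},
    "query": {"structured"},
    "GDSS": {"unstructured"},
    "EIS": {"unstructured"},
}

def suitable_techniques(app):
    """Return techniques whose profile overlaps the application's data types."""
    needed = APPLICATION_STRUCTURE[app]
    return sorted(t for t, offered in TECHNIQUE_STRUCTURE.items()
                  if offered & needed)

print(suitable_techniques("TPS"))  # → ['interview', 'meeting', 'questionnaire', 'software review']
print(suitable_techniques("EIS"))  # → ['interview', 'meeting', 'observation']
```

A full version of this matching would intersect all six data type dimensions, but the principle, techniques qualify for an application when their data type profiles overlap, is the same.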
TPS and query applications can profit from the use of all techniques. Meetings and interviews predominate because they elicit the broadest range of responses in the shortest time (see Table 4-9). Observation and temporary job assignment are particularly useful for obtaining background information about the current problem domain, but they must be used with caution so as not to prejudice the design of the application. Questionnaires are useful when the number of people to be interviewed exceeds 50. Questionnaires are also useful for identifying characteristics of users, for instance, the training users will require, during organizational feasibility analysis. If screen design choices arise, such as colors or different screen arrangements, questionnaires can present a small set of alternatives from which the actual users choose.

DSS are also shown as having a use for all data collection techniques, but not all techniques are practical in all cases (see Table 4-9).
DSS are generally developed for use by people in jobs with a significant amount of discretion in what they do and how they do it. Therefore, observing or working with one or two people as representative may result in a biased view of the application requirements
for a general purpose DSS. Even for a custom DSS, observation and job assignments might both be impractical if the SE does not know enough about the job being supported to interpret what she or he observes. The same holds true of documents. Documents,
such as statistical reports, might be useful for providing samples of the types of analyses desired in a DSS. Other documents, such as policies, procedures, and so on, are not likely to be relevant to the application. For general purpose DSS with
a large number of users, questionnaires are a useful way to identify the range of problems and analysis techniques required in the DSS. This information might be followed by interviews or meetings to determine DSS details.
GDSS are usually custom-built suites of software packages that provide different types of support for automated meetings. As such, the SE working on a GDSS environment needs to know the types of issues, number of participants, as well as types of reasoning
and group consensus techniques desired. GDSS components are neither common knowledge nor frequently used; you might build one GDSS in a career. Therefore, significant time would be spent finding out about the market, vendors, and GDSS components. External
documents on vendor products are useful in developing questions that elicit the required information. After knowledge of the market is obtained, interviews and meetings are useful to determine the specific requirements and to review, with users, what
the GDSS can and cannot do. Other methods might have some limited value. For instance, observation of an actual meeting that might be automated would be useful for the SE to gain insight about how a tool might work. Internal documents that provide
information about meetings that the GDSS is expected to provide would also be useful. Both of these techniques, observation and document review, have a specific limited role in providing the information needed to build a GDSS. Any software review
that is done would be a review of other companies' GDSS facilities or of vendor products, rather than a review of in-house software.
EIS are similar to GDSS in their rarity and in the general lack of knowledge about what an EIS is. EIS are not standard applications with a screen for data entry and reports that are displayed. EIS are information presentation facilities that can
be structured with menus and selection tools, but may display document pages, newspaper articles, book abstracts, summary reports, and so on. EIS are usually built for a small number of users, which eliminates the use of questionnaires. EIS are custom
and one-of-a-kind environments for which past documents or software will be of limited value. Observation is most likely limited because executives would be uncomfortable being observed. Temporary job assignment is not possible because you cannot
just 'be an executive' for a week or two. This leaves external documents, interviews, and meetings as the most likely techniques for data collection (see Table 4-9). As with GDSS, external documents will mostly be used to identify the market, vendors,
and products. Interviews are most likely to be used to determine executives' information needs and preferred delivery platforms.
Finally, SEs use interviews, observation, and external documents the most in developing expert systems (see Table 4-9). Experts frequently can talk about external aspects of their jobs, the physical cues they use as inputs, and the result of their reasoning
and how it is applied to the business. They are just as frequently unable to discuss their reasoning processes and how they put the cues together to make sense of unstructured situations. Experts, by definition of the term expert, have so internalized
their work that they just do it. They don't think consciously about how they are doing what they do. Therefore, observation, in particular, the use of protocol analysis, is useful in getting information the expert might not be able to articulate.
Protocol analysis is time-consuming and indefinite because you, the SE, are inferring a reasoning process from actions taken. At best, the protocol analysis gives you questions to ask about the work that assist the experts in discussing aspects of
work they ordinarily cannot. Thus, observation is interleaved with interviews to discuss what is observed. As the process continues, structure is imposed on both the data and the problems to begin to develop the ES. The process of obtaining an expert's
reasoning processes is called knowledge elicitation. The process of structuring the unstructured data and reasoning information is called knowledge engineering. Knowledge engineering is an activity that is difficult to learn and requires training
through an apprenticeship approach in which the trainee works with an expert knowledge engineer.
Source: Sue Conger, https://resources.saylor.org/CS/CS302/OER/The_New_Software_Engineering.pdf
This work is licensed under a Creative Commons Attribution 3.0 License.