NIST SP 800-61
Handling an Incident
2.4. Incident Analysis
Incident detection and analysis would be easy if every precursor or indicator were guaranteed to be accurate; unfortunately, this is not the case. For example, user-provided indicators such as a complaint of a server being unavailable are often incorrect. Intrusion detection systems may produce false positives (incorrect indicators). These examples demonstrate what makes incident detection and analysis so difficult: each indicator ideally should be evaluated to determine if it is legitimate. Making matters worse, the total number of indicators may be thousands or millions per day. Finding the real security incidents among all the indicators can be a daunting task.
Even if an indicator is accurate, it does not necessarily mean that an incident has occurred. Some indicators, such as a server crash or modification of critical files, could happen for several reasons other than a security incident, including human error. Given the occurrence of indicators, however, it is reasonable to suspect that an incident might be occurring and to act accordingly. Determining whether a particular event is actually an incident is sometimes a matter of judgment. It may be necessary to collaborate with other technical and information security personnel to make a decision. In many instances, a situation should be handled the same way regardless of whether it is security related. For example, if an organization is losing Internet connectivity every 12 hours and no one knows the cause, the staff would want to resolve the problem just as quickly and would use the same resources to diagnose the problem, regardless of its cause.
Some incidents are easy to detect, such as an obviously defaced web page. However, many incidents are not associated with such clear symptoms. Small signs such as one change in one system configuration file may be the only indicators that an incident has occurred. In incident handling, detection may be the most difficult task. Incident handlers are responsible for analyzing ambiguous, contradictory, and incomplete symptoms to determine what has happened. Although technical solutions exist that can make detection easier, the best remedy is to build a team of highly experienced and proficient staff members who can analyze the precursors and indicators effectively and efficiently and take appropriate actions. Without a well-trained and capable staff, incident detection and analysis will be conducted inefficiently, and costly mistakes will be made.
The incident response team should work quickly to analyze and validate each incident, following a predefined process and documenting each step taken. When the team believes that an incident has occurred, the team should rapidly perform an initial analysis to determine the incident’s scope, such as which networks, systems, or applications are affected; who or what originated the incident; and how the incident is occurring (e.g., what tools or attack methods are being used, what vulnerabilities are being exploited). The initial analysis should provide enough information for the team to prioritize subsequent activities, such as containment of the incident and deeper analysis of the effects of the incident.
Performing the initial analysis and validation is challenging. The following are recommendations for
making incident analysis easier and more effective:
- Profile Networks and Systems. Profiling is measuring the characteristics of expected activity so that
changes to it can be more easily identified. Examples of profiling are running file integrity checking
software on hosts to derive checksums for critical files and monitoring network bandwidth usage to
determine what the average and peak usage levels are on various days and times. In practice, it is
difficult to detect incidents accurately using most profiling techniques; organizations should use
profiling as one of several detection and analysis techniques.
- Understand Normal Behaviors. Incident response team members should study networks, systems,
and applications to understand what their normal behavior is so that abnormal behavior can be
recognized more easily. No incident handler will have a comprehensive knowledge of all behavior
throughout the environment, but handlers should know which experts could fill in the gaps. One way
to gain this knowledge is by reviewing log entries and security alerts. Such reviews may be tedious if
filtering is not used to condense the logs to a reasonable size. As handlers become more familiar with
the logs and alerts, they should be able to focus on unexplained entries, which are usually more
important to investigate. Conducting frequent log reviews should keep the knowledge fresh, and the
analyst should be able to notice trends and changes over time. The reviews also give the analyst an
indication of the reliability of each source.
- Create a Log Retention Policy. Information regarding an incident may be recorded in several places,
such as firewall, IDPS, and application logs. Creating and implementing a log retention policy that
specifies how long log data should be maintained may be extremely helpful in analysis because older
log entries may show reconnaissance activity or previous instances of similar attacks. Another reason
for retaining logs is that incidents may not be discovered until days, weeks, or even months later. The
length of time to maintain log data is dependent on several factors, including the organization’s data
retention policies and the volume of data. See NIST SP 800-92, Guide to Computer Security Log
Management, for additional recommendations related to logging.
- Perform Event Correlation. Evidence of an incident may be captured in several logs that each
contain different types of data – a firewall log may have the source IP address that was used, whereas
an application log may contain a username. A network IDPS may detect that an attack was launched
against a particular host, but it may not know if the attack was successful. The analyst may need to
examine the host’s logs to determine that information. Correlating events among multiple indicator
sources can be invaluable in validating whether a particular incident occurred.
- Keep All Host Clocks Synchronized. Protocols such as the Network Time Protocol (NTP)
synchronize clocks among hosts. Event correlation will be more complicated if the devices reporting
events have inconsistent clock settings. From an evidentiary standpoint, it is preferable to have
consistent timestamps in logs – for example, to have three logs that show an attack occurred at
12:07:01 a.m., rather than logs that list the attack as occurring at 12:07:01, 12:10:35, and 11:07:06.
- Maintain and Use a Knowledge Base of Information. The knowledge base should include
information that handlers need for referencing quickly during incident analysis. Although it is
possible to build a knowledge base with a complex structure, a simple approach can be effective. Text
documents, spreadsheets, and relatively simple databases provide effective, flexible, and searchable
mechanisms for sharing data among team members. The knowledge base should also contain a
variety of information, including explanations of the significance and validity of precursors and
indicators, such as IDPS alerts, operating system log entries, and application error codes.
- Use Internet Search Engines for Research. Internet search engines can help analysts find
information on unusual activity. For example, an analyst may see some unusual connection attempts
targeting TCP port 22912. Performing a search on the terms "TCP," "port," and "22912" may return
some hits that contain logs of similar activity or even an explanation of the significance of the port
number. Note that separate workstations should be used for research to minimize the risk to the
organization from conducting these searches.
- Run Packet Sniffers to Collect Additional Data. Sometimes the indicators do not record enough
detail to permit the handler to understand what is occurring. If an incident is occurring over a
network, the fastest way to collect the necessary data may be to have a packet sniffer capture network
traffic. Configuring the sniffer to record traffic that matches specified criteria should keep the volume
of data manageable and minimize the inadvertent capture of other information. Because of privacy
concerns, some organizations may require incident handlers to request and receive permission before
using packet sniffers.
- Filter the Data. There is simply not enough time to review and analyze all the indicators; at
minimum the most suspicious activity should be investigated. One effective strategy is to filter out
categories of indicators that tend to be insignificant. Another filtering strategy is to show only the
categories of indicators that are of the highest significance; however, this approach carries substantial
risk because new malicious activity may not fall into one of the chosen indicator categories.
- Seek Assistance from Others. Occasionally, the team will be unable to determine the full cause and nature of an incident. If the team lacks sufficient information to contain and eradicate the incident, then it should consult with internal resources (e.g., information security staff) and external resources (e.g., US-CERT, other CSIRTs, contractors with incident response expertise). It is important to accurately determine the cause of each incident so that it can be fully contained and the exploited vulnerabilities can be mitigated to prevent similar incidents from occurring.
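The recommendations on event correlation and clock synchronization can be illustrated with a minimal sketch. It assumes that each log entry has already been parsed into a record holding a source name, a timestamp, a target host, and a detail string, and that the clock offset of each reporting device is known. The `Event` type, the `normalize` and `correlate` functions, the source names, and the 30-second window are all hypothetical choices made for illustration; they are not part of this publication or of any particular tool. The sample timestamps loosely echo the clock example above, read on a 24-hour clock for simplicity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    source: str          # hypothetical log source name, e.g. "firewall"
    timestamp: datetime  # time as recorded by the reporting device
    host: str            # host the event refers to
    detail: str          # free-text detail (source IP, username, alert name)


def normalize(events, clock_offsets):
    """Subtract each source's known clock offset (device time minus true time)."""
    return [
        Event(e.source,
              e.timestamp - clock_offsets.get(e.source, timedelta(0)),
              e.host,
              e.detail)
        for e in events
    ]


def correlate(events, window=timedelta(seconds=30)):
    """Group events for the same host that fall within `window` of the
    first event in the group; keep groups reported by 2+ distinct sources."""
    groups = []
    for event in sorted(events, key=lambda e: (e.host, e.timestamp)):
        last = groups[-1] if groups else None
        if (last and last[0].host == event.host
                and event.timestamp - last[0].timestamp <= window):
            last.append(event)
        else:
            groups.append([event])
    return [g for g in groups if len({e.source for e in g}) >= 2]


if __name__ == "__main__":
    # Illustrative data: three devices record the same attack, but two of
    # them have drifted clocks (assumed offsets, not measured values).
    raw = [
        Event("firewall", datetime(2024, 1, 1, 12, 7, 1), "web01", "src 203.0.113.9"),
        Event("idps", datetime(2024, 1, 1, 12, 10, 35), "web01", "attack signature alert"),
        Event("app", datetime(2024, 1, 1, 11, 7, 6), "web01", "login as user jdoe"),
    ]
    offsets = {
        "idps": timedelta(minutes=3, seconds=34),    # clock runs fast
        "app": -timedelta(minutes=59, seconds=55),   # clock runs slow
    }
    print("groups without correction:", len(correlate(raw)))
    print("groups with correction:", len(correlate(normalize(raw, offsets))))
```

With uncorrected timestamps, the three records look unrelated and no multi-source group is formed. Once the offsets are removed, they collapse into a single group that links the source IP address, the IDPS alert, and the username. In practice the offsets are rarely known precisely, which is one reason keeping clocks synchronized at the source is preferable to correcting timestamps afterward.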