## Predicting Data Sample Values

Read this example of a practical, complex, real-world problem that uses recursion in many ways. The underlying math is not testable, but try to get a sense of its subtle recursive nature.

#### Abstract

This document is a summary of the author's dissertation. The result of that work is an approach to detecting unspecified anomalies in unspecified data streams. The underlying technology is spectrum independent and also does not depend on correlated data to achieve accurate detection and extraction in highly robust environments.

The technical approach employs a network of simple building-block equations to predict data sample values and thereby present subtle sub-streams to a detection model as potential events of interest. The prediction model is automatically created from sequential observations of the data stream. Once model construction is complete, it continues to evolve as new samples are taken. Each sample value that is sufficiently different from the model's predicted value is postulated as an event. A subsequent detection model uses a simple set of rules to improve the prediction model's ability to ignore outliers.

We review the theory of the model and its application in two demonstrations: intruder detection in a robust video scene and voice detection in a noisy audio signal. The primary benefit of this technique is its ability to process large data volumes for obscured or buried information within highly active environments. The fully autom ated nature of this technique helps mitigate manning shortfalls typically associated with sorting through large volumes of surveillance data using trained analysts. This approach enables an organization to perform automated cueing for these analysts so that they spend less time examining data where nothing of interest exists. This maximizes the value of skilled personnel by using them to assess data with true potential. In this way, larger data volumes could be processed in a shorter period of time leading to a higher likelihood that important events and signals will be found, analyzed, and acted upon.

Keywords: system identification, prediction, recursion, signal processing, sensor data analysis, digital filtering, basis functions, process monitoring, surveillance, signal filtering

#### 1. Introduction

The increasing use of sensors for surveillance and system monitoring has brought about the problem of analyzing the resulting data volume soon enough for meaningful action. This problem is especia lly prevalent in the case of staring sensors. A staring sensor is one that gathers its data from the same source all the time. Two examples are a microphone mounted in a particular place like a processing plant, meeting room, or battlefield area; and a vi deo camera always aimed at a particular spot. [To say that such a sensor is also "continuous-dwell" means that the sensing mechanism is fixed physically and electronically. There is no "scanning" or "sweeping" involved. This work assumes data streams that originate from staring continuous-dwell sensors.]

Analyzing the vast amounts of surveillance data has become an issue because of the following:

• increases in data sampling rates, resolution, and sensitivity
• the number and types of data streams to capture and analyze is growing
• new events of interest are more difficult to detect and analyze
• expanding data volumes are creating progressive analysis backlogs

Additionally, the author's review of the literature suggests that event detection methods typically make one or more of the following assumptions. An operational scenario may make these assumptions unacceptable.

• the background is benign or static
• the background is uniform
• the data's control law does not change
• the event of interest is known
• the spectrum generating the sensed data is known
• other domain data is correlated

Way and Smith wrote about the growing data volume issue. They noted that NASA's EOS satellite transmits 50 gigabytes of data every hour, all day, every day of the year. Ground stations have to analyze and store this data for any value to be realized. Kanemaru et al spoke about the detection issues relative to visually monitoring power line towers in remote areas. Their piece is an excellent discussion on the difficulties encountered in performing intrusion detection in robust natural scenes. These issues only grow in importance as surveillance sensors proliferate and their use beyond military applications expands the demand for data and the associated analysis workforce.

Addressing these issues is this project's primary purpose. We are striving for the following benefits:

• finding interesting and unexpected events for additional processing or human evaluation (automated cueing)
• reduced need for direct human observation and evaluation of the entire data stream, thus mitigating the manning shortfalls typically associated with analyzing large volumes of surveillance data (automated data sorting)
• enhanced sensor processing and analysis technology that is computationally parallel with strong opportunities for real-time airborne, sea-based, or ground-based implementation (enhanced signal detection)
• a model that adapts automatically as the data stream evolves, reducing the need for absolute sensor calibration and other forms of preprocessing such as range normalization or signal filtering (dynamic signal adaptation)

#### 2. Technical Review

Our goal is to create a fully automated means of anomaly detection in continuous data streams. Full, unattended, automation is one way to deal with manning shortfalls. Another goal is to detect potential events without having to pre-specify the events themselves, the data source, or a signal's underlying characteristics. We also do not want to rely on correlated data. This spatial and spectral independence permits a very generic capability. The work began by examining the seminal results produced by Sanner and Slotine in control theory. From their results we developed an adaptive detection threshold that predicts the next sample in a data stream. When the next sample contains data not predicted, the method postulates the existence of a potential event. A simple rule-based detection model then decides if the detection is really just an outlier.

The prediction model is composed of a network of independent Gaussian radial basis functions. Each data stream has its own model. Each model is built automatically from sequential observations of the incoming samples. As the model is being built, it gradually becomes able to predict the next sample in the sequence. At this point, model construction is completed but it continues to evolve as new samples arrive. Models with a detected event stop evolving until the event is no longer present. In this way, the event does not become part of the model's background predictions, enabling the model to detect future such events. For example, in an image stream, this a pproach allows the event to be tracked across pixels and thus provides a type of moving target indicator. For a microphone or sonar array, the location of the event in the area can be more precisely defined and tracked without visual observation.

#### 2.1 Predictor Overview

The predictor is founded on a summation of Gaussian radial basis functions. The amplitudes of these functions are continuously adjusted to predict the data stream as it evolves. An example is shown in Figures 1a and 1b. Figure 1a shows a network of Gaussian radial basis functions prior to adjustment. Figure 1b shows that same network adjusted to predict the line F(x) = -0.5. As we will see in the next section, the predictor uses an impled recursive process in that the prediction at time T depends on all previous data values, but without storing those values or using a recursive equation. This can be implemented without recursive function calls, as if it were an explicit process.

Figure 1a. Example of original Gaussian shapes

Figure 1b. Gaussians adjusted to predict F(x) = -0.5

#### 2.2 Manipulation of Gaussian Functions

Basis functions are simple-equation building blocks that are a proven means of modeling more complex functions. Brown (in the book edited by Light, p203-206) showed that if D is a compact subset of the k-dimensional region Rk, then every continuous real-valued function on D can be uniformly approximated by linear combinations of radial basis functions with centers in D. Proofs of this type have also been shown by Funahashi; Girosi & Poggio; and Hornik, Stinchcombe, & White. The theory thus having already been well established, the following exposition will concentrate on implementation. The terms "node" and "point" are used interchangeably to refer to a location in the space Rk. Our intended application has k = 1, a one-dimensional value at a given time such that Rk is a line. Look to Sanner and Slotine for an expansion to larger dimensions. This explanation interprets their work for one dimension.

At the top level, complex functions can be approximated using the following summation

$f(x) = \sum_{i=1}^n [c_i g_i (x, \xi_i)]$

f(x) approximates function F(x) at point x in region Rk

ξi is the center or location of node i in region Rk

gi is the nonlinear function used by node i and is centered at ξi

ci is the weight by which the output of gi is multiplied

n is the number of nodes

The idea is to pick a number of points in Rk. Then chose a function that will be used whenever a given point is referenced. Finally, set the output weight for each function, and then add up the results of those functions at input point x. The interesting part, of course, is how to carry out each of those steps.

For this modeling technique, a gaussian radial basis function is used at each point. This function is nonlinear, continuous, and has infinite support.

$g_i(x, \xi_i) = e^{-\pi \sigma_i^2\left \| x- \xi_i \right \|^2}$

$\left \| x- \xi_i \right \|^2 = (x - \xi_i)(x-\xi_i)$

$\sigma_i^2$ is the variance at node i (gaussian width)

Picking the points, ξi, in Rk is a matter of deciding how large the model is to be. For each point, another basis function has to be employed. Note that making the model too large is as ineffective as making it too small. For k = 1, the ξi lie spaced evenly along a line segment formed by the range of expected values of f(x) ∈ [f(x)min, f(x)max] plus a zone ρ = some constant. The distance between points is Δ <= 1/(2β). β > 0 is less than the maximum frequency component (in the FFT sense) used to estimate F(x). Thus, the higher the meaningful frequency content, the more node points are needed. Judgement has to be used here. One does not want to attempt to model the high- frequency "noise", just capture the meaningful parts of the signal. On the other hand, if the scene is truly very robust, a sufficiently large number nodes are needed to resolve the components. We have found that choosing the number of nodes to use is not a delicate affair.

For small k, a good rule of thumb uses $\sigma_i^2 = \pi \beta^2$, making the variance the same constant for all node points. Another $\delta^2 = 1/(8\sigma^2)$. Our empirical approach has been to choose a number of nodes and calculate the other values from there. As our technique improves, our means of choosing the number of nodes should also improve.

The final point is how to initialize and evolve the output weights. Let f(x)t be the function approximation due to sample x received at time t.

$f(x)_t = \sum_{i=1}^{n} [c_{i[t-1]} g_i (x_t,\xi_i)]$

Updating the output weight, ci[t - 1], due to f(x)t yields cit.

$c_{it} = c_{i[t-1]} - K_t \epsilon_{at}g_i(x_t, \xi_i)$

We have found it best to initialize all ci[t = 0] to a constant between 0 and 1, non-inclusive. (This range was chosen via experiment.) Now let εt = (f(x)t – F(x)t). This is the prediction error, the difference between the estimated function result and the actual function result. The estimation is performed before the arrival of the actual result.

$\epsilon_{at} = \epsilon_t - \Phi sat (\epsilon_t / \Phi)$

sat(z) = z if |z| <= 1 and sgn(z) otherwise

sgn(z) = -1 if z < 0 and +1 otherwise

Φ is the minimum expected error

What this formula says is that the network's output weights are not updated if the prediction error is less than the minimum expected error. Intruder's are postulated when the system is in operational mode and the prediction error is sufficiently large.

$K_t = \frac{G}{\sum_{i=1}^{n}g_i(x_t, \xi_i)^2}$

Kt is the adaptation gain. The theory requires G < 2.

Empirically, we have found that G = 0.1 works well. Kt must always be positive.

#### 2.3 Detector Overview

Once the predictor postulates the existence of an event, the detector uses just four simple rules to increase its chances of separating outliers from potential events:

1. the actual sample amplitude departs from that expected when the percent difference between itself and the expected sample crosses a given threshold
2. a certain number of sequential samples' departures cross that threshold
3. the actual sample amplitude subsequently becomes close to that expected by crossing a potentially different threshold than the one used in rule 1, thus ending the departure
4. the number of sequential times the departure subsequently disappears exceeds a number potentially different from the one used in rule 2

#### 3. Two Beginning Success

We performed two experiments to illustrate this idea in a practical setting. The first experiment detected a voice in a noisy background. The second experiment detected an intruder moving in a robust visual scene.

#### 3.1. Experiment with Audio

A microphone was mounted inside the housing of a common box fan. An audio input port commonly found on a personal computer was used to sample the microphone's output at 44100Hz. A voice saying "hello" was then recorded using the same microphone and sampling equipment. Goldwave, an audio mixing tool, was used to reduce the original voice volume by 95% and to raise the fan noise to maximum (without overdriving the mixer). The two signals were then mixed. Examination of the resulting data stream's plot (Figure 2) shows that the voice's location in the stream is not visible. At the loudest inflections, the voice is barely audible. The total signal length is 1.486 seconds. It has 65535 samples. The voice starts at t = 0.473 and lasts 0.480 seconds.

Figure 2a. Fan Noise Mixed With "Hello"

Figure 3b. Fan Noise Filtered

For this particular experiment the model had 15 Gaussian terms. The detection threshold was set at 0.006 deviation of the local squared error from the mean. This deviation had to last for 10 seq uential samples for a detection to be identified. The deviation had to be below 0.006 for 155 sequential samples for the detection to end. Many other combinations also enable the model to perform this type of detection. The model successfully detected the voice mixed into the fan noise. It also had one false alarm lasting 0.069 seconds.

 CATEGORY SECONDS Detection Start 0.605 Detection End 0.898 Detection Length 0.293 Detection Missed 0.187

Given these results, it seems clear that downstream analyses should choose a block of data that has the detected block as a subset. However, the model detected the voice without having any preconceived idea of the spectrum the data was drawn from, nor the underlying characteristics of the event or the background. I t must be said, however, that, in a broad. sense, the model size is based on the expected maximum frequency of the incoming signal as measured by the FFT.

In essence, the adaptive algorithm model functioned as a kind of dynamic squelch filter in this experiment. The key difference is that a normal squelch circuit essentially sets a static threshold level. If a signal of interest is below the squelch threshold level, it would go undetected. The adaptive algorithm model, on the other hand, adjusts its detection threshold dynamically in relation to the predicted data stream.

A signal hidden amongst the noise now becomes detectable because the filter finds unpredicted amplitudes, not just those above a gross amplitude threshold. The conditions that make this detection possible are a result of the background model created by the adaptive algorithm. It can identify subtle differences between its background prediction and the background mixed with a faint signal.

The value of this dynamic threshold capability comes in application domains where continuous monitoring systems are looking for emergency signals or other signals of interest. Some signals currently go undetected because they are too far away or have too little transmitting power for the receivers to discriminate them from the background noise. An adaptive algorithm model would essentially enable such systems to see through more of the background, thus extending the detection sensitivity and the effective detection range.

Figure 4. Comparing Original Scene and Detected Intruder

#### 3.2. Experiment with Video

For the video experiment a color QuickCam was pointed at a ceiling fan that was turning just fast enough to be outside the camera's sampling rate of 6hz. The "sharpness" of the collected image was set to its lowest value. Because of the camera's slow shutter speed relative to the fan's rotation rate, the fan appeared to turn one way and then the other. A string was attached to two opposite walls such that it crossed under the center of the fan's light. A model airplane was attached to the string outside the scene. To get a monochrome pixel the red, green, and blue values delivered by the camera were summed.

For this experiment, each pixel had its own model with 30 Gaussian terms each. The detection threshold was set at 0.095 deviation of the local squared error from the mean. Deviations had to last for 3 sequential frames for a detection to be identified. The deviation had to be below 0.095 for 1 sample for the detec tion to end. Once the model learned to predict the evolving scene, the airplane was pulled through. The model turned on an average of 50% of the pixels associated with the airplane in each frame as the airplane moved through the scene. The model detected the airplane within the first few frames of its arrival and continued the detection until nearly the last frame of its presence. There was no ghost trail left in the filtered scene and no false alarms in pixels with no airplane present. Lighting changes were detected within the scene during the experiment. These changes occurred because the airplane crossed between the fan's light and the camera. Figure 3 compares one frame of the unfiltered and filtered scene as the airplane was being pulled through.

The value implied by this experiment is finding intruders within a dynamic background. There is also the potential for minimizing the impact of camouflage. Note here that the model was able to detect changes in the lighting. These changes were evidence of the intruder's presence without necessarily detecting the intruder itself. Additionally, while this experiment was performed in the visual domain, it is representative of what could possibly be accomplished using sonar grids, synthetic aperture radar, and infrared (for instance).

If an intruder can be detected then it can be tracked. If it can be tracked, there are strong indications from some of our other work that the same model can be used to predict the intruder's trajectory, even if the trajectory is chaot ic. If the trajectory can be predicted then it is easier to vector an interceptor. Thus, sensing can lead to information, information can lead to decisions, and decisions can lead to action. Real-time automated processing of large volumes of sensor data can speed up this process to the point where, for example, patrolling aircraft, sea craft, or land vehicles can be vectored to intercept hard-to-find mobile targets.

#### 4. Notional Concept of Operations

Figure 4 suggests an architecture employing this model. This architecture can be employed on airborne, ground, or sea-based platforms. There is potential for real-time and post-collection scenarios. This architecture fits within a larger concept of operations that includes reaction to detected events.

Figure 5. Adaptive event detection architecture

#### 5. Issues

The adaptive detection method discussed in this paper is an emerging technology. The following issues should be considered:

• Does the sampled environment contain significant random components such that forward prediction of essential elements is not possible or practical using existing methods?
• Does the environment evolve slowly enough that achievable sampling rates are sufficient for an effective background model to be built?
• Can the processing be performed fast enough to enable timely response to detected events?

Additional research is needed to demonstrate the potential of the model for real-time operation. To give a sense of the real-time issues, even relatively simple and slow-speed sensors such as the QuickCam produce up to 30 frames per second. Each frame can be sized at 640x480 30-bit pixels. If each pixel has 25 terms in its model, 7,680,000 equations have to be calculated and adjusted every 33 milliseconds. In its favor, the model scales linearly with the size of the problem, as measured by the number of data streams and the complexity of each data stream. Also, each term in the model is independent of all the others. On the implementation side, there are many means available to accomplish real-time processing for computationally intense applications. Among these are: programmable logic devices (PLDs), Field Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), and multi-processor systems. Each of these go beyond the processing approach used by common desktop computers. PLDs and FPGAs involve off-shelf components but not traditional processors. ASICs are fabricated for specific applications. The fourth option involves multi-processor systems and/or clusters of systems. The authors continue to investigate these options. Lin took an early hand in this exploration of processing alternatives when he looked at parallel processing for basis function networks. This work was extended by Lin & Raeth. Digital TV runs at much higher resolutions than our experiment and also employs computationally intense processing. So, running our model in real-time should be achievable.

#### 6. Summary

We have found that the Gaussian model is adaptive enough to cope with robust natural data. Not only can the model learn to predict the next sample in an unspecified natural data stream, but it can also detect unspecified events within that data stream. There were three successful experiments that demonstrated applications of this model.

• detecting a voice in ambient background noise
• detecting a model car moving through a benign image stream
• detecting a model airplane moving through a robust image stream

These experimental results have strong implications for automated surveillance via remote or local sensing in many spectral domains. It is important to note that the model has no spectrum dependence. For these experiments, it did not "know" it was processing audio and visual data. Only after detecting an event does a meta-level assumption on domain or spectrum come into play so that the detection results can be displayed in a way appropriate for human understanding and response. Thus, the model might be successfully used in such diverse applications as signal processing, trajectory prediction for intruder interception, machine monitoring, sensor data indexing, and event detection. Another important point is that the model works with raw data and does not require preprocessing or range mapping. In certain real-time sensor applications this is a very important capability.

Given the world's rapidly growing data volume from sensors of all types and the more-slowly growing human analysis capacity, this research provides an important advancement in automated data filtering and operator/analyst cueing. Automated cueing can offset the shortage of trained domain analysts. This method can help an organization maximize their return on investment by helping those analysts focus on data sets with potential signals/events while ignoring large volumes of otherwise useless data.

A second major benefit is the effective increase in detection sensitivity and range. The dynamic squelch effect demonstrated in the audio and visual domains could be very useful in applications where the background changes over time. Signal detection and capture will be more reliable when a system can adjust to these changes and capture signals with low signal to noise ratios.

#### 7. References

[1] J. Way, E.A. Smith, "The Evolution of Synthetic Aperture Radar Systems and their Progression to the EOS SAR", IEEE Transactions on Geoscience and Remote Sensing, v 29, # 6, pp 962-985, 1991.

[2] K. Kanemaru, M. Kaneta, H. Kanoh, T. Nagai, "Image Processing Method for Intruder Detection Around Power Line Towers", IEICE Transactions on Information and Systems, ISSN: 0916-8532, v 76, # 10, pp 1153 – 1161, Oct 1993.

[3] R.M. Sanner, Stable Adaptive Control, Ph.D. Dissertation, Massachusetts Institute of Technology, Doc # AAI0573240, 1993.

[4] R.M. Sanner, J.-J.E. Slotine, "Stable Adaptive Control and Recursive Identification Using Radial Gaussian Networks", IEEE Conf on Decision and Control, Dec 1991.

[5] R.M. Sanner, J.-J.E. Slotine, Gaussian Networks for Direct Adaptive Control, Massachusetts Institute of Technology, Report # NSL-910503/911105, May/Nov 1991.

[6] K. Funahashi, "On the Approximate Realization of Continuous Mappings by Neural Networks", Neural Networks, v 2, pp 183-192, 1989.

[7] W. Light, (ed), Advances in Numerical Analysis, Volume II, Oxford, England: Claredon Press, 1992.

[8] F. Girosi, T. Poggio, "Networks and the Best Approximation Property", Massachusetts Institute of Technology Artificial Intelligence Laboratory, Memo # 1164, Oct 1989.

[9] K. Hornik, M. Stinchcombe, H. White, "Multilayer Feedforward Networks are Universal Approximators", Neural Networks, v 2, pp 359-366, 1989.

[10] P.G. Raeth, R.L. Bostick, D.A. Bertke, "Adaptive Data Mining Applied to Continuous Image Streams", Proc:IEEE/ASME Conference on Artificial Neural Networks in Engineering (ANNIE), Nov 1999.

[11] C.S. Lin, Parallel Processing for Crisp Expert Systems and Basis Function Neural Networks, Cameron Station, VA: National Technical Information Center, Rpt # AFRL-HE-WP-TR-1998-0021, Jan 1997.

[12] C.S. Lin, P.G. Raeth, "Parallel Processing for Real-Time Rule-Based Decision Aids", Proc: IEEE International Conference on Systems, Man, and Cybernetics, Oct 1996.

The author's dissertation may be found at http://InformationAnthology.net/RaethDissertation_Final.pdf

A video demonstration may be found at http://InformationAnthology.net/DissertationVideo.avi

Source: Peter G. Raeth