At the heart of all pattern search or classification problems, explicitly or implicitly, lies Bayes' decision theory. Bayes' decision rule simply says: given an input observation of unknown classification, make the decision that minimizes the probability of a classification error. For example, in this unit you will be introduced to the k-nearest neighbor algorithm, and it can be demonstrated that this algorithm approximates Bayes' decision. Read this chapter to familiarize yourself with Bayes' decision.
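As a preview of the k-nearest neighbor algorithm mentioned above, here is a minimal sketch, assuming Euclidean distance, a simple majority vote, and NumPy arrays for toy data; it is illustrative only, not the chapter's implementation.

```python
# Minimal k-nearest-neighbor sketch (assumptions: Euclidean distance,
# majority vote, NumPy arrays; the toy data are illustrative only).
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify a single point x by a majority vote among its k nearest
    training points under Euclidean distance."""
    distances = np.linalg.norm(X_train - x, axis=1)  # distance to every training point
    nearest = np.argsort(distances)[:k]              # indices of the k closest points
    votes = Counter(y_train[nearest])                # count the neighbors' class labels
    return votes.most_common(1)[0][0]                # the most frequent label wins

# Toy usage with two 2-D classes.
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.1, 0.0])))  # prints 0
```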
Introduction
Pattern classification is the task of classifying an object into one of a given set of categories,
called classes. For a specific pattern classification problem, a classifier is a piece of computer
software developed so that objects are classified correctly with reasonably good accuracy.
Through training with input-output pairs, a classifier acquires a decision function that assigns
an input datum to one of the given classes. In pattern recognition applications we rarely, if
ever, have complete knowledge of the prior probabilities or the class-conditional densities that
describe the probabilistic structure of the problem. In a typical case, we merely have some
vague, general knowledge about the situation, together with a number of design samples or
training data, particular representatives of the patterns we want to classify. The problem,
then, is to find some way to use this information to design or train the classifier.
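To make the quantities just mentioned concrete, the sketch below shows the Bayes decision rule for the ideal case in which the prior probabilities and class-conditional densities are assumed known: choose the class that maximizes prior times class-conditional density, which is equivalent to maximizing the posterior. The two-class setup, the Gaussian densities, and all numbers are illustrative assumptions, not material from the chapter.

```python
# Sketch of the Bayes decision rule with assumed-known priors and Gaussian
# class-conditional densities (every number here is an illustrative assumption).
from scipy.stats import norm

priors = {0: 0.6, 1: 0.4}                    # P(class 0), P(class 1)
densities = {0: norm(loc=0.0, scale=1.0),    # p(x | class 0)
             1: norm(loc=2.0, scale=1.0)}    # p(x | class 1)

def bayes_decide(x):
    """Pick the class with the largest posterior; the evidence p(x) is the same
    for every class, so maximizing prior * class-conditional density suffices."""
    scores = {c: priors[c] * densities[c].pdf(x) for c in priors}
    return max(scores, key=scores.get)

print(bayes_decide(0.3))  # near the class-0 mean, prints 0
print(bayes_decide(1.8))  # near the class-1 mean, prints 1
```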
This chapter is organized to address first those cases in which a great deal of information
about the models is known, and then to move toward problems in which the form of the distributions
is unknown, and even the category membership of training patterns is unknown. We begin in
Bayes decision theory (Sec.2) by considering the ideal case in which the probability structure
underlying the categories is known perfectly. In Sec.3 (Maximum Likelihood), we address the
case when the full probability structure underlying the categories is not known, but the
general forms of their distributions (the models) are known. Thus the uncertainty about a probability
distribution is represented by the values of some unknown parameters, and we seek to
determine these parameters to attain the best categorization. In Sec.4 (Nonparametric
techniques) we move yet further from the Bayesian ideal and assume that we have no prior
parameterized knowledge about the underlying probability structure; in essence, our
classification will be based on information provided by training samples alone. Classic
techniques such as the nearest-neighbor algorithm and potential functions play an important
role here. In Sec.5 (Support Vector Machine) we then turn to linear discriminant functions and the support vector machine. Next, in Sec.6 (Nonlinear Discriminants
and Neural Networks), we see how some of the ideas from such linear discriminants can be
extended to a class of very powerful algorithms such as backpropagation and others for
multilayer neural networks; these neural techniques have a range of useful properties that
have made them a mainstay in contemporary pattern recognition research. In Sec.7 (Stochastic
Methods), we discuss simulated annealing, the Boltzmann learning algorithm, and other
stochastic methods. We explore the behavior of such algorithms with regard to the matter of
local minima that can plague other neural methods. We conclude in Sec.8 (Unsupervised Learning and Clustering) by addressing the case when the input training patterns are not labeled and our
recognizer must determine the cluster structure itself.
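As a small, hedged illustration of the parametric idea outlined for Sec.3 above, where the form of the distribution is assumed known and only its parameters must be estimated from training data, the snippet below fits a one-dimensional Gaussian by maximum likelihood; the model choice and the sample values are assumptions made for illustration.

```python
# Maximum-likelihood sketch for an assumed Gaussian model: the distribution's
# form is fixed, and only its parameters (mean, variance) are estimated
# from training samples (the sample values below are illustrative).
import numpy as np

def fit_gaussian_ml(samples):
    """ML estimates for a 1-D Gaussian: the sample mean and the 1/N
    sample variance maximize the likelihood of the observed data."""
    mu = np.mean(samples)
    sigma2 = np.mean((samples - mu) ** 2)
    return mu, sigma2

samples = np.array([1.9, 2.1, 2.4, 1.7, 2.0])
mu, sigma2 = fit_gaussian_ml(samples)
print(mu, sigma2)  # estimated mean and variance of the unknown class density
```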
Source: Yizhang Guan, https://www.intechopen.com/chapters/10687 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.