In much of this course, we will be concerned with systems which process or classify images or patterns. Why is probability theory an appropriate language for describing such systems? There are several reasons. First, in some instances a problem might truly have a random component. An example is image classification in which noise somewhere in the image processing system changes the values of some of the features in a random manner. If we want to classify a very noisy image, we may not be able to say with certainty what the original image was; at best, we might be able to express the probabilities of the possible original images. Our classification of the image will then also be uncertain, so it too will be described probabilistically.

More often than not, however, probability is used because we have incomplete information. We use probability to model our ignorance. As an example of this, consider handwritten character recognition. The interpretation of each character might not be random; the writer might always use the same symbol to mean the same thing. However, if we do not know what the writer intended, we have to guess. More formally, we imagine that if we repeated the experiment many times with the same incomplete information, we would get different results.

An additional source of uncertainty comes with systems which learn from examples. The typical situation is one where learning is done from a tiny fraction of the possible patterns which the system may see in practice. Thus, the system must learn about the universe from a small sample of examples. This sampling introduces a number of uncertainties which can be expressed probabilistically. For example, it will not be possible to infer precisely the performance of such a system from the number of errors on a small sample, but one can derive the probability of a given error rate. The uncertainties caused by sampling will be quite important in the performance and validation of learning systems.
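
To make the point about error rates concrete, here is a minimal sketch in Python. It rests on assumptions that are not part of the discussion above: the test patterns are taken to be drawn independently, so that the number of errors follows a binomial distribution, and every candidate error rate is treated as equally plausible before any data is seen (a uniform prior). The function names and the figures used (3 errors on 20 test patterns) are invented purely for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k errors on n independent test patterns
    when the true error rate is p (binomial model, an assumption)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def error_rate_posterior(k, n, grid_size=101):
    """Probability of each candidate error rate on an evenly spaced grid,
    given k errors on n test patterns and a uniform prior (an assumption)."""
    rates = [i / (grid_size - 1) for i in range(grid_size)]
    weights = [binomial_pmf(k, n, p) for p in rates]
    total = sum(weights)
    return rates, [w / total for w in weights]

# Hypothetical experiment: 3 errors observed on 20 test patterns.
rates, posterior = error_rate_posterior(k=3, n=20)

# The single most probable error rate on the grid.
most_probable = rates[posterior.index(max(posterior))]

# Probability that the true error rate is at most 30%.
prob_at_most_30 = sum(w for p, w in zip(rates, posterior) if p <= 0.30)

print("most probable error rate:", most_probable)
print("P(error rate <= 0.30):", round(prob_at_most_30, 3))
```

The output is a whole distribution over error rates rather than a single number, which is exactly the sense in which a small sample leaves the true performance uncertain. Rerunning the sketch with a larger hypothetical sample (say 30 errors on 200 patterns) concentrates the distribution more tightly around the observed fraction.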

Learning from examples can be seen as using examples to estimate the probability of classifications. This view is taken in the first part of this course. We will need to express probabilistic ideas mathematically, and will require some of the rules for manipulating probabilities. In what follows, I outline basic probability theory.