Inferring a Probability Distribution from Data -- The Bayesian Approach

In the previous section, it was shown how to compute a single estimate for a discrete probability $p$. This estimate could differ from the true value; it is simply the most likely value. Thus, we only know the value of $p$ probabilistically. Bayesian methods provide a way of obtaining the entire probability distribution for $p$, not just its most likely value. This approach is explained here. It is an advanced topic, and an area of considerable current research.

Thomas Bayes (1763) asked the inverse question from that discussed in the previous section. Above we asked: given a probability distribution, what is the likelihood of a set of data? Bayes asked: given a set of data, what can be said about the probability distribution from which it came? Formally, if $D$ represents the data and $H$ represents some hypothesis about the probabilistic process which generated this data, the question being asked is: what is the probability $P(H|D)$ of the hypothesis $H$ given the data $D$? Bayes' rule can be used to convert this to the problem above:

$$P(H|D) = \frac{P(D|H)\,P(H)}{P(D)}.$$

The conditional probability $P(D|H)$ is the probability of a set of data given a known process, which is often straightforward to calculate (and was discussed in the previous section). The quantity $P(H)$ is called the *prior*: the probability assigned to the hypothesis before any data are seen. The denominator $P(D)$ is a normalising constant, obtained by summing (or integrating) $P(D|H)\,P(H)$ over all hypotheses.
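For a discrete set of hypotheses, Bayes' rule $P(H|D) = P(D|H)P(H)/P(D)$ can be applied directly. A minimal sketch in Python (the function name and the coin scenario are hypothetical, chosen purely for illustration):

```python
def bayes_posterior(priors, likelihoods):
    """Apply Bayes' rule to a discrete set of hypotheses.

    priors      -- dict mapping hypothesis -> P(H)
    likelihoods -- dict mapping hypothesis -> P(D|H)
    Returns a dict mapping hypothesis -> P(H|D).
    """
    # P(D) is the normalising constant: sum over H of P(D|H) P(H).
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}

# Hypothetical example: a coin is either fair or biased (P(heads) = 0.8),
# equally likely a priori, and we observe three heads in a row.
post = bayes_posterior(
    priors={"fair": 0.5, "biased": 0.5},
    likelihoods={"fair": 0.5**3, "biased": 0.8**3},
)
```

After three heads, the posterior favours the biased coin, since $0.8^3 > 0.5^3$ and the priors were equal.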

**Example:** You may have heard the pessimistic adage ``dropped toast always lands buttered side down''. Suppose you test this. You drop 10 pieces of toast, under controlled conditions of course, and find that the toast lands butter side down 3 times out of the ten. What is the probability that a dropped piece of toast lands butter side down? Clearly, this is a binomial (Bernoulli) process. The probability of the toast landing butter side down is $p$, but you don't know $p$. Bayes' formula can be used to calculate the probability distribution for $p$ given this data. A question is: what should we use as the prior? One possibility is to use our prior belief. Perhaps we believe, in the absence of data, that the expected value should be some $p_0$, in which case we could use a distribution with mean $p_0$ and a variance which is a measure of our confidence in this prior belief. Another possibility is to assume that all values of $p$ are equally likely,

$$P(p) = 1, \qquad 0 \le p \le 1.$$

We will use the uniform prior here (there are arguments against the use of a uniform prior, but we shall ignore them as technicalities). Since it is a binomial process,

$$P(D|p) = \binom{10}{3}\, p^3 (1-p)^7.$$

Using Bayes' rule with a uniform prior and normalising, we get a probability distribution for $p$:

$$P(p|D) = \frac{p^3 (1-p)^7}{\int_0^1 q^3 (1-q)^7 \, dq} = \frac{11!}{3!\,7!}\, p^3 (1-p)^7.$$

In general, if we drop the toast $N$ times and it lands butter side down $n$ times, Bayes' rule with the uniform prior gives

$$P(p|D) = \frac{(N+1)!}{n!\,(N-n)!}\, p^n (1-p)^{N-n}.$$
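The closed form above is easy to evaluate directly. A minimal sketch in Python (the function name is hypothetical), using $(N+1)!/(n!\,(N-n)!) = (N+1)\binom{N}{n}$:

```python
from math import comb

def posterior(p, n, N):
    """Posterior density P(p|D) for the butter-side-down probability p,
    after N drops with n landing butter side down, under a uniform prior:
    P(p|D) = (N+1) * C(N, n) * p**n * (1-p)**(N-n).
    """
    return (N + 1) * comb(N, n) * p**n * (1 - p) ** (N - n)

# For the 10-drop experiment (n = 3), the density peaks at p = n/N = 0.3.
peak_density = posterior(0.3, 3, 10)
```

Because the prior is uniform, this posterior is proportional to the binomial likelihood itself; the factor $(N+1)\binom{N}{n}$ simply normalises it to integrate to 1 over $[0, 1]$.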

The figure below shows $P(p|D)$ for several sample sizes, each with the same measured fraction $n/N = 0.3$ (of course, in a real experiment the measured fraction would fluctuate even if dropping toast were truly a binomial process with fixed $p$). For all sample sizes the distribution is peaked around $p = 0.3$, of course. As the number of trials increases, the distribution becomes more and more sharply peaked around this value.
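This narrowing can be checked numerically. Under the uniform prior the posterior is a Beta$(n+1,\, N-n+1)$ distribution, whose standard deviation has a known closed form; a sketch (function name hypothetical):

```python
from math import sqrt

def posterior_std(n, N):
    """Standard deviation of the Beta(n+1, N-n+1) posterior produced by
    the uniform prior after n successes in N trials:
    std = sqrt(a*b / ((a+b)**2 * (a+b+1))), with a = n+1, b = N-n+1.
    """
    a, b = n + 1, N - n + 1
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# Keeping the measured fraction n/N = 0.3 fixed while N grows,
# the posterior width shrinks (roughly as 1/sqrt(N)):
widths = [posterior_std(3 * k, 10 * k) for k in (1, 2, 4, 10)]
```

For $n = 3$, $N = 10$ the width is about 0.13; quadrupling the sample size roughly halves it, consistent with the increasingly sharp peaks in the figure.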

**Figure:** The probability distribution for $p$ for different sample sizes. As the sample size increases, the distribution becomes increasingly sharp. The maximum likelihood estimate is at the peak of the distribution.

Can we conclude from this data that toast always lands butter side down? From this data, you could infer that $p = 0.3$. This is the most likely value; choosing it would be a *maximum likelihood method*. The Bayesian approach gives you a probability distribution for $p$. Thus, we could ask: how likely is it that $p > 1/2$, that is, how likely is it that dropped toast is more likely to land butter side down? This likelihood can be computed from

$$P(p > 1/2) = \int_{1/2}^{1} P(p|D)\, dp.$$

For 10 trials with 3 landing butter side down, this probability is about 11%. Thus, we would not reject the hypothesis ``dropped toast is more likely to land butter side down'' at the 95% level. However, if there were 20 trials of which 6 landed butter side down, $P(p > 1/2)$ is about 4%; with that data we could reject this hypothesis at the 95% level.
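The tail integral $P(p > 1/2)$ can be estimated by straightforward numerical integration of the posterior. A sketch (function name hypothetical), using a midpoint Riemann sum:

```python
from math import comb

def prob_butter_down_bias(n, N, steps=200_000):
    """Estimate P(p > 1/2) under the posterior
    P(p|D) = (N+1) * C(N, n) * p**n * (1-p)**(N-n)
    by a midpoint Riemann sum over the interval (1/2, 1].
    """
    norm = (N + 1) * comb(N, n)
    width = 0.5 / steps
    total = 0.0
    for i in range(steps):
        p = 0.5 + (i + 0.5) * width
        total += norm * p**n * (1 - p) ** (N - n) * width
    return total

# 3 of 10 butter side down: P(p > 1/2) is roughly 0.11, so the hypothesis
# survives at the 95% level; 6 of 20 gives roughly 0.04, so it does not.
p10 = prob_butter_down_bias(3, 10)
p20 = prob_butter_down_bias(6, 20)
```

Note that even though both data sets have the same measured fraction 0.3, the larger sample supports a stronger conclusion, exactly as the sharpening of the posterior in the figure suggests.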