next up previous
Next: Appendices Important Up: A Primer on Probability Previous: Likelihood of Data from

Inferring a Probability Distribution from Data -- The Bayesian Approach

In an earlier section, it was shown how to compute a single estimate for a discrete probability. This estimate can differ from the true value, because it is simply the most likely value. Thus, we only know the value of $p$ probabilistically. Bayesian methods provide a way of obtaining the entire probability distribution for $p$, not just the most likely value. This approach is explained here. It is an advanced topic and an area of considerable current research.

Thomas Bayes (1763) asked the inverse of the question discussed in the previous section. There we asked: given a probability distribution, what is the likelihood of a set of data? Bayes asked: given a set of data, what can be said about the probability distribution from which it came? Formally, if $D$ represents the data and $H$ represents some hypothesis about the probabilistic process which generated the data, the question is: what is the probability of the hypothesis given the data, $P(H\vert D)$? Bayes' rule converts this into the problem of the previous section:

\begin{displaymath}P(H\vert D) = \frac{P(D\vert H) P(H)}{P(D)} .\end{displaymath}

The conditional probability $P(D\vert H)$ is the probability of a set of data given a known process $H$; it is often straightforward to calculate (as discussed in the previous section). The quantity $P(H)$ is called the prior: it is the probability of the hypothesis independent of the data. The probability of the data, $P(D)$, acts as a normalisation and can be calculated from

\begin{displaymath}P(D) = \sum_H P(D\vert H)P(H).\end{displaymath}
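To make the sum over hypotheses concrete, here is a minimal Python sketch. It assumes the hypotheses $H$ are a finite grid of candidate values of a binomial probability $p$, each with equal prior weight; the function name `grid_posterior`, the grid size, and the example data (3 successes in 10 trials) are illustrative choices, not part of these notes.

```python
from math import comb

def grid_posterior(n, N, num_points=101):
    """Posterior P(H|D) over a grid of hypotheses H = p_i (hypothetical helper).

    Data D: n successes in N independent trials of a binomial process.
    """
    # Hypotheses: evenly spaced candidate values of p in [0, 1].
    ps = [i / (num_points - 1) for i in range(num_points)]
    # Uniform prior P(H) over the grid points.
    prior = [1.0 / num_points] * num_points
    # Likelihood P(D|H) of n successes in N trials for each candidate p.
    likelihood = [comb(N, n) * p**n * (1 - p) ** (N - n) for p in ps]
    # Evidence P(D) = sum over H of P(D|H) P(H).
    evidence = sum(l * pr for l, pr in zip(likelihood, prior))
    # Bayes' rule: P(H|D) = P(D|H) P(H) / P(D).
    posterior = [l * pr / evidence for l, pr in zip(likelihood, prior)]
    return ps, posterior

ps, post = grid_posterior(3, 10)
best = max(range(len(ps)), key=lambda i: post[i])
print(f"most probable p on the grid: {ps[best]:.2f}")
print(f"posterior sums to {sum(post):.6f}")
```

Refining the grid makes this discrete posterior approach the continuous one, in which the sum over $H$ becomes an integral over $p$.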

You may have heard the pessimistic adage ``dropped toast always lands buttered side down''. Suppose you test this. You drop 10 pieces of toast, under controlled conditions of course, and find that it lands butter side down 3 times out of 10. What is the probability that a dropped piece of toast lands butter side down?

Clearly, this is a binomial (Bernoulli) process. The probability of the toast landing butter side down is $p$, but you don't know $p$. Bayes' rule can be used to calculate the probability distribution of $p$ given this data. The question is what to use as the prior. One possibility is to use our prior belief. Perhaps we believe, in the absence of data, that the expected value should be $1/2$, in which case we could use some distribution with mean $1/2$ and a variance that reflects our confidence in this prior belief. Another possibility is to assume that all values of $p$ are equally likely,

\begin{displaymath}P(p) = 1, \qquad 0 \le p \le 1.\end{displaymath}

We will use this here (there are arguments against use of a uniform prior, but we shall ignore them as technicalities). Since it is a binomial process,

\begin{displaymath}P(d\vert p) = {10\choose 3} p^3 (1-p)^{7} .\end{displaymath}

Using Bayes' rule with a uniform prior and normalising, we get a probability distribution for $p$:

\begin{displaymath}P(p\vert d) = 11 {10 \choose 3} p^3 (1-p)^7 . \end{displaymath}
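As a quick numerical sanity check, the following Python sketch (the helper names `posterior_density` and `midpoint_integral` are hypothetical) evaluates the constant $11{10\choose 3}$ and integrates the resulting density over $0\le p\le 1$ with a simple midpoint rule; the result should be 1 up to discretisation error.

```python
from math import comb

def posterior_density(p, n, N):
    """Uniform-prior posterior (N+1) C(N,n) p^n (1-p)^(N-n) at a single p."""
    return (N + 1) * comb(N, n) * p**n * (1 - p) ** (N - n)

def midpoint_integral(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

constant = (10 + 1) * comb(10, 3)  # the normalising constant 11 * C(10,3)
area = midpoint_integral(lambda p: posterior_density(p, 3, 10), 0.0, 1.0)
print(f"constant = {constant}, integral of P(p|d) over [0,1] = {area:.6f}")
```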

In general, if we drop the toast $N$ times and it lands butter side down $n$ times, Bayes' rule with the uniform prior gives

\begin{displaymath}P(p\vert d) = (N+1) {N\choose n} p^n (1-p)^{N-n} . \end{displaymath}
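The factor $(N+1)$ comes from the normalisation. With a continuous $p$, the sum over hypotheses in $P(D)$ becomes an integral over $p$, and the standard Beta-function integral (a known identity, not stated in these notes) gives

\begin{displaymath}\int_0^1 p^n (1-p)^{N-n}\, dp = \frac{n!\,(N-n)!}{(N+1)!},\end{displaymath}

so with the uniform prior $P(p)=1$,

\begin{displaymath}P(d) = \int_0^1 {N\choose n} p^n (1-p)^{N-n}\, dp = \frac{N!}{n!\,(N-n)!}\cdot\frac{n!\,(N-n)!}{(N+1)!} = \frac{1}{N+1}.\end{displaymath}

Dividing $P(d\vert p)P(p)$ by $P(d)$ therefore multiplies the binomial probability by $N+1$; for $N=10$, $n=3$ this is the constant $11{10\choose 3}$.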

The figure below shows $P(p\vert d)$ for $N = 10, 20, 40, 80, 160, 320,$ and $640$ with $n=0.3N$ (of course, in a real experiment the measured mean would fluctuate even if dropping toast were truly binomial with $p=0.3$). For all values of $N$ the distribution is peaked around $p=0.3$, as expected. As the number of trials increases, the distribution becomes more and more sharply peaked around this value.
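The sharpening can be quantified. The density $(N+1){N\choose n}p^n(1-p)^{N-n}$ is a Beta$(n+1, N-n+1)$ distribution (a standard identification, not made in these notes), so its peak, mean, and standard deviation have closed forms. This Python sketch, with a hypothetical helper `posterior_summary`, tabulates them for the sample sizes in the figure.

```python
from math import sqrt

def posterior_summary(n, N):
    """Mode, mean and standard deviation of the uniform-prior posterior.

    Assumes the posterior is Beta(a, b) with a = n + 1, b = N - n + 1.
    """
    a, b = n + 1, N - n + 1
    mode = n / N  # peak of the density = maximum likelihood estimate
    mean = a / (a + b)  # equals (n + 1) / (N + 2)
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))  # Beta variance formula
    return mode, mean, sd

for N in [10, 20, 40, 80, 160, 320, 640]:
    n = round(0.3 * N)  # n = 0.3 N, as in the figure
    mode, mean, sd = posterior_summary(n, N)
    print(f"N={N:4d} n={n:3d} mode={mode:.3f} mean={mean:.3f} sd={sd:.4f}")
```

The peak stays at $n/N = 0.3$, while the standard deviation shrinks roughly like $1/\sqrt{N}$.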

Figure: The probability distribution for $p$ for different sample sizes. As the sample size increases, the distribution becomes increasingly sharp. The maximum likelihood estimate is at the peak of the distribution.

Can we conclude from this data that dropped toast tends to land butter side down? From this data, you could infer that $p=0.3$: this is the most likely value, and choosing it would be a maximum likelihood method. The Bayesian approach instead gives a probability distribution for $p$. Thus, we can ask how probable it is that $p\geq\frac{1}{2}$, that is, how probable it is that dropped toast is more likely than not to land butter side down. This probability can be computed from

\begin{displaymath}P(p\geq 0.5) = \int_{0.5}^1 P(p\vert d) dp.\end{displaymath}

For 10 trials with 3 landing butter side down, this is about 11%. Thus, we would not reject the hypothesis ``Dropped toast is more likely to land butter side down'' at the 95% level. However, if there were 20 trials of which 6 landed butter side down, $P(p\geq 0.5)$ would be about 4%; with that data we could reject this hypothesis at the 95% level.
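The two percentages can be reproduced with a short Python sketch. Evaluating the integral exactly through the standard Beta/binomial identity is an assumption of this sketch, not a method from the notes: for the uniform-prior posterior, $P(p\geq\frac{1}{2}) = \sum_{k=0}^{n}{N+1\choose k}/2^{N+1}$. A midpoint-rule integration of $P(p\vert d)$ over $[0.5, 1]$ is included as an independent check; the function names are hypothetical.

```python
from math import comb

def prob_p_at_least_half(n, N):
    """Exact P(p >= 1/2) for the posterior (N+1) C(N,n) p^n (1-p)^(N-n).

    Uses the identity P(p >= x) = P(Y <= n) with Y ~ Binomial(N+1, x);
    at x = 1/2 every outcome of Y has probability 1 / 2**(N+1).
    """
    return sum(comb(N + 1, k) for k in range(n + 1)) / 2 ** (N + 1)

def prob_p_at_least_half_numeric(n, N, steps=100_000):
    """Same quantity by midpoint-rule integration of P(p|d) over [0.5, 1]."""
    c = (N + 1) * comb(N, n)
    h = 0.5 / steps
    total = 0.0
    for k in range(steps):
        p = 0.5 + (k + 0.5) * h
        total += c * p**n * (1 - p) ** (N - n)
    return h * total

for n, N in [(3, 10), (6, 20)]:
    exact = prob_p_at_least_half(n, N)
    approx = prob_p_at_least_half_numeric(n, N)
    print(f"n={n}, N={N}: exact {exact:.4f}, numeric {approx:.4f}")
```

The exact values are $232/2048 \approx 0.113$ for 3 of 10 and $82160/2097152 \approx 0.039$ for 6 of 20, consistent with the figures of about 11% and 4%.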

Jon Shapiro