In the previous sections, the problem was to estimate a probability from some data. A related problem is the testing of hypotheses -- that is, testing whether some data is consistent with a believed or proposed distribution. Of course, in many cases all data is consistent with the hypothesis, but some data may be extremely unlikely under it. The classical approach to this problem was developed by Fisher (1935). First, pick a statistic: a measurement of the data from which one can determine how likely the data is. Second, pick a threshold percentage, usually taken to be 95%, and accept the hypothesis if the statistic on that data falls within the range which would occur that percentage of the time.
For example, suppose someone flips a coin 20 times and it comes up
heads 4 times, tails 16 times. Is this a funny coin? You might think
so, but how would you test this? You expect 10 heads, but you know
this will fluctuate. You take as your hypothesis: the coin is a fair
coin; the probability of heads is 50%. This is the so-called
null hypothesis. Then you ask, is the deviation
from the expected number of heads greater than the deviation which you
would expect to find 95% of the time? In essence, you consider the
confidence interval around the assumed value. If the measured
value falls within that interval, the null hypothesis is accepted; if the
measured value falls outside this confidence interval, the null
hypothesis is rejected. The
distribution here is binomial, so you can look this up on the curves
above, or compute it yourself from the properties of the binomial
distribution. What you would find is
that the probability of getting exactly the expected value, 10 heads in 20 trials,
is 18%. The probability of getting within 1 of the expected value,
that is 9, 10, or 11 heads, is 50%. The table below shows the
percentage of the probability which lies within a given deviation of the
expected value, either way, for 20 trials of a binomial process with
p = 1/2.
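These numbers are easy to check. Here is a minimal Python sketch (the helper name `binom_pmf` is my own) that computes the exact probabilities with `math.comb`:

```python
from math import comb

def binom_pmf(k, n=20, p=0.5):
    """Exact probability of k heads in n flips of a coin with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly the expected 10 heads in 20 flips.
print(f"P(10 heads)  = {binom_pmf(10):.3f}")   # ~0.176, i.e. 18%

# Probability of being within 1 of the expected value: 9, 10, or 11 heads.
print(f"P(9..11 heads) = {sum(binom_pmf(k) for k in (9, 10, 11)):.3f}")  # ~0.497, i.e. 50%
```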
| allowed deviation | allowed # of heads | percent in this interval |
|---|---|---|
| 0 | 10 | 18% |
| 1 | 9,10,11 | 50% |
| 2 | 8,9,10,11,12 | 74% |
| 3 | 7,8,9,10,11,12,13 | 88% |
| 4 | 6,7,8,9,10,11,12,13,14 | 96% |
| 5 | 5,6,7,8,9,10,11,12,13,14,15 | 98.8% |
| 6 | 4,5,6,7,8,9,10,11,12,13,14,15,16 | 99.7% |
| 7 | 3,4,5,6,7,8,9,10,11,12,13,14,15,16,17 | 99.96% |
| 8 | 2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18 | 99.996% |
| 9 | 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19 | 99.9998% |
| 10 | 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 | 100% |
Four heads is a deviation of 6 from the expected 10, while 95% of the time the deviation is at most 4 (the 96% row). We see that we would not accept the coin as a fair coin at the 95% level.
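The table and the decision can be reproduced in a few lines of Python; this is just a sketch of the procedure described above, not a standard library routine:

```python
from math import comb

N = 20
pmf = [comb(N, k) * 0.5**N for k in range(N + 1)]  # p**k * (1-p)**(N-k) = 0.5**N here

def within(d, expected=10):
    """Probability that the number of heads lies within d of the expected value."""
    return sum(pmf[k] for k in range(expected - d, expected + d + 1))

for d in range(11):
    print(f"deviation {d:2d}: {within(d):8.4%}")

# The 95% test: accept if the observed deviation falls inside the smallest
# interval that contains at least 95% of the probability.
observed = 4                                              # 4 heads in 20 flips
deviation = abs(observed - 10)                            # deviation of 6
cutoff = min(d for d in range(11) if within(d) >= 0.95)   # d = 4 (the 96% row)
print("accept" if deviation <= cutoff else "reject")      # prints "reject"
```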
Classical hypothesis testing is very important in science, and the subject fills books. The procedure is as above, but with several complications. There are different types of hypotheses to test: one-sided versus two-sided, comparisons of two distributions, and others. There are different distributions used as the hypothesised process. There are special statistics which come from known distributions in the limit of large data sets for very general processes. It is also worth noting that classical hypothesis testing is problematic in that the same data can lead to a given hypothesis being accepted or rejected, depending on the procedure or statistic chosen.
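Packaged routines exist for the common cases. Assuming SciPy is available, the two-sided exact binomial test for our coin can be run with `scipy.stats.binomtest` (in older SciPy versions the function was `scipy.stats.binom_test`):

```python
from scipy.stats import binomtest

# Two-sided exact binomial test: is 4 heads in 20 flips consistent with p = 0.5?
result = binomtest(k=4, n=20, p=0.5, alternative='two-sided')
print(result.pvalue)  # ~0.012, below 0.05, so the null hypothesis is rejected
```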