Next: Application to Validation and Up: Estimating Discrete Probabilities Previous: Estimating Discrete Probabilities

The Uncertainty in the Estimate of the Discrete Probability

There are several ways to quantify how good this estimate is. First, we can put an error bar on the value. The standard way of computing the size of the error of an experimental measurement is to use what is called the standard error of the sample mean, which is the standard deviation of the mean. This is the measured standard deviation divided by the square root of the number of samples. Since this is a binomial process (either the outcome is $x_i$ or it isn't), a single trial has standard deviation $\sqrt{\hat{p}(1-\hat{p})}$, and the formula for the standard error becomes

\begin{displaymath}\mbox{SE} = \sqrt{\frac{\hat{p}(1-\hat{p})}{N}}. \end{displaymath}

Here $N$ is the number of points in the sample. Thus, as the size of the sample increases, the uncertainty of the estimate decreases like one over the square root of the sample size. In the example above, our estimate for the probability that a flipped thumbtack lands pointy side up would be $0.65 \pm 0.11$.
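As a minimal sketch of the formula above, the following Python snippet computes the standard error. The counts used here (13 pointy-side-up landings in 20 flips) are an assumption chosen to be consistent with the quoted $0.65 \pm 0.11$, and the function name `standard_error` is only illustrative.

```python
import math


def standard_error(p_hat: float, n: int) -> float:
    """Standard error of an estimated binomial probability p_hat from n trials."""
    if n <= 0:
        raise ValueError("n must be a positive number of trials")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must lie between 0 and 1")
    return math.sqrt(p_hat * (1.0 - p_hat) / n)


# Assumed counts, chosen to match the thumbtack numbers quoted in the text.
successes, n = 13, 20
p_hat = successes / n
se = standard_error(p_hat, n)
print(f"{p_hat:.2f} +/- {se:.2f}")  # prints "0.65 +/- 0.11"
```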

Another approach is to use what are called confidence intervals. Given the estimated value $\hat{p}$, a confidence interval is a range of values in which the true value of $p$ is likely to be. By ``likely'' one often means that the probability that the true value $p$ falls in the interval is 95%. This is called the 95% confidence interval. You might know that a normally distributed quantity falls within one standard deviation of its mean about 68% of the time, and within two standard deviations about 95% of the time. Since the standard error is the standard deviation of the estimate, one says that one has 95% confidence that the true value is between the estimate minus two standard errors and the estimate plus two standard errors. Now, the distribution of the measured estimate is not normal, it is binomial, but a binomial distribution can be approximated by a normal distribution if $N$ is reasonably large and the value of $\hat{p}$ is not too close to $0$ or $1$. Alternatively, one can use a binomial table. The figure below shows a graph of the 95% confidence intervals.

Figure: Confidence intervals for a binomial variable. $\hat{p}$ is the measured value; $p$ is the true value. The curves are labelled by the number of sample points.
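In place of a binomial table, an exact interval can also be computed numerically. The sketch below uses the Clopper-Pearson construction, which is one common choice and not necessarily the one used to draw the figure. The function names and the counts (13 successes in 20 trials) are assumptions for illustration.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def bisect_root(f, lo: float, hi: float, iters: int = 100) -> float:
    """Locate the sign change of a monotonic function f on [lo, hi] by bisection."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # root lies in the upper half
        else:
            hi = mid  # root lies in the lower half
    return 0.5 * (lo + hi)


def exact_interval(k: int, n: int, alpha: float = 0.05):
    """Clopper-Pearson interval for k successes in n trials (assumed method)."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    # Lower limit: the p at which k or more successes has probability alpha/2.
    lower = 0.0 if k == 0 else bisect_root(
        lambda p: 1.0 - binom_cdf(k - 1, n, p) - alpha / 2, 0.0, 1.0)
    # Upper limit: the p at which k or fewer successes has probability alpha/2.
    upper = 1.0 if k == n else bisect_root(
        lambda p: binom_cdf(k, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper


low, high = exact_interval(13, 20)
print(f"exact 95% interval: {low:.3f} to {high:.3f}")
```

For small samples the exact interval is generally not symmetric about $\hat{p}$, unlike the interval built from two standard errors.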

In the example above, we would be 95% confident that the true value for the probability of a flipped thumbtack landing pointy side up is between $0.44$ and $0.86$. If a more accurate estimate is desired, a larger number of experiments is required. The size of the interval decreases like one over the square root of the number of experiments, so halving the interval requires four times as many experiments.
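The two-standard-error interval and its dependence on the number of experiments can be sketched as follows. The sample size of 20 is an assumption consistent with the numbers quoted above, and `two_se_interval` and `samples_needed` are hypothetical helper names. The sample-size estimate also assumes that $\hat{p}$ stays close to its current value as more data are collected.

```python
import math


def two_se_interval(p_hat: float, n: int):
    """Approximate 95% interval: estimate plus or minus two standard errors."""
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    # Clip to [0, 1], since a probability cannot lie outside that range.
    return max(0.0, p_hat - 2.0 * se), min(1.0, p_hat + 2.0 * se)


def samples_needed(p_hat: float, half_width: float) -> int:
    """Smallest N for which two standard errors do not exceed half_width."""
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    return math.ceil(4.0 * p_hat * (1.0 - p_hat) / half_width**2)


# Assumed sample size of 20, matching the thumbtack numbers in the text.
low, high = two_se_interval(0.65, 20)
print(f"{low:.2f} to {high:.2f}")  # prints "0.44 to 0.86"

# Experiments needed to shrink the interval to 0.65 +/- 0.04.
print(samples_needed(0.65, 0.04))  # prints 569
```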


Jon Shapiro
1999-09-23