Frequentist Approach vs. Bayesian Approach

Dated Jul 26, 2020; last modified on Sun, 12 Feb 2023

The Frequentist Approach

  • Observe data.
  • Assume the data were generated randomly, e.g. by nature, by designing a survey, etc.
  • Make assumptions on the generating process, e.g. i.i.d., Gaussian, etc.
  • Associate the generating process to some object of interest, e.g. a parameter, a density, etc.
  • Assuming that the object is unknown but fixed, try to find it, e.g. estimate it, test a hypothesis about it, etc.

The Bayesian Approach

  • Observe data.
  • Assume the data are generated randomly by some process.
  • Under some assumptions, e.g. parametric distribution, associate the process with some fixed object.
  • We have a prior belief about the object.
  • Using the data, we want to update that belief into a posterior belief.

Example of the Bayesian Approach

Let \(p\) be the proportion of women in the population.

Sample \(n\) people randomly with replacement in the population. Denote their gender by \(X_1, X_2, …, X_n\) (1 for woman, 0 otherwise).

A frequentist would estimate \(p\) (probably using the MLE), construct a confidence interval for \(p\), and test hypotheses about it.
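A minimal sketch of that frequentist treatment, assuming a simulated sample (the true \(p = .52\), the sample size, and the seed are arbitrary choices for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=0)

# Simulate a sample; true_p is a hypothetical ground truth for illustration.
n, true_p = 1000, 0.52
X = rng.binomial(1, true_p, size=n)  # X_i = 1 for woman, 0 otherwise

# The MLE for a Bernoulli proportion is the sample mean.
p_hat = X.mean()

# Normal-approximation (Wald) 95% confidence interval, via the CLT.
se = np.sqrt(p_hat * (1 - p_hat) / n)
print(f"MLE: {p_hat:.3f}, 95% CI: ({p_hat - 1.96 * se:.3f}, {p_hat + 1.96 * se:.3f})")

# Two-sided z-test of H0: p = 0.5.
z = (p_hat - 0.5) / np.sqrt(0.5 * 0.5 / n)
print(f"p-value: {2 * (1 - norm.cdf(abs(z))):.3f}")
```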

Before analyzing the data, we may believe that \(p\) is likely to be close to \( \frac{1}{2} \). We can quantify how close, e.g. \(90\%\) sure that \( .4 \le p \le .6 \), \(95\%\) sure that \( .3 \le p \le .8 \), etc.

Therefore, we can model our prior belief using a distribution for \(p\) as if it were random (even though in reality, the true parameter is not random).

Suppose we model \( p \sim \mathrm{B}(a, a) \) for some \(a > 0\). This distribution is called the prior distribution.
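A minimal sketch of how one might pick \(a\): for a symmetric \(\mathrm{B}(a, a)\) prior, compute how much mass each candidate \(a\) places on \([.4, .6]\) and choose the one closest to the stated \(90\%\) belief. The candidate values below are arbitrary.

```python
from scipy.stats import beta

# For a symmetric Beta(a, a) prior, mass concentrates around 1/2 as a grows.
for a in [1, 5, 10, 20, 50]:
    prior = beta(a, a)
    mass = prior.cdf(0.6) - prior.cdf(0.4)
    print(f"a = {a:2d}: P(.4 <= p <= .6) = {mass:.3f}")
```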

It’s common to use the beta distribution as a probability distribution over probabilities. This is especially helpful when we have prior knowledge. For example, if a player’s batting average is typically \(.27\) but could reasonably range from \(.21\) to \(.35\), then \( \mathrm{B}(81, 219) \) is a good prior because the mean of \( \mathrm{B}(\alpha, \beta)\) is \(\frac{\alpha}{\alpha + \beta} = \frac{81}{81 + 219} = .27 \) and:

Notice how the distribution lies mostly within \([.2, .35]\). Credits: [David Robinson](#Robinson2013)
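A quick check of that claim (a sketch using scipy): the mean of \(\mathrm{B}(81, 219)\) is \(.27\), and nearly all of its mass falls in \([.2, .35]\).

```python
from scipy.stats import beta

prior = beta(81, 219)
print(prior.mean())                      # 0.27
print(prior.cdf(0.35) - prior.cdf(0.2))  # close to 1: most mass is in [.2, .35]
```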

We assume that \(X_1, …, X_n\) are i.i.d. \(\mathrm{Bernoulli}(p)\) conditionally on \(p\).

After observing the available sample \(X_1, …, X_n\), we can update our belief about \(p\) by taking its distribution conditionally on the data. The resulting distribution is called the posterior distribution.
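Concretely, by Bayes’ rule, the posterior density is proportional to the likelihood times the prior. Writing \(k = \sum_{i=1}^{n} X_i\) for the number of successes:

$$ \pi(p \mid X_1, …, X_n) \propto \underbrace{p^{k} (1-p)^{n-k}}_{\text{likelihood}} \cdot \underbrace{p^{a-1} (1-p)^{a-1}}_{\text{prior}} = p^{a+k-1} (1-p)^{a+n-k-1}, $$

which is the kernel of a beta distribution.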

In our experiment, the posterior distribution is:

$$ \mathrm{B}\left(a + \sum_{i=1}^{n} X_i \ ,\ \ a + n - \sum_{i=1}^{n} X_i \right) $$
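A minimal sketch of this update in code, reusing the simulated sample from the frequentist sketch above and an arbitrary \(a = 10\):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(seed=0)

# Simulated sample; true_p is a hypothetical ground truth for illustration.
n, true_p = 1000, 0.52
X = rng.binomial(1, true_p, size=n)
k = X.sum()  # number of successes observed

a = 10  # arbitrary symmetric prior B(a, a)
posterior = beta(a + k, a + n - k)

print(f"Posterior mean: {posterior.mean():.3f}")
print(f"95% credible interval: ({posterior.ppf(0.025):.3f}, {posterior.ppf(0.975):.3f})")
```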

In general, after observing some number of additional successes and failures, the updated (posterior) beta distribution is \( \mathrm{B}(\alpha_{0} + \text{successes},\ \beta_0 + \text{failures}) \).
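For instance (a hypothetical continuation of the batting-average example), if the player then records 30 hits in 100 at-bats, the \(\mathrm{B}(81, 219)\) prior updates to

$$ \mathrm{B}(81 + 30,\ 219 + 70) = \mathrm{B}(111, 289), $$

whose mean \( \frac{111}{400} = .2775 \) has been nudged up from the prior mean of \(.27\) by the new evidence.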

References

  1. What is the intuition behind beta distribution? David Robinson. stats.stackexchange.com. Jan 15, 2013.