Before we had data, we had beliefs. After we got data, we had better beliefs. Bayesian statistics is the mathematics of that update.
This isn't just a philosophical stance — it's a practical framework that produces more interpretable results, handles small samples gracefully, and lets you stop an experiment as soon as you have enough evidence (without the p-hacking problem that plagues classical statistics).
Reinforce OS uses Bayesian analysis throughout. This chapter explains why.
The Core Idea: Beliefs as Probabilities
In classical (frequentist) statistics, parameters are fixed unknown constants. You either accept or reject hypotheses; there's no probability attached to a hypothesis being true.
Bayesian statistics takes a different view: parameters are uncertain, and uncertainty is measured with probability. Before seeing data, you have a prior distribution over the parameter. After seeing data, you update to a posterior distribution.
The update rule is Bayes' theorem:

P(θ | data) = P(data | θ) × P(θ) / P(data)

In words: the posterior is proportional to the likelihood times the prior.
The posterior is everything you know about the parameter after seeing the data. It's a full probability distribution, not a point estimate.
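As a sketch of how this update works numerically, here is a grid approximation in Python (the grid size and variable names are illustrative choices, not anything from Reinforce OS):

```python
import numpy as np

def grid_posterior(prior, likelihood):
    """Bayes' theorem on a grid: posterior ∝ likelihood × prior."""
    unnormalized = likelihood * prior
    return unnormalized / unnormalized.sum()   # normalize to sum to 1

theta = np.linspace(0.01, 0.99, 99)            # candidate parameter values
prior = np.ones_like(theta) / len(theta)       # flat prior
likelihood = theta**7 * (1 - theta)**3         # 7 good days out of 10
posterior = grid_posterior(prior, likelihood)

# The posterior mass concentrates near the observed rate of 0.7
print(round(float(theta[np.argmax(posterior)]), 2))
```

The same three lines of arithmetic (multiply, sum, divide) are all that Bayes' theorem asks for; everything else in this chapter is about doing that update efficiently and interpreting the result.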
A Concrete Example: Coin Flipping
Suppose you're testing whether a new morning routine improves your mood. You rate each day as "good" or "not good." You've had 7 good days out of 10 trials.
Let θ = the true probability of a good day. You want to know: is θ > 0.5?
Prior: before your experiment, you believe θ is somewhere between 0.3 and 0.8, with no strong view. You encode this as θ ~ Beta(1, 1) — a uniform distribution.
Likelihood: with 7 successes out of 10 Bernoulli trials, the likelihood is proportional to θ⁷(1 − θ)³.
Posterior: the Beta distribution has a magical property — a Beta prior combined with Bernoulli data gives a Beta posterior:

Beta(1 + 7, 1 + 3) = Beta(8, 4)

That's it. Your posterior belief about θ is Beta(8, 4): mean ≈ 0.67, with uncertainty fully quantified.
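The conjugate update is a one-liner; here is a sketch using SciPy (the `update` helper is my own naming, not an API from any library):

```python
from scipy.stats import beta

# Conjugate update: a Beta(a, b) prior plus k successes in n trials
# yields a Beta(a + k, b + n - k) posterior.
def update(a, b, successes, failures):
    return a + successes, b + failures

a, b = update(1, 1, 7, 3)            # flat prior, 7 good days out of 10
posterior = beta(a, b)

print(a, b)                          # Beta(8, 4)
print(round(posterior.mean(), 3))    # mean = 8 / 12 ≈ 0.667
```

No sampling, no optimization: with a conjugate prior the posterior is available in closed form, which is why the Beta–Binomial model is the standard teaching example.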
The Beta distribution is defined on [0, 1] — perfect for probabilities and rates.
- α = prior successes + 1
- β = prior failures + 1
- Mean = α / (α + β)
- As α and β grow, the distribution narrows (more certainty)

Starting with Beta(1, 1) is a "flat" prior — you have no strong belief. A symmetric prior such as Beta(5, 5) says "I expect roughly 50%, but with some uncertainty."
Prior Meets Data
The posterior is a compromise between the prior and the likelihood. A stronger prior moves less in response to the same data; more data pulls any prior further toward what you observed.
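A quick way to see this: compute the posterior mean for the same 7-of-10 data under symmetric priors of increasing strength (the specific prior strengths below are illustrative):

```python
# Posterior mean after seeing 7 successes in 10 trials under a
# symmetric Beta(a0, a0) prior: (a0 + 7) / (2*a0 + 10).
def posterior_mean(a0, successes=7, trials=10):
    return (a0 + successes) / (2 * a0 + trials)

for a0 in (1, 5, 50):
    # The stronger the prior, the closer the posterior stays to 0.5
    print(f"Beta({a0}, {a0}) prior -> posterior mean {posterior_mean(a0):.3f}")
```

With a Beta(1, 1) prior the posterior mean is about 0.667; with Beta(50, 50) it barely budges from 0.5. Same data, different starting beliefs, different conclusions — until enough data arrives.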
Priors and Posteriors
The prior encodes what you believe before seeing data. People sometimes worry that priors are "subjective" and therefore unscientific. A few responses to this:
- Everyone has priors — frequentist analysis implicitly uses flat priors (all parameter values equally likely), which is also a choice.
- Priors can be weakly informative — you don't need a strong prior. A prior that rules out physically impossible values (negative probabilities, effects larger than the universe) while being otherwise uncertain is perfectly defensible.
- Data dominates with enough samples — with 1,000 trials, a reasonable prior barely matters; the likelihood swamps it. Priors matter most when data is scarce, which is exactly when you need guardrails.
- Science accumulates — the posterior from one study becomes the prior for the next. This is how knowledge should build.
Prior sensitivity
A good practice: run your analysis with a few different priors (a flat prior, a weakly informative prior, a skeptical prior) and check whether conclusions change substantially. If they don't, your results are robust. If they do, you need more data before drawing strong conclusions.
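For the coin-flip example, a sensitivity check might look like this (the three priors are illustrative choices, not a recommendation from Reinforce OS):

```python
from scipy.stats import beta

# Same data (7 good days out of 10) under three different priors;
# check how much the headline conclusion P(theta > 0.5) moves.
priors = {"flat": (1, 1), "weakly informative": (2, 2), "skeptical": (1, 4)}
probs = {}
for name, (a, b) in priors.items():
    probs[name] = beta(a + 7, b + 3).sf(0.5)   # posterior P(theta > 0.5)
    print(f"{name}: P(theta > 0.5) = {probs[name]:.2f}")
```

If the skeptical prior drags the probability well below your decision threshold while the others sit above it, that is the signal to collect more data rather than declare victory.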
Credible Intervals vs. Confidence Intervals
A 95% credible interval contains the true parameter with probability 95%, given your prior and data. You can say: "there is a 95% probability that the true effect is between 2 and 8 minutes."
A 95% confidence interval is different: if you repeated the experiment many times, 95% of the resulting intervals would contain the true parameter. For any single interval, the true parameter either is or isn't in it — there's no probability statement about the specific interval you computed.
The credible interval says what you want it to say. The confidence interval requires a mental contortion that most people skip, leading them to misinterpret confidence intervals as if they were credible intervals.
Reinforce OS reports credible intervals. When it says "HDI: [+2, +8] minutes," that means: given what you've observed, there's a 95% probability the true effect is in that range.
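For the Beta(8, 4) posterior from the earlier example, a 95% equal-tailed credible interval can be read straight off the posterior quantiles (a sketch; an HDI is computed differently, though for a roughly symmetric posterior the two are close, and this is not necessarily how Reinforce OS computes it):

```python
from scipy.stats import beta

# 95% equal-tailed credible interval: the central region holding
# 95% of the posterior probability mass.
posterior = beta(8, 4)
lo, hi = posterior.ppf(0.025), posterior.ppf(0.975)
print(f"95% credible interval for theta: [{lo:.2f}, {hi:.2f}]")
```

The interval is wide because ten trials is not much data; run more trials and watch it tighten.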
The Bayesian Approach to Stopping Rules
One of the most practically important advantages of Bayesian analysis: you can look at your data whenever you want.
In classical hypothesis testing, "optional stopping" — peeking at results and stopping when p < 0.05 — inflates your false positive rate dramatically. The p-value is only valid if the sample size was fixed in advance.
In Bayesian analysis, you can update the posterior after every new trial. Stopping when the posterior probability of your hypothesis (say, P(θ > 0.5)) exceeds 95% is completely valid. The posterior probability at any moment is an honest statement about your current state of knowledge.
This is why Reinforce OS can show you live updating results without compromising statistical validity. The analysis is always a correct summary of what the data says, not a test that depends on when you decided to look.
Bayesian optional stopping is valid because the posterior is always a correct summary of evidence. But "stop when you like what you see" is still a bias — you'd be sampling from the distribution of experiments where you got lucky. Good practice: decide your stopping criterion in advance (e.g., "I'll stop when the 95% HDI excludes zero") and stick to it.
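A pre-registered stopping rule can be sketched as a simulation (the true success rate, thresholds, and trial cap below are made-up illustration values):

```python
import random
from scipy.stats import beta

random.seed(0)

# Pre-declared rule: stop once P(theta > 0.5 | data) leaves [0.05, 0.95].
a, b = 1, 1                                # flat Beta(1, 1) prior
for trial in range(1, 201):
    success = random.random() < 0.65       # simulate one "good day"
    a, b = (a + 1, b) if success else (a, b + 1)
    p = beta(a, b).sf(0.5)                 # posterior P(theta > 0.5)
    if p > 0.95 or p < 0.05:
        print(f"stopped after {trial} trials, P(theta > 0.5) = {p:.3f}")
        break
```

The key point: the criterion (which probability, which thresholds) is fixed before the first trial, and every intermediate posterior along the way is still a valid summary of the evidence so far.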
Bayesian vs. Frequentist: A Practical Comparison
| Question | Frequentist answer | Bayesian answer |
|---|---|---|
| Is the effect real? | p = 0.03 (reject H₀) | P(effect > 0) = 97% |
| How big is the effect? | Point estimate ± SE | Posterior distribution |
| Can I stop early? | No (inflates Type I error) | Yes, with proper criterion |
| What does the interval mean? | 95% of intervals contain truth | 95% probability truth is here |
| Prior knowledge? | Ignored | Explicit prior |
| Small samples? | Unreliable | Regularized by prior |
Neither framework is always better. Frequentist methods are well-understood, easy to communicate, and required by many regulatory contexts. Bayesian methods are more natural for sequential decision-making, personalization, and communicating uncertainty to non-statisticians.
For personal experiments — where you're making decisions about your own behavior with limited data — Bayesian methods are almost always the right choice.
Hierarchical Models: Borrowing Strength
Here's a powerful extension: suppose you're running the same caffeine experiment across 50 users of Steady Practice. Each user has their own true effect θᵢ. You could estimate each person separately, but with only 10 trials per person, the estimates are noisy.
A hierarchical model says: each person's effect θᵢ is drawn from a shared population distribution with mean μ and spread τ. You estimate μ and τ jointly with all the individual effects.
This lets individual estimates borrow strength from the population. Someone with only 3 trials gets pulled toward the population mean — appropriately skeptical. Someone with 50 trials has their own estimate dominate.
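The shrinkage effect can be sketched with a crude partial-pooling formula (an empirical-Bayes caricature, not the model Steady Practice actually fits; `kappa` is a made-up tuning constant that a real hierarchical model would estimate from the data):

```python
# Shrink each user's raw success rate toward the population mean,
# weighted by trial count: few trials -> strong pull, many -> weak.
def shrink(successes, trials, pop_mean, kappa=10):
    return (successes + kappa * pop_mean) / (trials + kappa)

pop_mean = 0.6
print(round(shrink(2, 3, pop_mean), 3))    # 3 trials: pulled near 0.6
print(round(shrink(35, 50, pop_mean), 3))  # 50 trials: stays near raw 0.7
```

The user with 3 trials lands close to the population mean despite a raw rate of 0.67, while the user with 50 trials keeps an estimate near their own data — exactly the "appropriately skeptical" behavior described above.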
Hierarchical models are the statistical engine behind Steady Practice's pooled analysis feature, which uses data from multiple users running similar experiments to improve everyone's estimates — especially early on, when individual data is sparse.
MCMC: When the Math Gets Hard
The Beta-Binomial conjugate model has a clean closed-form posterior. Most real models don't.
For complex models, we sample from the posterior using Markov Chain Monte Carlo (MCMC). The idea: construct a random walk through parameter space that spends more time in high-probability regions. After many steps, the samples approximate the posterior.
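To make the idea concrete, here is a toy Metropolis sampler (an older and much simpler algorithm than NUTS) targeting the coin-flip posterior from earlier; the step size and sample counts are arbitrary choices for illustration:

```python
import math
import random

random.seed(0)

# Metropolis random walk over the posterior of the coin-flip example
# (7 successes in 10 trials, flat prior), using the log posterior.
def log_post(theta):
    if not 0 < theta < 1:
        return float("-inf")
    return 7 * math.log(theta) + 3 * math.log(1 - theta)

theta, samples = 0.5, []
for _ in range(20000):
    proposal = theta + random.gauss(0, 0.1)   # propose a nearby step
    # Always accept uphill moves; accept downhill moves with
    # probability equal to the posterior ratio.
    if math.log(random.random()) < log_post(proposal) - log_post(theta):
        theta = proposal
    samples.append(theta)

mean = sum(samples[2000:]) / len(samples[2000:])   # drop warm-up
print(round(mean, 2))                              # close to 0.67
```

The chain spends more time where the posterior is high, so the retained samples approximate Beta(8, 4) — their average lands near the analytic posterior mean of 0.67.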
Modern MCMC (especially NUTS — No-U-Turn Sampler, used in Stan and PyMC) is remarkably efficient. Models that would have taken hours in 2000 now run in seconds.
You don't need to understand MCMC in detail to use Bayesian analysis — just know that when you see "posterior samples" or "credible intervals" in Reinforce OS, they came from a sampler running efficiently in the background.
Summary
- Bayesian statistics treats parameters as uncertain and updates beliefs using Bayes' theorem
- Priors encode what you knew before; posteriors encode what you know after
- Credible intervals have the natural interpretation that confidence intervals lack
- Bayesian optional stopping is valid — look at your data whenever you want
- Hierarchical models let you borrow strength across users or experiments
- Reinforce OS uses Bayesian analysis throughout: live updating, credible intervals, and pooled priors
Next: Multi-Armed Bandits — what if you want to learn and earn at the same time?