Personal notes on confidence intervals.
Based on notes taken during a course [1].
Confidence interval
Confidence interval for mean \(\mu\) is \((a,b)\) with confidence \(p\)
means the same thing as
The procedure from which the interval \((a,b)\) was sampled returns an interval which contains mean \(\mu\) with probability \(p\)
For normal distributions
With known variance
When \(X \sim N(\mu, \sigma^2)\) and variance \(\sigma^2\) is known, the \(1-\alpha\) confidence interval for \(\mu\) is
\[ \overline{X} \pm z_\frac{\alpha}{2} \frac{\sigma}{\sqrt{n}} \]
With unknown variance
For sample mean \(\overline{X}\), sample variance \(S^2\), and sample count \(n\), \(\frac{\overline{X} - \mu}{\frac{S}{\sqrt{n}}}\) has a \(t\)-distribution.
\[ \frac{\overline{X} - \mu}{\frac{S}{\sqrt{n}}} \sim t(n-1) \]
The \(1-\alpha\) confidence interval is
\[ \overline{X} \pm t_{\frac{\alpha}{2}, n-1} \frac{S}{\sqrt{n}} \]
For any distribution with large population
For any distribution, when sample count is large enough for the distribution of \(\overline{X}\) to approximate a normal distribution as per the central limit theorem, the confidence interval for normal distributions is a reasonable approximation.
Difference of means
For normal distributions
With known variances
\[ \left( \overline{X_1} - \overline{X_2} \right) \pm z_\frac{\alpha}{2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
With same unknown variance
\[ \left( \overline{X_1} - \overline{X_2} \right) \pm t_{\frac{\alpha}{2}, n_1 + n_2 - 2} \sqrt{S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \]
Pooled variance
\[ S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} \]
With unknown variances
Obtaining a confidence interval for the difference of means from normal distributions with separate unknown variances is known as the Behrens-Fisher problem. It can be approximated with Welch’s approximation:
\[ T = \frac{(\overline{X_1} - \overline{X_2}) - (\mu_1 - \mu_2)} {\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \approx t(\nu) \]
\[ \nu = \frac{\left( \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} \right)^2} {\frac{\left( \frac{S_1^2}{n_1} \right)^2}{n_1 - 1} + \frac{\left( \frac{S_2^2}{n_2} \right)^2}{n_2 - 1} } \]
Proportion of variances
If \(X_1 \sim N(\mu_1,\sigma_1^2)\) and \(X_2 \sim N(\mu_2,\sigma_2^2)\), then the following has the \(F\)-distribution.
\[ \frac{\sigma_2^2}{\sigma_1^2} \cdot \frac{S_1^2}{S_2^2} \sim F(n_1-1, n_2-1) \]
\(\sigma_1\) and \(\sigma_2\) are the true variances, while \(S_1\) and \(S_2\) are the sample variances.