# Confidence intervals

Personal notes on confidence intervals.

Based on notes taken during a course [1].

## Confidence interval

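Saying that the confidence interval for the mean $$\mu$$ is $$(a,b)$$ with confidence $$p$$

means the same thing as

saying that the procedure that produced the interval $$(a,b)$$ returns an interval containing the mean $$\mu$$ with probability $$p$$. The probability describes the procedure over repeated sampling, not the single interval $$(a,b)$$.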
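The repeated-sampling reading can be checked with a small simulation. The sketch below assumes normally distributed data with a known standard deviation and uses the interval $$\overline{X} \pm z \frac{\sigma}{\sqrt{n}}$$; the function name `coverage` and all parameter values are made up for illustration.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=5.0, sigma=2.0, n=25, p=0.95, trials=2000, seed=0):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    # Two-sided critical value of the standard normal for confidence p.
    z = NormalDist().inv_cdf(1 - (1 - p) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample and build a fresh interval each time.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials

print(coverage())  # should land close to 0.95; the exact value depends on the seed
```

Each individual interval either contains $$\mu$$ or does not; only the long-run fraction is tied to $$p$$.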

## For normal distributions

### With known variance

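When $$X \sim N(\mu, \sigma^2)$$ and the variance $$\sigma^2$$ is known, the $$1-\alpha$$ confidence interval for $$\mu$$ from a sample of size $$n$$ with sample mean $$\overline{X}$$ is

$\overline{X} \pm z_\frac{\alpha}{2} \frac{\sigma}{\sqrt{n}}$

Here $$z_\frac{\alpha}{2}$$ is the upper $$\frac{\alpha}{2}$$ critical value of the standard normal distribution, so $$P(Z > z_\frac{\alpha}{2}) = \frac{\alpha}{2}$$.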
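As a minimal sketch of this formula, the snippet below uses only the Python standard library. The function name `z_interval` and the sample values are hypothetical.

```python
from statistics import NormalDist, fmean

def z_interval(sample, sigma, alpha=0.05):
    """(1 - alpha) confidence interval for the mean when sigma is known."""
    n = len(sample)
    x_bar = fmean(sample)
    # Upper alpha/2 critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / n ** 0.5
    return x_bar - half_width, x_bar + half_width

# Made-up measurements with an assumed known sigma of 0.3.
lo, hi = z_interval([4.8, 5.1, 5.3, 4.9, 5.4], sigma=0.3)
print(round(lo, 3), round(hi, 3))
```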

### With unknown variance

For a sample of size $$n$$ with sample mean $$\overline{X}$$ and sample variance $$S^2$$, the statistic $$\frac{\overline{X} - \mu}{\frac{S}{\sqrt{n}}}$$ has a $$t$$-distribution with $$n-1$$ degrees of freedom.

$\frac{\overline{X} - \mu}{\frac{S}{\sqrt{n}}} \sim t(n-1)$

The $$1-\alpha$$ confidence interval is

$\overline{X} \pm t_{\frac{\alpha}{2}, n-1} \frac{S}{\sqrt{n}}$
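A sketch of the same computation in Python follows. The standard library has no $$t$$ quantile function, so this version assumes the critical value is looked up in a $$t$$-table and passed in as `t_crit`; if SciPy is installed, `scipy.stats.t.ppf(1 - alpha / 2, n - 1)` should give the same quantile. The function name and data are hypothetical.

```python
from statistics import fmean, stdev

def t_interval(sample, t_crit):
    """Confidence interval for the mean with unknown variance.

    t_crit is the upper alpha/2 critical value of t(n - 1),
    supplied by the caller (for example from a t-table).
    """
    n = len(sample)
    x_bar = fmean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    half_width = t_crit * s / n ** 0.5
    return x_bar - half_width, x_bar + half_width

# n = 5, so 4 degrees of freedom; t_{0.025, 4} is about 2.776 in a t-table.
lo, hi = t_interval([4.8, 5.1, 5.3, 4.9, 5.4], t_crit=2.776)
print(round(lo, 3), round(hi, 3))
```

For the same data, this interval is wider than one built with a normal critical value, which reflects the extra uncertainty from estimating the variance.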

## For any distribution with a large sample

For any distribution, when the sample size is large enough for the distribution of $$\overline{X}$$ to be approximately normal, as per the central limit theorem, the confidence interval for normal distributions is a reasonable approximation.
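To illustrate, the sketch below builds a large-sample interval for clearly non-normal (exponential) data. It assumes the common practice of plugging the sample standard deviation into the known-variance formula when $$n$$ is large; that choice, the function name, and the simulated data are assumptions, not part of the original notes.

```python
import random
from statistics import NormalDist, fmean, stdev

def large_sample_interval(sample, alpha=0.05):
    """Approximate (1 - alpha) interval for the mean, relying on the CLT."""
    n = len(sample)
    x_bar = fmean(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # The sample standard deviation stands in for the unknown sigma.
    half_width = z * stdev(sample) / n ** 0.5
    return x_bar - half_width, x_bar + half_width

rng = random.Random(1)
# Exponential data with rate 0.5, so the true mean is 2.0.
sample = [rng.expovariate(0.5) for _ in range(400)]
lo, hi = large_sample_interval(sample)
print(lo, hi)
```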

## Difference of means

### For normal distributions

#### With known variances

$\left( \overline{X_1} - \overline{X_2} \right) \pm z_\frac{\alpha}{2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
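A minimal Python sketch of this interval, using only the standard library; the function name `diff_z_interval` and the example data are hypothetical.

```python
from statistics import NormalDist, fmean

def diff_z_interval(sample1, sample2, sigma1, sigma2, alpha=0.05):
    """(1 - alpha) interval for mu_1 - mu_2 with known standard deviations."""
    diff = fmean(sample1) - fmean(sample2)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of the difference of the two sample means.
    std_err = (sigma1 ** 2 / len(sample1) + sigma2 ** 2 / len(sample2)) ** 0.5
    return diff - z * std_err, diff + z * std_err

a = [5.0, 5.2, 4.8, 5.0]
b = [4.0, 4.4, 4.2, 4.2, 4.2]
lo, hi = diff_z_interval(a, b, sigma1=0.2, sigma2=0.3)
print(round(lo, 3), round(hi, 3))
```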

#### With same unknown variance

$\left( \overline{X_1} - \overline{X_2} \right) \pm t_{\frac{\alpha}{2}, n_1 + n_2 - 2} \sqrt{S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$

##### Pooled variance

$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$
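The pooled variance and the resulting interval can be sketched as follows. As the standard library offers no $$t$$ quantile function, the critical value is assumed to come from a $$t$$-table with $$n_1 + n_2 - 2$$ degrees of freedom; function names and data are hypothetical.

```python
from statistics import fmean, variance

def pooled_variance(sample1, sample2):
    """Weighted average of the two sample variances."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq, s2_sq = variance(sample1), variance(sample2)
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def pooled_t_interval(sample1, sample2, t_crit):
    """Interval for mu_1 - mu_2 assuming a shared unknown variance.

    t_crit is the upper alpha/2 critical value of t(n1 + n2 - 2).
    """
    n1, n2 = len(sample1), len(sample2)
    diff = fmean(sample1) - fmean(sample2)
    sp_sq = pooled_variance(sample1, sample2)
    half_width = t_crit * (sp_sq * (1 / n1 + 1 / n2)) ** 0.5
    return diff - half_width, diff + half_width

a = [5.0, 5.2, 4.8, 5.0]
b = [4.0, 4.4, 4.2, 4.2, 4.2]
# 4 + 5 - 2 = 7 degrees of freedom; t_{0.025, 7} is about 2.365 in a t-table.
lo, hi = pooled_t_interval(a, b, t_crit=2.365)
print(round(lo, 3), round(hi, 3))
```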

#### With unknown variances

Obtaining a confidence interval for the difference of means from normal distributions with separate unknown variances is known as the Behrens-Fisher problem. An approximate solution is given by Welch's approximation:

$T = \frac{(\overline{X_1} - \overline{X_2}) - (\mu_1 - \mu_2)} {\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \approx t(\nu)$

$\nu = \frac{\left( \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} \right)^2} {\frac{\left( \frac{S_1^2}{n_1} \right)^2}{n_1 - 1} + \frac{\left( \frac{S_2^2}{n_2} \right)^2}{n_2 - 1} }$
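The degrees-of-freedom formula is easy to get wrong by hand, so here is a small sketch of it in Python. The function name `welch_df` and the data are hypothetical.

```python
from statistics import variance

def welch_df(sample1, sample2):
    """Approximate degrees of freedom nu for Welch's approximation."""
    n1, n2 = len(sample1), len(sample2)
    v1 = variance(sample1) / n1  # S_1^2 / n_1
    v2 = variance(sample2) / n2  # S_2^2 / n_2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

a = [5.0, 5.2, 4.8, 5.0]
b = [4.0, 4.4, 4.2, 4.2, 4.2]
print(welch_df(a, b))
```

The result is generally not an integer, and it lies between $$\min(n_1 - 1, n_2 - 1)$$ and $$n_1 + n_2 - 2$$.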

## Ratio of variances

If $$X_1 \sim N(\mu_1,\sigma_1^2)$$ and $$X_2 \sim N(\mu_2,\sigma_2^2)$$ are sampled independently with sample sizes $$n_1$$ and $$n_2$$, then the following statistic has an $$F$$-distribution.

$\frac{\sigma_2^2}{\sigma_1^2} \cdot \frac{S_1^2}{S_2^2} \sim F(n_1-1, n_2-1)$

$$\sigma_1^2$$ and $$\sigma_2^2$$ are the true variances, while $$S_1^2$$ and $$S_2^2$$ are the sample variances.
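
As a sketch of how this statistic yields a confidence interval, assume $$F_{q, n_1-1, n_2-1}$$ denotes the upper $$q$$ critical value of $$F(n_1-1, n_2-1)$$ (this notation is an assumption, not taken from the course notes). Then

$P\left( F_{1-\frac{\alpha}{2}, n_1-1, n_2-1} \le \frac{\sigma_2^2}{\sigma_1^2} \cdot \frac{S_1^2}{S_2^2} \le F_{\frac{\alpha}{2}, n_1-1, n_2-1} \right) = 1 - \alpha$

Multiplying each part of the inequality by $$\frac{S_2^2}{S_1^2}$$ isolates the ratio of the true variances, giving the $$1-\alpha$$ confidence interval for $$\frac{\sigma_2^2}{\sigma_1^2}$$:

$\left( \frac{S_2^2}{S_1^2} F_{1-\frac{\alpha}{2}, n_1-1, n_2-1}, \; \frac{S_2^2}{S_1^2} F_{\frac{\alpha}{2}, n_1-1, n_2-1} \right)$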