Chenzi Xu
MPhil DPhil (Oxon)
University of York
2022-12-04 (updated: 2022-12-12)
The sampling distribution of means is approximately normal, provided that the observations are independent and the sample size is sufficiently large (the central limit theorem).
Illustration from Wikipedia
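The central limit theorem can be checked with a quick simulation, a minimal Python sketch that is not part of the original slides: draw repeated samples from a right-skewed Exponential population and look at the distribution of their means.

```python
import random
import statistics

random.seed(1)

# Population: Exponential(lambda = 1), mean 1, clearly non-normal (right-skewed).
# Draw many samples of size n and record each sample's mean.
n, reps = 50, 2000
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

# The sampling distribution of the mean centres on the population mean (1.0)
# with standard error sigma / sqrt(n) = 1 / sqrt(50), about 0.14.
print(round(statistics.mean(sample_means), 2))
print(round(statistics.stdev(sample_means), 2))
```

Even though the population is skewed, a histogram of `sample_means` would look close to normal at this sample size.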
The Exponential distribution has a parameter \(\lambda\).
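As a concrete illustration (a sketch, not from the slides): for an Exponential distribution with rate \(\lambda\), the PDF is \(f(x)=\lambda e^{-\lambda x}\) and the CDF is \(F(x)=1-e^{-\lambda x}\) for \(x \ge 0\).

```python
import math

def exp_pdf(x, lam):
    """Density f(x) = lam * exp(-lam * x) for x >= 0, else 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exp_cdf(x, lam):
    """Cumulative probability F(x) = 1 - exp(-lam * x) for x >= 0, else 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

lam = 2.0
# The median of an Exponential(lam) is ln(2) / lam, so F(median) = 0.5.
print(exp_cdf(math.log(2) / lam, lam))
```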
It represents the range of plausible values of the \(\mu\) parameter.
If you take samples repeatedly and compute the CI each time, 95% of those CIs will contain the true population mean \(\mu\).
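This coverage interpretation can be demonstrated by simulation. A minimal Python sketch, not from the slides, assuming a normal population with known \(\sigma\) so the interval is \(\bar{x} \pm 1.96\,\sigma/\sqrt{n}\):

```python
import random
import statistics

random.seed(7)

mu, sigma, n, reps = 10.0, 2.0, 25, 1000
z = 1.96                   # 95% quantile of the standard normal
se = sigma / n ** 0.5      # standard error with known sigma

covered = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.mean(sample)
    # Does this sample's CI contain the true population mean?
    if xbar - z * se <= mu <= xbar + z * se:
        covered += 1

# Roughly 95% of the intervals contain the true mu.
print(covered / reps)
```

Note that each individual interval either contains \(\mu\) or it does not; the 95% refers to the long-run proportion across repeated samples.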
The probability that the null hypothesis is true, or the probability that the alternative hypothesis is false.
The probability of obtaining the observed sample statistic, or some value more extreme than that, conditional on the assumption that the null hypothesis is true.
Null hypothesis significance testing (NHST) is only meaningful when power is high.
Data: produced by an underlying random variable \(Y\)
Examples:
Probability distribution \(p(Y)\):
Here, \(n\) represents the total number of trials, \(k\) the number of successes, and \(\theta\) the probability of success. The term \(\binom{n}{k}\), the number of ways in which one can choose \(k\) successes out of \(n\) trials, expands to \(\frac{n!}{k!(n-k)!}\).
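The binomial PMF \(p(k \mid n,\theta) = \binom{n}{k}\theta^k(1-\theta)^{n-k}\) can be computed directly; a minimal Python sketch (not part of the slides):

```python
from math import comb

def binom_pmf(k, n, theta):
    """p(k | n, theta) = C(n, k) * theta^k * (1 - theta)^(n - k)."""
    return comb(n, k) * theta ** k * (1 - theta) ** (n - k)

# C(10, 7) = 10! / (7! * 3!) = 120 ways to choose 7 successes out of 10 trials
print(comb(10, 7))
# Probability of exactly 7 successes in 10 trials with theta = 0.5
print(binom_pmf(7, 10, 0.5))
# A PMF must sum to 1 over all possible outcomes k = 0..n
print(sum(binom_pmf(k, 10, 0.5) for k in range(11)))
```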
Assumption: Each data point in the vector of data y is independent of the others.
The kernel of the normal PDF: \(g(x\mid\mu,\sigma)=\exp \left(-\frac{(x-\mu)^2}{2\sigma^2} \right)\), with a normalizing constant \(k=\frac{1}{\sigma\sqrt{2\pi}}\) so that \(k\int{g(x)dx}=1\).
Probability in a continuous distribution is the area under the curve \(P(X<u)=\int_{-\infty}^uf(x)dx\), and will always be zero at any point value \(P(X=x)=0\).
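For the normal distribution this area has a closed form in terms of the error function; a small Python sketch (not from the slides):

```python
import math

def normal_cdf(u, mu=0.0, sigma=1.0):
    """P(X < u) for X ~ Normal(mu, sigma), via the error function."""
    return 0.5 * (1.0 + math.erf((u - mu) / (sigma * math.sqrt(2))))

# Area under the curve up to the mean is one half.
print(normal_cdf(0.0))
# P(X = x) = 0 at any single point, so probability is only assigned
# to intervals, e.g. P(-1 < X < 1) for the standard normal (about 0.68):
print(normal_cdf(1.0) - normal_cdf(-1.0))
```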
Expectation \(E(X)\): the weighted mean of the possible outcomes, weighted by the respective probabilities of each outcome.
Variance is defined as: \(Var(X) = E[X^2]-E[X]^2\)
Expectation: \(E\left[Y\right]=\sum_y y \cdot f(y)=n\theta\)
Variance: \(Var(Y)=n\theta(1-\theta)\)
Expectation: \(E[X]=\int xf(x)dx=\mu\)
Variance: \(Var(X)=\sigma^2\)
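The binomial formulas above can be verified by summing directly over the PMF; a minimal Python sketch (not part of the slides):

```python
from math import comb

n, theta = 10, 0.7
pmf = {k: comb(n, k) * theta ** k * (1 - theta) ** (n - k) for k in range(n + 1)}

# Expectation: weighted mean of outcomes, E[Y] = sum_k k * p(k) = n * theta
e_y = sum(k * p for k, p in pmf.items())
# Variance: Var(Y) = E[Y^2] - E[Y]^2 = n * theta * (1 - theta)
var_y = sum(k ** 2 * p for k, p in pmf.items()) - e_y ** 2

print(round(e_y, 6))    # n * theta = 7.0
print(round(var_y, 6))  # n * theta * (1 - theta) = 2.1
```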
The likelihood function \(\mathcal{L}(\theta \mid k,n)\) refers to the PMF \(p(k|n,\theta)\), treated as a function of \(\theta\).
Suppose that we record \(n = 10\) trials, and observe \(k = 7\) successes (heads in coin tosses).
\[\begin{equation} \mathcal{L}(\theta \mid k=7,n=10)= \binom{10}{7} \theta^{7} (1-\theta)^{10-7} \end{equation}\]
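Evaluating this likelihood over a grid of \(\theta\) values shows that it peaks at the maximum-likelihood estimate \(k/n = 0.7\). A minimal Python sketch (not from the slides):

```python
from math import comb

def likelihood(theta, k=7, n=10):
    """L(theta | k, n): the binomial PMF viewed as a function of theta."""
    return comb(n, k) * theta ** k * (1 - theta) ** (n - k)

# Evaluate over a grid of theta values in [0, 1];
# the maximum sits at the MLE, theta-hat = k / n = 0.7.
grid = [i / 100 for i in range(101)]
mle = max(grid, key=likelihood)
print(mle)                    # 0.7
print(round(likelihood(mle), 3))
```

Unlike a PMF, the likelihood does not sum or integrate to 1 over \(\theta\); it only ranks parameter values by how well they account for the observed data.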
The concept of "integrating out a parameter"
Marginal likelihood: the likelihood computed by "marginalizing" out the parameter \(\theta\). It is a weighted sum (or integral) of the likelihood over the possible values of the parameter, weighted by the prior probability of each value.
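This weighted sum can be sketched by discretizing \(\theta\); a minimal Python example (not from the slides), assuming a uniform prior over \(\theta\) and the \(k=7\), \(n=10\) data from before:

```python
from math import comb

def likelihood(theta, k=7, n=10):
    return comb(n, k) * theta ** k * (1 - theta) ** (n - k)

# Discretize theta into 100 bins and place a uniform prior over the grid;
# the marginal likelihood is the prior-weighted average of the likelihood.
grid = [(i + 0.5) / 100 for i in range(100)]   # bin midpoints in (0, 1)
prior = 1 / len(grid)                          # uniform prior weight per bin
marginal = sum(likelihood(t) * prior for t in grid)

# With a continuous uniform prior this integral is exactly 1 / (n + 1) = 1/11.
print(round(marginal, 4))
```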
We have a joint PMF \(p_{X,Y}(x, y)\) giving a probability for each possible pair of values of \(X\) and \(Y\).
The marginal distributions: \[\begin{equation} p_{X}(x)=\sum_{y\in S_{Y}}p_{X,Y}(x,y) \end{equation}\]
\[\begin{equation} p_{Y}(y)=\sum_{x\in S_{X}}p_{X,Y}(x,y) \end{equation}\]
The conditional distributions: \[\begin{equation} p_{X\mid Y}(x\mid y) = \frac{p_{X,Y}(x,y)}{p_Y(y)} \end{equation}\]
\[\begin{equation} p_{Y\mid X}(y\mid x) = \frac{p_{X,Y}(x,y)}{p_X(x)} \end{equation}\]
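These definitions can be worked through on a tiny discrete example; a minimal Python sketch (not from the slides) with hypothetical joint probabilities chosen to sum to 1:

```python
# A small joint PMF p(x, y) over X in {0, 1} and Y in {0, 1}
# (hypothetical numbers; they must sum to 1).
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

xs = {x for x, _ in joint}
ys = {y for _, y in joint}

# Marginals: sum the joint PMF over the other variable.
p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}

# Conditional: p(x | y) = p(x, y) / p(y), here conditioning on Y = 1.
p_x_given_y1 = {x: joint[(x, 1)] / p_y[1] for x in xs}

print(p_x)
print(p_y)
print(p_x_given_y1)
```

Each marginal and each conditional distribution again sums to 1, as a valid PMF must.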
PDF and CDF (\(\rho=0\)):