Bernoulli vs Multinoulli (Categorical) vs Binomial vs Multinomial / Gaussian / Poisson Distributions
References:
- Section 06 - Common distributions, Statistical Inference@Coursera
- The Poisson Distribution
- Bernoulli, Binomial, Multinomial, Beta and Dirichlet Distributions (伯努利分布、二项分布、多项分布、Beta分布、Dirichlet分布)
- The Multinomial Model
0. The “Choose” notation
- $n!$ reads “n factorial”
- ${n \choose x} = \frac{n!}{x!(n-x)!}$ reads “n choose x”
$n \choose x$ counts the number of ways of selecting $x$ objects out of $n$ without replacement disregarding the order of the items, i.e. $C_n^x$.
In particular,
\[{n \choose 0} = {n \choose n} = 1\]
This generalizes to more than two groups. Suppose $n$ objects are divided into $c$ types, and we draw:
- $n_1$ objects of type $1$
- $n_2$ objects of type $2$
- $\dots$
- $n_c$ objects of type $c$
- with $\sum_{i=1}^{c} n_i = n$
Then the number of ways to do this is ${n \choose {n_1, \dots, n_c}} = \frac{n!}{n_{1}! \cdots n_{c}!}$.
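As a minimal sketch, the multinomial coefficient can be computed directly from the factorial formula in base R. The helper name `multichoose` is hypothetical (it is not a base R function), and the counts are illustrative values.

```r
# Multinomial coefficient n! / (n_1! * ... * n_c!), where n = sum(counts).
# `multichoose` is a hypothetical helper name, not part of base R.
multichoose <- function(counts) {
  factorial(sum(counts)) / prod(factorial(counts))
}

two_types   <- multichoose(c(2, 3))     # reduces to the binomial coefficient choose(5, 2)
three_types <- multichoose(c(2, 1, 1))  # 4! / (2! * 1! * 1!)
```

With only two types the result reduces to the ordinary "n choose x", which is a quick sanity check on the formula.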
1. Bernoulli vs Multinoulli (Categorical) vs Binomial vs Multinomial
In short:
- $\operatorname{Bernoulli}(\pi_1)$: toss a coin once
- $\operatorname{Multinoulli}(\boldsymbol{\pi})$: roll a die once; a.k.a. Categorical
- $\operatorname{Binomial}(n, \pi_1)$: toss a coin $n$ times
- $\operatorname{Multinomial}(n, \boldsymbol{\pi})$: roll a die $n$ times
Notes:
- Strictly speaking, these are 4 RVs rather than 4 distributions, but the two terms have long been used interchangeably
- More precisely, they are 4 discrete RVs
- A lone $\pi_1$ denotes the probability of getting the binary outcome $1$ on each toss
- Because the outcome is binary, $\pi_0 = 1 - \pi_1$ is left implicit
- $\boldsymbol{\pi}$ is itself a distribution:
- $\boldsymbol{\pi} = \lbrace \pi_1, \dots, \pi_c \rbrace$ ($c$ should be countable)
- $\pi_i$ denotes the probability of getting the categorical outcome $i$ on each toss
- $\sum_{i} \pi_i = 1$
The four are related as follows (here $\sim$ between two distributions means "is the same distribution as"):
- $\operatorname{Multinoulli}(\lbrace \pi_1, 1-\pi_1 \rbrace) \sim \operatorname{Bernoulli}(\pi_1)$
- $\operatorname{Binomial}(1, \pi_1) \sim \operatorname{Bernoulli}(\pi_1)$
- $\operatorname{Multinomial}(1, \boldsymbol{\pi}) \sim \operatorname{Multinoulli}(\boldsymbol{\pi})$
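These special cases can be checked numerically. The sketch below uses base R's `dbinom` and `dmultinom` with illustrative parameter values; it assumes the Multinoulli outcome $i$ is encoded as a one-hot count vector, which is one common convention rather than the only one.

```r
p <- 0.3                     # illustrative value of pi_1
pi_vec <- c(0.2, 0.5, 0.3)   # illustrative category probabilities

# Binomial(1, p) puts mass 1 - p on 0 and p on 1, which is the Bernoulli(p) PMF
bern_pmf <- dbinom(c(0, 1), size = 1, prob = p)

# Multinomial(1, pi): the one-hot count vector for category i has probability pi_i
one_hot <- diag(length(pi_vec))   # rows are (1,0,0), (0,1,0), (0,0,1)
multinoulli_pmf <- apply(one_hot, 1, dmultinom, size = 1, prob = pi_vec)
```

Here `bern_pmf` should equal `c(1 - p, p)` and `multinoulli_pmf` should reproduce `pi_vec`.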
If $X \sim \operatorname{Bernoulli}(\pi_1)$:
- $X$ is binary
- $\mathbb{P}(X=1) = \pi_1$
- $\mathbb{P}(X=0) = 1-\pi_1$
- PMF $p_X(x) = \pi_1^x (1-\pi_1)^{1-x}$
- $\mathbb{E}[X] = \pi_1$
- $\operatorname{Var}(X) = \pi_1(1-\pi_1)$
If $X \sim \operatorname{Multinoulli}(\boldsymbol{\pi})$:
- $X$ is categorical
- $\mathbb{P}(X=1) = \pi_1$
- $\cdots$
- $\mathbb{P}(X=c) = \pi_c$
- PMF $p_X(x) = \prod_{i=1}^c \pi_i^{I(x=i)}$
If $\mathbf{X} \sim \operatorname{Binomial}(n, \pi_1)$:
- $X = (n_1, n_0)$
- $n_1$ is the number of times outcome $1$ occurs
- $n_0$ is the number of times outcome $0$ occurs
- $n_1 + n_0 = n$ (because there are $n$ tosses)
- $\mathbb{P}_X(n_1, n_0) = {n \choose {n_1, n_0}} \pi_1^{n_1} \, (1-\pi_1)^{n_0}$
- For convenience, one often sets $X = n_1 \times 1 + n_0 \times 0 = n_1$ instead, which gives:
- PMF $p_{X}(x) = {n \choose x} \pi_1^x(1-\pi_1)^{n-x}$, where $x = 0,\ldots,n$
- I do not like this shortcut
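The pair form and the scalar form of the binomial describe the same probabilities. A minimal check in base R, with illustrative values of $n$ and $\pi_1$, treats the pair form as a two-category multinomial:

```r
n <- 8; p <- 0.5   # illustrative values
x <- 0:n

# Scalar form: P(X = x), where x counts outcome 1
pmf_scalar <- dbinom(x, size = n, prob = p)

# Pair form: P(n_1 = x, n_0 = n - x), evaluated as a two-category multinomial
pmf_pair <- sapply(x, function(k) dmultinom(c(k, n - k), prob = c(p, 1 - p)))

same_pmf <- isTRUE(all.equal(pmf_scalar, pmf_pair))
```

The scalar form simply drops $n_0$, which is determined by $n_0 = n - n_1$.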
If $\mathbf{X} \sim \operatorname{Multinomial}(n, \boldsymbol{\pi})$:
- $X = (n_1, n_2, \dots, n_c)$
- $n_1$ is the number of times outcome $1$ occurs
- $\cdots$
- $n_c$ is the number of times outcome $c$ occurs
- $\sum_{i=1}^c n_i = n$ (because there are $n$ tosses)
- $\mathbb{P}_X(n_1, \dots, n_c) = {n \choose {n_1, \dots, n_c}} \pi_1^{n_1} \dots \pi_c^{n_c}$
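As a sketch, the multinomial PMF above can be evaluated by hand and compared with base R's `dmultinom`. The counts and probabilities are illustrative values only.

```r
counts <- c(3, 2, 1)          # illustrative (n_1, n_2, n_3), so n = 6
pi_vec <- c(0.5, 0.3, 0.2)    # illustrative category probabilities

# Multinomial coefficient times the product of pi_i^{n_i}
coef <- factorial(sum(counts)) / prod(factorial(counts))
pmf_manual <- coef * prod(pi_vec ^ counts)

# Built-in equivalent; size defaults to sum(counts)
pmf_builtin <- dmultinom(counts, prob = pi_vec)
```

For these values the coefficient is 60 and both computations give 0.135.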
Note the difference between Multinomial and Multivariate:
- Multivariate usually refers to a compound RV, e.g. $X = (X_1, X_2)$, where $X_1$ and $X_2$ each have their own distribution and together $X$ has a multivariate distribution
- Multinomial is specifically about counts over categories
Exercise:
- Suppose a friend has 8 children, 7 of whom are girls, and none are twins
- If each birth is independently a girl with probability 50%, what’s the probability of getting 7 or more girls out of 8 births?
choose(8, 7) * .5 ^ 8 + choose(8, 8) * .5 ^ 8
## [1] 0.03516
pbinom(6, size = 8, prob = .5, lower.tail = FALSE) ## lower.tail = TRUE (the default) returns P(X ≤ x); FALSE returns P(X > x), so this is P(X > 6)
## [1] 0.03516
2. Normal (Gaussian) Distribution
2.1 Definition
If $X \sim \mathcal{N}(\mu, \sigma^2)$, we say that the RV $X$ follows a normal or Gaussian distribution with mean $\mu$ and variance $\sigma^2$:
- PDF $f(x) = \frac{1}{\sqrt{2 \pi \sigma^2} } e^{ - \frac{(x-\mu)^2}{2 \sigma^2}}$
- $E[X] = \mu$ and $Var(X) = \sigma^2$
The distribution of $\mathcal{N}(0, 1)$ is called the standard normal distribution:
- PDF $\phi(x) = \frac{1}{\sqrt{2 \pi} } e^{ - \frac{x^2}{2} }$
Standard normal RVs are often labeled $Z$:
- If $X \sim \mbox{N}(\mu,\sigma^2)$, then $Z = \frac{X - \mu}{\sigma} \sim \mbox{N}(0,1)$ i.e. $Z$ is standard normal
- If $Z$ is standard normal, then $X = \mu + \sigma Z \sim \mbox{N}(\mu, \sigma^2)$
- The non-standard normal density is $\frac{\phi(\frac{x - \mu}{\sigma})}{\sigma}$
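The standardization identities above can be verified numerically. This is a minimal sketch in base R; the values of `mu`, `sigma` and `x` are illustrative.

```r
mu <- 1020; sigma <- 50      # illustrative parameters
x <- c(950, 1020, 1100)      # illustrative evaluation points

# Density identity: f(x) = phi((x - mu) / sigma) / sigma
dens_direct <- dnorm(x, mean = mu, sd = sigma)
dens_via_z  <- dnorm((x - mu) / sigma) / sigma

# CDF identity: P(X <= x) = P(Z <= (x - mu) / sigma)
cdf_direct <- pnorm(x, mean = mu, sd = sigma)
cdf_via_z  <- pnorm((x - mu) / sigma)
```

Note the extra division by $\sigma$ for the density, which is not needed for the CDF.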
Percentiles:
- Approximately 68%, 95% and 99.7% of the normal density lie within 1, 2 and 3 standard deviations from the mean, respectively
- -1.28, -1.645, -1.96 and -2.33 are the $10^{\text{th}}$, $5^{\text{th}}$, $2.5^{\text{th}}$ and $1^{\text{st}}$ percentiles of the standard normal distribution respectively
- By symmetry, 1.28, 1.645, 1.96 and 2.33 are the $90^{\text{th}}$, $95^{\text{th}}$, $97.5^{\text{th}}$ and $99^{\text{th}}$ percentiles of the standard normal distribution respectively
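These rules of thumb can be reproduced with `pnorm` and `qnorm`. The sketch below only restates the figures above with more digits.

```r
# Probability mass within k standard deviations of the mean, for k = 1, 2, 3
within_k <- pnorm(1:3) - pnorm(-(1:3))
round(within_k, 4)

# Lower-tail and upper-tail standard normal quantiles
lower_q <- qnorm(c(0.10, 0.05, 0.025, 0.01))
upper_q <- qnorm(c(0.90, 0.95, 0.975, 0.99))
round(lower_q, 3)
round(upper_q, 3)
```

The exact masses are about 0.6827, 0.9545 and 0.9973, so "95% within 2 standard deviations" is itself a rounding; the exact 95% interval uses 1.96.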
Other properties:
- The normal distribution is symmetric and peaked about its mean, therefore the mean, median and mode are all equal
- A constant times a normally distributed random variable is also normally distributed
- Sums of jointly normally distributed random variables are again normally distributed, even if the variables are dependent
- Sample means of normally distributed random variables are again normally distributed
- The square of a standard normal random variable follows what is called the chi-squared distribution
- The exponential of a normally distributed random variable follows what is called the log-normal distribution
2.2 Exercise
2.2.1 What is the $95^{\text{th}}$ percentile of a $N(\mu, \sigma^2)$ distribution?
- Quick answer in R
qnorm(.95, mean = mu, sd = sigma)
- We want the point $x_0$ such that $P(X \leq x_0) = .95$, i.e. $P\left(Z \leq \frac{x_0 - \mu}{\sigma}\right) = .95$
- Therefore $\frac{x_0 - \mu}{\sigma} = 1.645$ or $x_0 = \mu + 1.645\sigma$
- In general $x_0 = \mu + z_0 \sigma$ where $z_0$ is the appropriate standard normal quantile
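A quick numerical check of $x_0 = \mu + z_0 \sigma$, as a sketch with illustrative values for `mu` and `sigma`:

```r
mu <- 1020; sigma <- 50   # illustrative parameters

z0 <- qnorm(0.95)                     # standard normal 95th percentile
x0_formula <- mu + z0 * sigma         # x_0 = mu + z_0 * sigma
x0_direct  <- qnorm(0.95, mean = mu, sd = sigma)

# Plugging x_0 back into the CDF should recover 0.95
p_check <- pnorm(x0_direct, mean = mu, sd = sigma)
```

Both routes give the same quantile, so only the standard normal quantiles ever need to be memorized.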
2.2.2 What is the probability that a $\mbox{N}(\mu,\sigma^2)$ RV is more than 2 standard deviations above the mean?
I.e. we want to know
\[\begin{eqnarray*} P(X > \mu + 2\sigma) & = P \left ( \frac{X -\mu}{\sigma} > \frac{\mu + 2\sigma - \mu}{\sigma} \right ) \newline & = P(Z \geq 2 ) \newline & \approx 2.5\% \end{eqnarray*}\]
2.2.3 Clicks Problem I
Assume that the number of daily ad clicks for a company is approximately normally distributed with a mean of 1020 and a standard deviation of 50. What is the probability of getting more than 1160 clicks in a day?
- First thought: it is not very likely, 1160 is 2.8 standard deviations from the mean
pnorm(1160, mean = 1020, sd = 50, lower.tail = FALSE)
## [1] 0.002555
pnorm(2.8, lower.tail = FALSE)
## [1] 0.002555
2.2.4 Clicks Problem II
Under the same assumptions, what number of daily ad clicks is such that 75% of days have fewer clicks?
qnorm(0.75, mean = 1020, sd = 50)
## [1] 1054
3. Poisson distribution
3.1 Definition
- The Poisson mass function is \(P(X = x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}, \text{ for } x=0,1,\ldots\)
- The mean of this distribution is $\mu = \lambda$
- The variance of this distribution is $\sigma^2 = \lambda$
- Notice that $x$ ranges over the non-negative integers $0, 1, 2, \ldots$, with no upper bound
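As a sketch, the mass function and its moments can be checked against base R's `dpois`. The value of `lambda` is illustrative, and the infinite support is truncated at 100, which is an approximation (the omitted tail mass is negligible for this `lambda`).

```r
lambda <- 2.5   # illustrative rate
x <- 0:100      # truncated support; an approximation to 0, 1, 2, ...

# PMF from the formula lambda^x * exp(-lambda) / x!
pmf_manual  <- lambda^x * exp(-lambda) / factorial(x)
pmf_builtin <- dpois(x, lambda)

# Mean and variance computed from the PMF; both should be close to lambda
mean_x <- sum(x * pmf_builtin)
var_x  <- sum((x - mean_x)^2 * pmf_builtin)
```

Equal mean and variance is a defining feature of the Poisson distribution and a useful diagnostic when deciding whether count data look Poisson.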
3.2 Some uses for the Poisson distribution
The Poisson distribution applies when:
- the event is something that can be counted in whole numbers;
- occurrences are independent, so that one occurrence neither diminishes nor increases the chance of another;
- the average frequency of occurrence for the time period in question is known;
- and it is possible to count how many events have occurred (such as the number of times a firefly lights up in my garden in a given 5 seconds on some evening), but meaningless to ask how many such events have not occurred.
When $n$ is large and $p$ is small:
- the Poisson distribution can be used to approximate the $\mbox{Binomial}(n, p)$ distribution
3.3 Rates and Poisson random variables
- Poisson random variables are used to model rates
- If $X \sim Poisson(\lambda)$ on 1 unit interval, then $Y \sim Poisson(k\lambda)$ on $k$ unit intervals.
- $\lambda = E[\frac{Y}{k}]$ is the expected count per time unit (i.e. rate)
- $k$ means the total monitoring process takes $k$ time units
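The scaling rule can be illustrated for $k = 2$ by convolving two Poisson mass functions. This sketch assumes the counts in the two unit intervals are independent, which the Poisson model presumes; `lambda` is an illustrative value.

```r
lambda <- 2.5   # illustrative rate per unit interval
y <- 0:20       # totals at which to compare the two PMFs

# P(X1 + X2 = y) for independent X1, X2 ~ Poisson(lambda), by direct convolution
pmf_sum <- sapply(y, function(t) sum(dpois(0:t, lambda) * dpois(t - 0:t, lambda)))

# Poisson(k * lambda) with k = 2
pmf_scaled <- dpois(y, 2 * lambda)

same_pmf <- isTRUE(all.equal(pmf_sum, pmf_scaled))
```

Repeating the argument $k - 1$ times gives the $\mbox{Poisson}(k\lambda)$ count over $k$ unit intervals.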
3.4 Exercise: Rate
The number of people that show up at a bus stop is Poisson with a mean of 2.5 per hour. If we watch the bus stop for 4 hours, what is the probability that 3 or fewer people show up for the whole time?
ppois(3, lambda = 2.5 * 4)
## [1] 0.01034
3.5 Poisson approximation to the binomial
- When $n$ is large and $p$ is small, the Poisson distribution is an accurate approximation to the binomial distribution
- Notation
- $X \sim \mbox{Binomial}(n, p)$
- $\lambda = n p$, and
- $n$ gets large
- $p$ gets small
- $\lambda$ stays constant
3.6 Exercise: Poisson approximation to the binomial
We flip a coin with success probability 0.01 five hundred times. What’s the probability of 2 or fewer successes?
pbinom(2, size = 500, prob = .01)
## [1] 0.1234
ppois(2, lambda=500 * .01)
## [1] 0.1247
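To see the approximation improve as $n$ grows with $\lambda = np$ held fixed, the sketch below compares the two mass functions pointwise. The choice of $n$ values and the range of `x` are illustrative.

```r
lambda <- 5   # held fixed, as in the exercise (500 * 0.01)
x <- 0:20     # points at which the two PMFs are compared

# Largest pointwise gap between Binomial(n, lambda / n) and Poisson(lambda)
max_gap <- sapply(c(50, 500, 5000), function(n) {
  max(abs(dbinom(x, size = n, prob = lambda / n) - dpois(x, lambda)))
})
max_gap
```

The gap shrinks roughly in proportion to $p = \lambda / n$, which is why the approximation needs $p$ small and not only $n$ large.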