Summarized from the Coursera lecture Statistical Inference, section 07 Asymptotics. The new slides omit parts of the derivations, so it is best to consult the old slides as well.


0. Asymptotics

Asymptotics ([æsɪmp'tɒtɪks]) is the study of how statistics behave as the number of trials goes to infinity, i.e. as $n \to +\infty$.

1. The Law of Large Numbers

1.1 Definition

There are many variations on the LLN; we are using a particularly lazy version here.

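The law of large numbers states that if $X_1, \dots, X_n$ are iid from a population with mean $\mu$ and variance $\sigma^2$, then $\bar X$, the sample average of the $n$ observations, converges in probability to $\mu$, i.e.

$$\bar X = \frac{1}{n}(X_1 + \cdots + X_n) \xrightarrow{p} \mu \quad \text{when } n \to +\infty$$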

Or more generally, the average of the results obtained from a large number of trials (i.e. a large $n$, since each trial yields one observation) should be close to the expected value, and will tend to get closer as more trials are performed.

1.2 Simulation

n <- 10000
means <- cumsum(rnorm(n)) / (1:n) ## cumsum computes a running sum, e.g. cumsum(c(1,2,3)) = c(1,3,6)
plot(1:n, means, type = "l", lwd = 2, frame = FALSE, ylab = "cumulative means", xlab = "sample size")
abline(h = 0) ## the true mean of the standard normal
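The same experiment can be repeated for a non-normal population. The sketch below uses fair coin flips, so the running proportion of heads should settle near 0.5; the seed and the variable name `flips` are arbitrary choices made for this illustration.

```r
set.seed(42)                              # arbitrary seed, only for reproducibility
n <- 10000
flips <- sample(0:1, n, replace = TRUE)   # fair coin: 1 = heads, 0 = tails
means <- cumsum(flips) / (1:n)            # running proportion of heads
plot(1:n, means, type = "l", lwd = 2, frame = FALSE,
     ylab = "cumulative proportion of heads", xlab = "sample size")
abline(h = 0.5)                           # the true mean p = 0.5
```

The early part of the curve swings widely, and the swings shrink as the sample size grows.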

1.3 Consistency and Bias of an estimator

  • An estimator is consistent if it converges to what you want to estimate, i.e. $\hat X \xrightarrow{p} X$
    • Consistency is neither necessary nor sufficient for one estimator to be better than another
    • The LLN basically states that the sample mean is consistent
    • The sample variance and the sample standard deviation are consistent as well
  • An estimator is unbiased if its expected value equals what it is trying to estimate, i.e. $E[\hat X] = X$
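Consistency and unbiasedness are different properties, and a small simulation can make the distinction concrete. The sketch below compares the usual sample variance (divide by $n - 1$) with a hypothetical helper `var_biased` that divides by $n$; the true variance of 4, the sample sizes and the seed are all assumptions chosen for the demo.

```r
set.seed(1)                                       # arbitrary seed
sigma2 <- 4                                       # true variance (assumed for the demo)
var_biased <- function(x) mean((x - mean(x))^2)   # divides by n instead of n - 1

# Bias: average each estimator over many small samples (n = 5)
small <- replicate(20000, {
  x <- rnorm(5, mean = 0, sd = sqrt(sigma2))
  c(unbiased = var(x), biased = var_biased(x))
})
avg <- rowMeans(small)
avg   # unbiased is near 4; biased is near 4 * (5 - 1) / 5 = 3.2

# Consistency: on one large sample both estimators are close to 4
big <- rnorm(1e5, mean = 0, sd = sqrt(sigma2))
c(unbiased = var(big), biased = var_biased(big))
```

The divide-by-$n$ estimator is biased for every finite $n$, yet it is still consistent, because the factor $(n - 1)/n$ tends to 1.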

2. The Central Limit Theorem

2.1 Definition

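The CLT says that the sample mean is approximately normally distributed for large $n$:

$$\bar X \overset{\text{approx}}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{when } n \to +\infty$$

In other words,

$$\frac{\bar X - \mu}{\sigma / \sqrt{n}} = \frac{\text{Estimate} - \text{Mean of estimate}}{\text{Std. Err. of estimate}} \to N(0, 1) \quad \text{when } n \to +\infty$$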
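A quick way to see the CLT at work is to standardize many sample means from a clearly non-normal population and compare them with $N(0, 1)$. The sketch below uses rolls of a fair die (mean 3.5, variance 35/12); the choice of 30 rolls per sample, 10000 repetitions and the seed are assumptions made for illustration.

```r
set.seed(7)                        # arbitrary seed
n <- 30                            # die rolls per sample
nsim <- 10000                      # number of simulated samples
mu <- 3.5                          # mean of a fair die
sigma <- sqrt(35 / 12)             # standard deviation of a fair die

z <- replicate(nsim, {
  rolls <- sample(1:6, n, replace = TRUE)
  (mean(rolls) - mu) / (sigma / sqrt(n))   # standardized sample mean
})

c(mean = mean(z), sd = sd(z))      # should be near 0 and 1
hist(z, breaks = 40, freq = FALSE, main = "Standardized means of 30 die rolls")
curve(dnorm(x), add = TRUE, lwd = 2)       # standard normal density for comparison
```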

2.2 Confidence intervals

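Confidence intervals are used only in frequentist statistics. The corresponding concept in Bayesian statistics is the credible interval.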

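For example, suppose a candidate's support in a poll is 55% and the confidence interval at the 0.95 confidence level is (50%, 60%). Loosely speaking, we are 95% confident that his true support lies between 50% and 60%, so the chance that his true support is below one half is less than 2.5% (assuming the distribution is symmetric). Strictly, the 95% refers to the procedure: about 95% of intervals built this way from repeated samples would contain the true value.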

$\left[\bar X - \frac{2\sigma}{\sqrt{n}},\ \bar X + \frac{2\sigma}{\sqrt{n}}\right]$ is called a 95% interval for $\mu$.
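The "95%" describes how often the procedure captures $\mu$ over repeated samples, which can be checked by simulation. The sketch below assumes a normal population with known $\sigma$; the values of `mu`, `sigma`, `n` and the seed are arbitrary choices for the demo.

```r
set.seed(123)                      # arbitrary seed
mu <- 10; sigma <- 2               # assumed true mean and known standard deviation
n <- 50; nsim <- 10000

covered <- replicate(nsim, {
  xbar <- mean(rnorm(n, mean = mu, sd = sigma))
  half <- 2 * sigma / sqrt(n)      # half-width: 2 standard errors
  (xbar - half <= mu) && (mu <= xbar + half)
})

coverage <- mean(covered)          # fraction of intervals containing mu
coverage                           # close to 0.95
```

Using 2 rather than `qnorm(0.975)` (about 1.96) makes the interval slightly wider, so the simulated coverage tends to land a little above 95%.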

For more, see Stat Trek: What is a Confidence Interval?

2.3 Apply CLT to Bernoulli estimators

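$$
\begin{aligned}
\sigma^2 &= p(1-p) \\
\frac{2\sigma}{\sqrt{n}} &= 2\sqrt{\frac{p(1-p)}{n}} \\
p(1-p) &\le \frac{1}{4}, \quad \text{for } 0 \le p \le 1 \\
\frac{2\sigma}{\sqrt{n}} &= 2\sqrt{\frac{p(1-p)}{n}} \le 2\sqrt{\frac{1}{4n}} = \frac{1}{\sqrt{n}}
\end{aligned}
$$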

$\bar X \pm \frac{1}{\sqrt{n}}$ is a quick CI estimate for $p$ (since $\mu = p$ in Bernoulli)
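The bound behind the quick interval can be checked numerically. The sketch below evaluates the two-standard-error half-width $2\sqrt{p(1-p)/n}$ on a grid of $p$ values and confirms that it never exceeds $1/\sqrt{n}$; the sample size `n <- 100` and the grid spacing are assumptions for illustration.

```r
n <- 100                                    # assumed sample size
p <- seq(0, 1, by = 0.01)                   # grid of possible success probabilities
half_width <- 2 * sqrt(p * (1 - p) / n)     # 2 standard errors at each p
bound <- 1 / sqrt(n)                        # half-width of the quick interval

max(half_width)                             # equals the bound, 0.1
p[which.max(half_width)]                    # attained at p = 0.5
all(half_width <= bound + 1e-12)            # TRUE: the quick interval is conservative
```

The quick interval is therefore widest relative to the truth when $p$ is far from 0.5.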

Exercise I

What is the probability of getting 45 or fewer heads out of 100 flips of a fair coin? (Use the CLT, not the exact binomial calculation)

  • $\mu = p = 0.5$
  • $\sigma^2 = p(1-p) = 0.25$, $\frac{\sigma}{\sqrt{100}} = 0.05$
  • $\bar X = \frac{45}{100} = 0.45$
pnorm(0.45, mean=0.5, sd=0.05)
## [1] 0.1586553
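For comparison, the exact binomial probability is easy to compute in R. The sketch below puts the CLT answer next to the exact value and a continuity-corrected CLT value; the continuity correction (evaluating at 45.5 heads instead of 45) is an addition for illustration and is not part of the exercise.

```r
clt <- pnorm(0.45, mean = 0.5, sd = 0.05)          # CLT approximation from the exercise
exact <- pbinom(45, size = 100, prob = 0.5)        # exact P(heads <= 45)
corrected <- pnorm(0.455, mean = 0.5, sd = 0.05)   # CLT with continuity correction

c(clt = clt, exact = exact, corrected = corrected)
```

The plain CLT value underestimates the exact probability somewhat, and the continuity correction closes most of that gap.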

Exercise II

Your campaign advisor told you that in a random sample of 100 likely voters, 56 intend to vote for you. Can you relax? Do you have this race in the bag?

  • $\bar X = \frac{56}{100} = 0.56$
  • $\frac{1}{\sqrt{100}} = 0.1$
  • an approximate 95% interval for $p$ is [0.46, 0.66]
  • Not enough for you to relax, better go do more campaigning!
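The quick interval can be cross-checked against two other standard intervals. The sketch below computes the Wald interval (plugging $\hat p$ into the standard error) and the exact interval from `binom.test`; both are additions for comparison rather than part of the exercise.

```r
x <- 56; n <- 100
p_hat <- x / n

quick <- p_hat + c(-1, 1) / sqrt(n)                                        # quick interval
wald <- p_hat + c(-1, 1) * qnorm(0.975) * sqrt(p_hat * (1 - p_hat) / n)    # Wald interval
exact <- as.numeric(binom.test(x, n)$conf.int)                             # exact binomial interval

rbind(quick = quick, wald = wald, exact = exact)
```

All three lower limits fall below 0.5, which supports the same conclusion: the sample alone does not settle the race.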

2.4 Calculate Poisson interval with R

A nuclear pump failed 5 times in 94.32 days. Give a 95% confidence interval for the failure rate per day (i.e. $\lambda$).

x <- 5 ## number of observed failures
poisson.test(x, T = 94.32)$conf.int
## [1] 0.01721 0.12371
## attr(,"conf.level")
## [1] 0.95
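A CLT-based interval for the same data offers a useful comparison. The sketch below assumes the failure count is Poisson with mean $\lambda t$, so that $\hat\lambda = x / t$ has estimated variance $\hat\lambda / t$; the variable names `days`, `lambda_hat` and `wald` are introduced here for illustration.

```r
x <- 5                                   # observed failures
days <- 94.32                            # monitoring time in days
lambda_hat <- x / days                   # estimated failure rate per day

# CLT-based (Wald) interval: estimate +/- normal quantile * standard error
wald <- lambda_hat + c(-1, 1) * qnorm(0.975) * sqrt(lambda_hat / days)
exact <- as.numeric(poisson.test(x, T = days)$conf.int)

rbind(wald = wald, exact = exact)
```

With only 5 events the normal approximation is rough, so the two intervals differ noticeably; the exact interval from `poisson.test` is the safer choice here.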
