

Prerequisite #1 : σ-algebra

An annoying fact: a σ-algebra is not, strictly speaking, an algebra...

Definition: In mathematical analysis and in probability theory, a σ-algebra on a set $S$ is a collection of subsets $\Sigma \subseteq 2^S$ that includes $S$ itself and is closed under complement and countable unions.

  • Since $S \in \Sigma$ and $\Sigma$ is closed under complement, we also have $\emptyset \in \Sigma$
  • σ-algebras, σ-rings and σ-fields are all related, but that is not covered here
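On a finite set the definition can be checked mechanically, because countable unions reduce to pairwise unions. The following is a minimal sketch under that finiteness assumption; the helper name `is_sigma_algebra` and the example collections are made up for illustration.

```python
from itertools import combinations

def is_sigma_algebra(S, Sigma):
    """Check the sigma-algebra axioms for a collection Sigma of subsets of a FINITE set S."""
    S = frozenset(S)
    Sigma = {frozenset(E) for E in Sigma}
    if S not in Sigma:                          # must include S itself
        return False
    if any(S - E not in Sigma for E in Sigma):  # closed under complement
        return False
    # On a finite set, closure under countable unions reduces to pairwise unions.
    return all(A | B in Sigma for A, B in combinations(Sigma, 2))

S = {1, 2, 3, 4}
trivial = [set(), S]
generated = [set(), {1, 2}, {3, 4}, S]
not_closed = [set(), {1}, S]   # the complement {2, 3, 4} is missing

print(is_sigma_algebra(S, trivial))     # True
print(is_sigma_algebra(S, generated))   # True
print(is_sigma_algebra(S, not_closed))  # False
```

Note that both valid examples contain the empty set, as the first bullet above predicts.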

Prerequisite #2 : Borel Set / Borel σ-algebra

In mathematics, a Borel set is any set in a topological space that can be formed from open sets (or, equivalently, from closed sets) through the operations of countable union, countable intersection, and relative complement.

  • The relative complement of $A$ in $B$ is $B \setminus A$
  • The relative complement of $B$ in $A$ is $A \setminus B$

For a topological space $X$, the collection of all Borel sets on $X$ forms a σ-algebra $\mathcal{B}$, known as the Borel algebra or Borel σ-algebra. The Borel σ-algebra on $X$ is the smallest σ-algebra containing all open sets (or, equivalently, all closed sets).

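On countability:

  • A set $S$ is said to be countable if it is finite or $\operatorname{card}(S) = \operatorname{card}(\mathbb{N})$
  • $\operatorname{card}(\mathbb{R}) > \operatorname{card}(\mathbb{N})$ (Cantor's diagonal argument)
  • If $\mathcal{B}$ is the Borel algebra on $\mathbb{R}$, then $\operatorname{card}(\mathcal{B}) = \operatorname{card}(\mathbb{R})$
    • Conclusion: $\mathcal{B}$ is uncountable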

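Prerequisite #3 : Measurable Function / Measurable Space

Definition: A measurable space is a tuple $(S, \Sigma)$ where $S$ is a set and $\Sigma$ is a σ-algebra over $S$.

  • A measurable space is also called a Borel space

Definition: Let $(X, \Sigma_X)$ and $(Y, \Sigma_Y)$ be measurable spaces. A function $f: X \to Y$ is called a measurable function if $\forall E_Y \in \Sigma_Y$, $f^{-1}(E_Y) \in \Sigma_X$

  • The notation $f^{-1}$ is borrowed from the inverse function, but $f$ does not need to be invertible
  • We extend the definition of $f^{-1}$ to sets (the preimage): $f^{-1}(E_Y) := \{x \in X \mid f(x) \in E_Y\}$
  • In other words: $\forall E_Y \in \Sigma_Y$, the set $E_X := f^{-1}(E_Y)$ must belong to $\Sigma_X$
    • This $E_X$ satisfies $f(E_X) = E_Y \cap f(X) \subseteq E_Y$, with equality when $f$ is surjective
  • To emphasize that $f$ is a measurable function, we can also write $f: (X, \Sigma_X) \to (Y, \Sigma_Y)$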
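For finite spaces the preimage condition can be tested directly. This sketch assumes finite $X$ and explicitly listed σ-algebras; `preimage`, `is_measurable`, `f` and `g` are illustrative names, not anything from a library.

```python
def preimage(f, X, E_Y):
    """f^{-1}(E_Y) = {x in X | f(x) in E_Y}."""
    return frozenset(x for x in X if f(x) in E_Y)

def is_measurable(f, X, Sigma_X, Sigma_Y):
    """True if the preimage of every set in Sigma_Y lies in Sigma_X."""
    Sigma_X = {frozenset(E) for E in Sigma_X}
    return all(preimage(f, X, E_Y) in Sigma_X for E_Y in Sigma_Y)

X = {1, 2, 3, 4}
Sigma_X = [set(), {1, 2}, {3, 4}, X]   # a coarse sigma-algebra on X
Y = {0, 1}
Sigma_Y = [set(), {0}, {1}, Y]

def f(x):
    return 0 if x <= 2 else 1          # constant on {1, 2} and on {3, 4}

def g(x):
    return x % 2                       # separates 1 from 2

print(is_measurable(f, X, Sigma_X, Sigma_Y))  # True
print(is_measurable(g, X, Sigma_X, Sigma_Y))  # False: g^{-1}({0}) = {2, 4}
```

Whether a function is measurable depends on the σ-algebras as much as on the function: `g` would be measurable if `Sigma_X` were the full power set of `X`.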

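Prerequisite #4 : Measure / Measure Space

Definition: Let $(S, \Sigma)$ be a measurable space. A function $\mu: \Sigma \to \mathbb{R} \cup \{-\infty, +\infty\}$ is called a measure if it satisfies the following properties:

  1. Non-negativity: $\forall E \in \Sigma$, $\mu(E) \geq 0$
    • Note: generalized measures that drop this condition do exist, e.g. signed measures
  2. Null empty set: $\mu(\emptyset) = 0$
  3. Countable additivity (or σ-additivity): $\forall$ countable collection $\{E_i\}_{i=1}^{\infty}$ where $E_i \in \Sigma, \forall i$ and $E_i \cap E_j = \emptyset, \forall i \neq j$:
$$\mu\left(\bigcup_{k=1}^{\infty} E_k\right) = \sum_{k=1}^{\infty} \mu(E_k)$$

Definition: A measure space is such a triple $(S, \Sigma, \mu)$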
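The three properties can be verified exhaustively on a small finite example, where countable additivity reduces to additivity over disjoint pairs. This is only a sketch under that finiteness assumption; `power_set`, `is_measure` and `squared` are hypothetical helper names.

```python
from itertools import combinations

def power_set(S):
    """All subsets of a finite set S, as frozensets."""
    S = list(S)
    return [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

def is_measure(mu, Sigma):
    """Check the measure axioms on a finite sigma-algebra Sigma."""
    Sigma = [frozenset(E) for E in Sigma]
    if any(mu(E) < 0 for E in Sigma):   # non-negativity
        return False
    if mu(frozenset()) != 0:            # null empty set
        return False
    # additivity over disjoint pairs (enough in the finite case)
    return all(mu(A | B) == mu(A) + mu(B)
               for A, B in combinations(Sigma, 2) if not (A & B))

Sigma = power_set({1, 2, 3})

def squared(E):
    return len(E) ** 2                  # non-negative, but not additive

print(is_measure(len, Sigma))      # True: the counting measure
print(is_measure(squared, Sigma))  # False: squared({1, 2}) = 4, not 1 + 1
```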

Prerequisite #5 : Probability Measure / Probability Space

Definition: A measure $\mu$ is a probability measure if $\mu(S) = 1$.

  • $S$ denotes the whole set

Definition: A probability space is a measure space with a probability measure, denoted by $(\Omega, \mathcal{F}, P)$ where:

  • $\omega \in \Omega$ is called an outcome
  • $E \in \mathcal{F}$ is called an event
  • $P: \mathcal{F} \to [0, 1]$ is a probability measure
    • $P(E)$ is the probability of $E$

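Prerequisite #3/#4/#5 Summary

  • A measurable function $f$ is defined on a measurable space $(S, \Sigma)$
  • A measurable function $f$ has the potential to give rise to a measure $\mu$
  • measure $\mu$ + measurable space $(S, \Sigma)$ = measure space $(S, \Sigma, \mu)$
    • A probability measure $P$ is a special kind of measure
    • A measure space equipped with a probability measure is a probability space $(\Omega, \mathcal{F}, P)$

We can turn a measurable function $f$ into a measure $\mu$, but note that the two have different domains:

  • $f: S \to \mathbb{R}$
  • $\mu: \Sigma \to \mathbb{R}$
  • For example, we can define $\mu(\{x\}) = f(x)$ and then, by σ-additivity, for a countable $E$:
$$\mu(E) = \sum_{x \in E} \mu(\{x\}) = \sum_{x \in E} f(x)$$
  • What I mean here is that we can do this, not that we must. $\mu$ does not have to be defined through $f$, and $f$ does not necessarily meet the requirements for becoming a $\mu$ (for instance, it must be non-negative)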
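The construction $\mu(E) = \sum_{x \in E} f(x)$ can be sketched on a finite set as follows. The weights and the helper name `measure_from_weights` are assumptions chosen for illustration; `f` is assumed non-negative.

```python
from fractions import Fraction

def measure_from_weights(f):
    """Return mu with mu(E) = sum of f(x) over x in E (f assumed non-negative)."""
    def mu(E):
        return sum((f(x) for x in E), Fraction(0))
    return mu

S = {"a", "b", "c"}
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
mu = measure_from_weights(weights.get)   # f: S -> R, given as a lookup

print(mu({"a"}))        # 1/2
print(mu({"a", "b"}))   # 5/6
print(mu(set()))        # 0
print(mu(S))            # 1
```

Here `f` takes a single element while `mu` takes a set, which is exactly the change of domain described above. Since `mu(S)` equals 1, this particular `mu` also happens to be a probability measure.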

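1. Random Variable

Definition: A random variable $X$ is a measurable function $X: (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B})$ such that $\forall$ Borel set $B \in \mathcal{B}$,

$$X^{-1}(B) \overset{\text{informal}}{=} \{X \in B\} = \{\omega \in \Omega \mid X(\omega) \in B\} \in \mathcal{F}$$
  • Strictly speaking, the codomain should be $\mathbb{R} \cup \{-\infty, +\infty\}$ rather than just $\mathbb{R}$
  • $\mathcal{B}$ is the Borel σ-algebra on $\mathbb{R}$

If $X$ is a random variable on $(\Omega, \mathcal{F}, P)$, then:

  • We say that $X$ is $\mathcal{F}$-measurable. We define $\mathcal{F}(X)$ to be the smallest σ-algebra on $\Omega$ for which $X$ is measurable.
  • Compare $X$ and $P$:
    • First, note their domains:
      • $X: \Omega \to \mathbb{R}$ (a random variable takes an outcome)
      • $P: \mathcal{F} \to [0, 1]$ (a probability measure takes an event)
    • $X$ is a measurable function and $P$ is a probability measure. Just as $f$ was turned into $\mu$ above, we could define an $X$ that turns into $P$. But there is no need to. The section on distributions below explains why.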

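Take rolling a die as an example (one die, rolled once):

  • $\Omega = \{1, 2, 3, 4, 5, 6\}$
  • $\mathcal{F}$ includes, but is not limited to, $\Omega$, $\{1\}$, $\{2\}$, $\{3\}$, $\{4\}$, $\{5\}$, $\{6\}$
  • Suppose $P(\{1\}) = P(\{2\}) = P(\{3\}) = P(\{4\}) = P(\{5\}) = P(\{6\}) = \frac{1}{6}$
    • Note that the event $\{1, 3\}$ means "rolling a 1 or a 3", not "rolling twice and getting a 1 once and a 3 once"
      • The event "rolling twice and getting a 1 once and a 3 once" lives in a different sample space. With outcomes recorded as unordered pairs it is $\{\{1, 3\}\}$; with ordered pairs in $\Omega \times \Omega$ it is $\{(1, 3), (3, 1)\}$
    • So $P(\{1, 3\}) = P(\{1\}) + P(\{3\}) = \frac{1}{3}$, and similarly $P(\Omega) = 1$
    • "Rolling a 1 and a 3" is an impossible event, i.e. $\emptyset$, and the definition of a measure gives $P(\emptyset) = 0$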
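The die example translates almost literally into code. This is a minimal sketch assuming $\mathcal{F}$ is the full power set of $\Omega$ and $P$ is uniform, as in the text; exact fractions avoid floating-point noise.

```python
from fractions import Fraction

Omega = frozenset(range(1, 7))

def P(E):
    """Uniform probability measure on the subsets of Omega."""
    E = frozenset(E)
    if not E <= Omega:
        raise ValueError("an event must be a subset of Omega")
    return Fraction(len(E), len(Omega))

print(P({1}))         # 1/6
print(P({1, 3}))      # 1/3  ("a 1 or a 3")
print(P({1} & {3}))   # 0    ("a 1 and a 3" is the empty event)
print(P(Omega))       # 1
```

`P` takes a set (an event), never a bare outcome, which matches the signature $P: \mathcal{F} \to [0, 1]$.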

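2. Distribution of a Random Variable

Thanks to PhoemueX: On clarifying the relationship between distribution functions in measure theory and probability theory

Suppose we have a probability space $(\Omega, \mathcal{F}, P)$ with an arbitrary random variable $X$ on it.

Definition: The push-forward measure of $P$ by $X$ is a function $P_X: \mathcal{B} \to \mathbb{R}$ such that $\forall B \in \mathcal{B}$,

$$P_X(B) = (P \circ X^{-1})(B) = P(X^{-1}(B)) \overset{\text{informal}}{=} P(\{X \in B\}) = P(\{\omega \in \Omega \mid X(\omega) \in B\})$$
  • Note that, by the definition of a random variable, $\forall B \in \mathcal{B}$, $X^{-1}(B) \in \mathcal{F}$, so $X^{-1}(B)$ lies in the domain of $P$
  • $P_X$ is always a probability measure, so $(\mathbb{R}, \mathcal{B}, P_X)$ forms a probability space
  • If $X = I$, i.e. $X(\omega) = \omega$, then $P_X = P$

We call $P_X$ the distribution of random variable $X$.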

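Here we take a closer look at the relationship between $P_X$ and $P$, and at the role $X$ plays in it:

  • $P: \mathcal{F} \to [0, 1]$
  • $X: (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B})$
  • By rights, $X^{-1}$ should be $X^{-1}: \mathbb{R} \to \Omega$, but through the definition of $X^{-1}(B)$ we have extended it to $X^{-1}: \mathcal{B} \to \mathcal{F}$
  • So $P_X = P \circ X^{-1}$ becomes a function $\mathcal{B} \to \mathcal{F} \to [0, 1]$
  • Hence $X: \mathcal{F} \to \mathcal{B}$ (acting on sets) can be seen as an "event encoder": it maps each event $E \in \mathcal{F}$ to a Borel set $B \in \mathcal{B}$
  • Likewise, $X^{-1}: \mathcal{B} \to \mathcal{F}$ can be seen as an "event decoder": it maps each Borel set $B \in \mathcal{B}$ back to the original event $E \in \mathcal{F}$
  • The point of event encoding is that it turns all sorts of different, concrete $(\Omega, \mathcal{F})$ into one uniform, abstract $(\mathbb{R}, \mathcal{B})$
    • For example, the two experiments "rolling a die" and "drawing one ball from a black box holding 6 balls of different colors" have different events, yet they are clearly the same in essence. That essence shows in the fact that, after encoding through $X$, they yield the same Borel sets (or, put differently, the same function $P_X$)
  • The point of event decoding is computation, because $P_X$ needs $P$ to produce concrete values
  • We normally never notice this event encoding/decoding process because it feels so natural. In the die example above we wrote $\Omega = \{1, 2, 3, 4, 5, 6\}$ directly, so we can take $E = B$, i.e. $X = I$, which amounts to no encoding/decoding at all. That is why we did not distinguish $P_X$ from $P$: here $P_X = P$
  • But I could just as well define $\Omega = \{I, II, III, IV, V, VI\}$, in which case you need to encode:
    • $X(I) = 1$
    • $\dots$
    • $X(VI) = 6$
    • So $P_X(\{3\}) = P(X^{-1}(\{3\})) = P(\{III\})$
    • Of course, the definition of $X$ does not have to match the semantics of the events. Defining $X(I) = 100, \dots, X(VI) = 600$ is also fine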
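The encoder/decoder picture can be made concrete with the Roman-numeral sample space. In this sketch the intermediate values of `X` (for II through V) are filled in with the obvious encoding, which is an assumption beyond what the text spells out; `preimage` and `pushforward` are illustrative helper names.

```python
from fractions import Fraction

Omega = ["I", "II", "III", "IV", "V", "VI"]

def P(E):
    """Uniform probability measure on subsets of Omega."""
    return Fraction(len(set(E) & set(Omega)), len(Omega))

# Random variable X: Omega -> R, stored as a lookup table (assumed encoding).
X = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6}

def preimage(X, B):
    """Event decoder: X^{-1}(B) = {omega | X(omega) in B}."""
    return {omega for omega, value in X.items() if value in B}

def pushforward(P, X):
    """P_X = P composed with X^{-1}."""
    return lambda B: P(preimage(X, B))

P_X = pushforward(P, X)
print(preimage(X, {3}))   # {'III'}
print(P_X({3}))           # 1/6
print(P_X({2, 4, 6}))     # 1/2

# An encoding that ignores the semantics of the events works just as well.
X100 = {omega: 100 * value for omega, value in X.items()}
print(pushforward(P, X100)({300}))   # 1/6
```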

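Aside: how should the notation $P(X = 3)$ be read?

  • The conclusion first: it is a somewhat excessive abbreviation
  • First, $P(X = 3)$ should be $P(\{X = 3\})$ ($P$ takes an event)
  • Second, $X = 3$ should be understood as $X \in \{3\}$
  • So, letting $B = \{3\}$ and applying the formula:
$$P(X = 3) \overset{\text{informal}}{=} P(\{X = 3\}) \overset{\text{informal}}{=} P(\{X \in \{3\}\}) = P_X(\{3\})$$
  • So $X = 3$ as a whole is an event $E \in \mathcal{F}$ (informal), whereas $\{3\}$ is a Borel set $B \in \mathcal{B}$
  • If $X = I$, then $E = B$ and $P_X = P$, hence $P(X = 3) \overset{\text{informal}}{=} P_X(\{3\}) = P(\{3\})$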

Further properties of $P_X$:

  • If $P_X$ gives measure one to a countable set of reals, then $X$ is called a discrete random variable.
    • $P_X: \mathcal{B} \to [0, 1]$, and $\mathcal{B}$ is uncountable
    • The domain of $P_X$ is still all of $\mathcal{B}$, but all of its mass may sit on a countable set of reals
  • If $P_X$ gives zero measure to every singleton set, and hence to every countable set, $X$ is called a continuous random variable.
    • Every distribution $P_X$ can be written as a sum of a discrete (purely atomic) measure and a continuous (atomless) measure.
    • All random variables defined on a discrete probability space are discrete

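Definition: For any (locally finite) measure $\mu$ on $\mathbb{R}$, we define the distribution function of $\mu$ as

$$F_\mu(x) = \mu((-\infty, x])$$

Since $P_X$ is a probability measure (and a probability measure is always locally finite), we can define:

$$F_{P_X}(a) = P_X((-\infty, a]) = P(X^{-1}((-\infty, a])) \overset{\text{informal}}{=} P(\{X \in (-\infty, a]\}) \overset{\text{informal}}{=} P(X \leq a)$$

Strictly speaking, $F_{P_X}$ should be called the distribution function of the distribution of random variable $X$. Unfortunately, it is also abbreviated to "the distribution of random variable $X$", with the simplified notation $F_X = F_{P_X}$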
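For the fair die, $F_X(a) = P_X((-\infty, a])$ can be computed by summing the point masses at or below $a$. This is a sketch that assumes the identity encoding $X = I$; the table name `point_mass` is made up.

```python
from fractions import Fraction

# P_X({k}) = 1/6 for k = 1..6 (fair die with X = I)
point_mass = {k: Fraction(1, 6) for k in range(1, 7)}

def F_X(a):
    """Distribution function F_X(a) = P_X((-inf, a])."""
    return sum((p for x, p in point_mass.items() if x <= a), Fraction(0))

print(F_X(0))     # 0
print(F_X(3))     # 1/2
print(F_X(3.5))   # 1/2 (no mass strictly between 3 and 4)
print(F_X(6))     # 1
```

`F_X` takes a real number, while `P_X` takes a Borel set; the distribution function only ever asks about sets of the form $(-\infty, a]$.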

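3. Probability Mass Functions (for the discrete), and Probability Density Functions (for the continuous)

Definition: Probability mass function for discrete random variable $X$, $p_X: \mathbb{R} \to [0, 1]$, can be defined as:

$$p_X(x) = P(X = x) = P_X(\{x\}), \qquad \sum_{x \in \mathbb{R}} p_X(x) = 1$$

This simply takes the one-element sets $B \in \mathcal{B}$ in the domain of $P_X$ and reduces them to points $x \in \mathbb{R}$. It is that simple.

Definition: Probability density function for continuous random variable $X$, $f_X: \mathbb{R} \to [0, \infty)$, is one satisfying:

$$\int_a^b f_X(x) \, dx = P(\{a \leq X \leq b\}) = F_X(b) - F_X(a), \qquad \int_{-\infty}^{\infty} f_X(x) \, dx = 1$$
  • Strictly speaking, $f_X$ should be called "the density or Radon–Nikodym derivative with respect to Lebesgue measure of random variable $X$"

If $f_X$ exists:

  • We can write $F_X(x) = \int_{-\infty}^{x} f_X(t) \, dt$
  • If $f_X$ is continuous at $x$, then $f_X(x) = F_X'(x)$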
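Both relations between $f_X$ and $F_X$ can be checked numerically. The sketch below uses the standard normal purely as an illustrative choice; its closed forms via `math.erf` are standard, while the midpoint-rule integrator `integrate` is a hypothetical helper written for this example.

```python
import math

def f_X(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def F_X(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Integral of the density over [a, b] matches F_X(b) - F_X(a).
a, b = -1.0, 2.0
area = integrate(f_X, a, b)
print(abs(area - (F_X(b) - F_X(a))) < 1e-6)   # True

# Derivative of F_X matches f_X where f_X is continuous.
x, h = 0.7, 1e-5
slope = (F_X(x + h) - F_X(x - h)) / (2 * h)
print(abs(slope - f_X(x)) < 1e-6)             # True
```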

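4. Tilde / i.i.d.

According to Ben O'Neill: Why are probability distributions denoted with a tilde?, $\sim$ is actually an equivalence relation, so both sides of $X \sim Y$ are random variables, and it can be read as "X has the same distribution as Y". I think of it as equality at the level of distributions, $P_X = P_Y$, which is weaker than $X = Y$ as functions on $\Omega$.

  • So $\mathcal{N}(0, 1)$ is not a distribution but a random variable
  • If $X \sim \mathcal{N}(0, 1)$, then $f_X(x) = \mathcal{N}(x; 0, 1)$
  • If $\mu, \sigma^2$ are not fixed, $\mathcal{N}(\mu, \sigma^2)$ can be seen as a parametric random variable
    • Note that whenever we write $X \sim \mathcal{N}(\mu, \sigma^2)$, the $\mathcal{N}(\mu, \sigma^2)$ here must denote one specific random variable (once $\mu, \sigma^2$ are fixed), and cannot be read as a family of random variables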

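So the question is: in "has the same distribution as", does "distribution" refer to $P_X$ or to $F_X = F_{P_X}$?

  • If $X$ and $Y$ are both discrete random variables, $P_X$ is clearly more direct, so we usually work with $P_X = P_Y$
    • and hence $p_X = p_Y$
  • If $X$ and $Y$ are both continuous random variables, $F_X$ is clearly the meaningful one, so we usually work with $F_X = F_Y$
    • and hence $f_X = f_Y$ (almost everywhere)

Studying a random variable $X$ directly means that we skip the event encoding/decoding step and work directly on the abstract probability space $(\mathbb{R}, \mathcal{B}, P_X)$. What the original $(\Omega, \mathcal{F}, P)$ looks like no longer concerns us.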

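Another common concept is i.i.d. (independent and identically distributed), which describes a collection of random variables. Put simply, if $X_1, \dots, X_n$ are i.i.d. random variables, then:

  • $X_1 \sim X_2 \sim \dots \sim X_{n-1} \sim X_n$ (oddly enough, in all these years I have never seen a textbook describe i.i.d. with this expression)
  • $X_1, \dots, X_n$ are mutually independent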

5. Independence / Marginal Distribution / Joint Distribution

We start at the level of $(\Omega, \mathcal{F}, P)$.

Definition: (1) Two events $E_1, E_2$ are called independent if

$$P(E_1 \cap E_2) = P(E_1) P(E_2)$$

(2) A collection of events $\{E_i\}$ is called independent if for every finite choice of distinct $E_1, \dots, E_n$ from the collection,

$$P(E_1 \cap \dots \cap E_n) = P(E_1) \cdots P(E_n)$$

(3) A collection of events $\{E_i\}$ is called pairwise independent if $\forall$ distinct $E_i, E_j$,

$$P(E_i \cap E_j) = P(E_i) P(E_j)$$

(4) A finite collection of σ-algebras $\mathcal{F}_1, \dots, \mathcal{F}_n$ is called independent if $\forall E_1 \in \mathcal{F}_1, \dots, E_n \in \mathcal{F}_n$, $\{E_1, \dots, E_n\}$ is independent.

(5) An infinite collection of σ-algebras is called independent if every finite subcollection is independent.

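If $X_1, \dots, X_n$ are random variables, we can consider them as a random vector $(X_1, \dots, X_n)$ and hence as ONE random variable $X_{1:n}$ taking values in $\mathbb{R}^n$, where $\mathbb{R}^n$ is equipped with the Borel σ-algebra $\mathcal{B}(\mathbb{R}^n)$

  • Let $\mathcal{T}(\mathbb{R}^n) \subseteq \mathcal{P}(\mathbb{R}^n)$ denote the standard topology on $\mathbb{R}^n$ consisting of all open sets
    • $\mathcal{P}(S) = 2^S$
  • $\mathcal{B}(\mathbb{R}^n)$ is the σ-algebra generated by all the open sets, i.e. $\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{T}(\mathbb{R}^n))$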

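Suppose we start with probability spaces $(\Omega_1, \mathcal{F}_1, P_1), \dots, (\Omega_n, \mathcal{F}_n, P_n)$, and let $\Omega_{1:n} = \Omega_1 \times \dots \times \Omega_n$, $\mathcal{F}_{1:n} = \mathcal{F}_1 \otimes \dots \otimes \mathcal{F}_n$, $P_{1:n} = P_1 \times \dots \times P_n$. Note that, according to Wikipedia: Product measure:

  • $\Omega_1 \times \Omega_2$ is the Cartesian product of the two sets
  • $\mathcal{F}_1 \otimes \mathcal{F}_2$ is the σ-algebra on $\Omega_1 \times \Omega_2$ generated by subsets of the form $E_1 \times E_2$ where $E_1 \in \mathcal{F}_1$ and $E_2 \in \mathcal{F}_2$
  • A product measure $P_1 \times P_2$ is defined to be a measure on the measurable space $(\Omega_1 \times \Omega_2, \mathcal{F}_1 \otimes \mathcal{F}_2)$ satisfying $\forall E_1 \in \mathcal{F}_1, E_2 \in \mathcal{F}_2$,
$$(P_1 \times P_2)(E_1 \times E_2) = P_1(E_1) P_2(E_2)$$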
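On finite sample spaces a product measure can be built from the singletons, and the defining rectangle identity then holds automatically. The sketch below pairs a fair coin with a fair die; both spaces and the helper names `uniform` and `product_measure` are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product

def uniform(Omega):
    """Uniform probability measure on the subsets of a finite Omega."""
    Omega = frozenset(Omega)
    return lambda E: Fraction(len(frozenset(E) & Omega), len(Omega))

P1 = uniform({"H", "T"})      # fair coin
P2 = uniform(range(1, 7))     # fair die

def product_measure(P1, P2):
    """Measure on sets of pairs, defined through singleton rectangles (finite case)."""
    def P12(E):
        return sum((P1({w1}) * P2({w2}) for (w1, w2) in E), Fraction(0))
    return P12

P12 = product_measure(P1, P2)

E1, E2 = {"H"}, {2, 4, 6}
rectangle = set(product(E1, E2))        # E1 x E2

print(P12(rectangle))                   # 1/4
print(P1(E1) * P2(E2))                  # 1/4

# Sets that are not rectangles are measurable too.
print(P12({("H", 1), ("T", 2)}))        # 1/6
```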

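Suppose $X_1$ is a random variable on $(\Omega_1, \mathcal{F}_1, P_1)$, $\dots$, and $X_n$ is a random variable on $(\Omega_n, \mathcal{F}_n, P_n)$. Then we have a whole batch of:

  • distributions: $P_{X_1}, \dots, P_{X_n}$
  • distribution functions of the distributions: $F_{X_1}, \dots, F_{X_n}$
  • PMFs: $p_{X_1}, \dots, p_{X_n}$
  • PDFs: $f_{X_1}, \dots, f_{X_n}$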

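Definition: For random variable $X_{1:n} = (X_1, \dots, X_n)$, its joint distribution $P_{X_{1:n}}: \mathcal{B}(\mathbb{R}^n) \to \mathbb{R}$ can be defined as: $\forall B_{1:n} = B_1 \times \dots \times B_n$, $B_{1:n} \in \mathcal{B}(\mathbb{R}^n)$,

$$P_{X_{1:n}}(B_{1:n}) = P_{1:n}(X_{1:n}^{-1}(B_{1:n})) \overset{\text{informal}}{=} P_{1:n}(\{X_{1:n} \in B_{1:n}\}) \overset{\text{informal}}{=} P_{1:n}(\{X_1 \in B_1, \dots, X_n \in B_n\})$$

Definition: For joint distribution $P_{X_{1:n}}$, its joint distribution function $F_{X_{1:n}} \overset{\text{abbrev.}}{=} F_{P_{X_{1:n}}}: \mathbb{R}^n \to [0, 1]$ can be defined as: $\forall t_i \in \mathbb{R}$,

$$F_{X_{1:n}}(t_1, \dots, t_n) \overset{\text{informal}}{=} P_{1:n}(\{X_1 \leq t_1, \dots, X_n \leq t_n\}) = P_{X_{1:n}}((-\infty, t_1] \times \dots \times (-\infty, t_n])$$

Definition: For random variable $X_{1:n} = (X_1, \dots, X_n)$, its joint probability mass function $p_{X_{1:n}}: \mathbb{R}^n \to [0, 1]$ can be defined as:

$$p_{X_{1:n}}(x_1, \dots, x_n) = P_{X_{1:n}}(\{(x_1, \dots, x_n)\}) \overset{\text{informal}}{=} P_{1:n}(\{X_1 = x_1, \dots, X_n = x_n\}), \qquad \sum_{x \in \mathbb{R}^n} p_{X_{1:n}}(x) = 1$$

Definition: For random variable $X_{1:n} = (X_1, \dots, X_n)$, its joint probability density function $f_{X_{1:n}}: \mathbb{R}^n \to [0, \infty)$ is one satisfying: $\forall B_{1:n} = B_1 \times \dots \times B_n$, $B_{1:n} \in \mathcal{B}(\mathbb{R}^n)$,

$$\int_{B_{1:n}} f_{X_{1:n}}(x_1, \dots, x_n) \, dx_1 \cdots dx_n = P_{X_{1:n}}(B_{1:n}), \qquad \int_{\mathbb{R}^n} f_{X_{1:n}}(x_1, \dots, x_n) \, dx_1 \cdots dx_n = 1$$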

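Definition: Random variables $X_1, \dots, X_n$ are said to be independent if any of these (equivalent) conditions hold:

(1) Joint distribution is the product of all marginal distributions:

$$P_{X_{1:n}}(B_{1:n}) \overset{\text{informal}}{=} P_{1:n}(\{X_1 \in B_1, \dots, X_n \in B_n\}) = P_1(\{X_1 \in B_1\}) \cdots P_n(\{X_n \in B_n\}) = P_{X_1}(B_1) \cdots P_{X_n}(B_n)$$
  • This is equivalent to saying "joint distribution is the product measure of all marginal distributions":
$$P_{X_{1:n}} = P_{X_1} \times \dots \times P_{X_n}$$
  • The marginal distribution of $X$ is simply $X$'s individual distribution; the term only makes sense in the context of a joint distribution. The name comes from two-dimensional discrete joint distribution tables, where the individual distributions appear as the row and column totals in the margins.

(2) Joint distribution function is the product of all marginal distribution functions:

$$F_{X_{1:n}}(t_1, \dots, t_n) = P_{X_{1:n}}((-\infty, t_1] \times \dots \times (-\infty, t_n]) = P_{X_1}((-\infty, t_1]) \cdots P_{X_n}((-\infty, t_n]) = F_{X_1}(t_1) \cdots F_{X_n}(t_n)$$

(3) Joint PMF is the product of all individual PMFs:

$$p_{X_{1:n}}(x_1, \dots, x_n) = P_{X_{1:n}}(\{(x_1, \dots, x_n)\}) = P_{X_1}(\{x_1\}) \cdots P_{X_n}(\{x_n\}) = p_{X_1}(x_1) \cdots p_{X_n}(x_n)$$

(4) Joint PDF (if it exists) is the product of all individual PDFs:

$$f_{X_{1:n}}(x_1, \dots, x_n) = f_{X_1}(x_1) \cdots f_{X_n}(x_n)$$

(5) The σ-algebras $\mathcal{F}(X_1), \dots, \mathcal{F}(X_n)$ are independent.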
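Condition (3) is the easiest one to test by hand for two discrete random variables: sum the joint PMF table along each axis to get the marginals, then compare every cell with the product. The tables and helper names below are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def marginals(joint):
    """Row and column totals of a joint PMF given as {(x, y): probability}."""
    pX, pY = {}, {}
    for (x, y), p in joint.items():
        pX[x] = pX.get(x, 0) + p
        pY[y] = pY.get(y, 0) + p
    return pX, pY

def is_independent(joint):
    """True if the joint PMF equals the product of its marginals in every cell."""
    pX, pY = marginals(joint)
    return all(joint.get((x, y), 0) == pX[x] * pY[y] for x, y in product(pX, pY))

# Two separate fair dice: every pair has probability 1/36.
two_dice = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

# The same die read twice: Y always equals X.
same_die = {(x, x): Fraction(1, 6) for x in range(1, 7)}

print(is_independent(two_dice))   # True
print(is_independent(same_die))   # False
```

Both tables have the same uniform marginals, so the marginals alone do not determine the joint distribution.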

6. Conditional Random Variable

Suppose $X, Y$ are discrete random variables over $(\Omega, \mathcal{F}, P)$. If $P(Y = y) \neq 0$, then we can define the conditional probability (measure):

$$P(X = x \mid Y = y) = \frac{P(X = x \text{ and } Y = y)}{P(Y = y)}$$

Definition: The discrete conditional random variable $X \mid Y = y$, read "X given Y = y", has PMF

$$p_{X \mid Y = y}(x) = P((X \mid Y = y) = x) = P(X = x \mid Y = y)$$

Similarly, we can have

Definition: The continuous conditional random variable $X \mid Y = y$ has PDF (where $f_Y(y) > 0$)

$$f_{X \mid Y = y}(x) = \frac{f_{X,Y}(x, y)}{f_Y(y)}$$
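The discrete definition can be evaluated by plain counting. In this sketch two fair dice are rolled, $X$ is the first die and $Y$ is the sum; this setup and the helper name `conditional_pmf` are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product

Omega = list(product(range(1, 7), repeat=2))   # ordered pairs of two dice

def P(E):
    """Uniform probability measure on subsets of Omega."""
    return Fraction(len(E), len(Omega))

def X(w):
    return w[0]            # value of the first die

def Y(w):
    return w[0] + w[1]     # sum of both dice

def conditional_pmf(x, y):
    """p_{X | Y = y}(x) = P(X = x and Y = y) / P(Y = y)."""
    event_y = [w for w in Omega if Y(w) == y]
    if not event_y:
        raise ValueError("P(Y = y) must be non-zero")
    event_xy = [w for w in event_y if X(w) == x]
    return P(event_xy) / P(event_y)

print(conditional_pmf(1, 4))   # 1/3: Y = 4 happens via (1, 3), (2, 2), (3, 1)
print(conditional_pmf(4, 4))   # 0
print(sum(conditional_pmf(x, 4) for x in range(1, 7)))   # 1
```

For a fixed $y$ the conditional PMF sums to one over $x$, so $X \mid Y = y$ behaves like an ordinary discrete random variable.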
