
First, let's be clear about two points:

  1. Jensen’s Inequality is a property of convex functions. Convex functions are themselves defined via Jensen’s Inequality, so the two are essentially the same thing.
  2. In different fields, expanding Jensen’s Inequality in different ways yields new inequalities specific to those fields; in this sense Jensen’s Inequality can be viewed as a master template.
    • The basic recipe: the field has a convex function; expand it with Jensen’s Inequality to obtain an inequality of the form "quantity A ≤ quantity B" within that field.

1. Convex Function / Jensen’s Inequality

Let $X$ be a convex set in a real vector space and let $f: X \to \mathbb{R}$ be a function.

  • Definition of convex functions
    • $f$ is called convex if $f$ satisfies Jensen’s Inequality (a quick numerical check follows this list).
  • Jensen’s Inequality
    • $\forall x_1, x_2 \in X, \forall t \in [0,1]: f(t x_1 + (1-t) x_2) \leq t f(x_1) + (1-t) f(x_2)$
  • Definition of concave functions
    • $f$ is said to be concave if $-f$ is convex.
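
A minimal Python sketch of the definition above, assuming NumPy and using $f(x) = x^2$ as a known convex function with arbitrarily chosen points (all choices here are for illustration only):

```python
import numpy as np

def f(x):
    return x ** 2  # a known convex function

rng = np.random.default_rng(0)
x1, x2 = rng.uniform(-10, 10, size=2)  # two arbitrary points in X = R

for t in np.linspace(0.0, 1.0, 11):
    lhs = f(t * x1 + (1 - t) * x2)
    rhs = t * f(x1) + (1 - t) * f(x2)
    # Jensen's Inequality: f(t*x1 + (1-t)*x2) <= t*f(x1) + (1-t)*f(x2)
    assert lhs <= rhs + 1e-12
```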

2. Jensen’s Inequality on Expectations

If $X$ is a discrete random variable taking value $x_i$ with probability $p_i$, and $f$ is a convex function:

$$f(p_1 x_1 + p_2 x_2 + \cdots + p_n x_n) \leq p_1 f(x_1) + p_2 f(x_2) + \cdots + p_n f(x_n)$$

The LHS is essentially $f(E(X))$ and the RHS is $E(f(X))$, which together give

$$f(E(X)) \leq E(f(X))$$
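
The same kind of numerical sanity check works here, assuming a small made-up discrete distribution and again $f(x) = x^2$:

```python
import numpy as np

f = lambda x: x ** 2                 # convex function (arbitrary choice)
x = np.array([1.0, 2.0, 3.0, 4.0])   # values x_i of the discrete random variable X
p = np.array([0.1, 0.2, 0.3, 0.4])   # probabilities p_i (sum to 1)

lhs = f(np.dot(p, x))   # f(E(X)) = f(sum_i p_i x_i)
rhs = np.dot(p, f(x))   # E(f(X)) = sum_i p_i f(x_i)
assert lhs <= rhs       # here 9.0 <= 10.0
```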

3. Gibbs’ Inequality

Let $p = \{p_1, p_2, \dots, p_n\}$ be the true probability distribution of $X$ and $q = \{q_1, q_2, \dots, q_n\}$ be another probability distribution (you can think of it as a hypothesized distribution of $X$). Construct a random variable $Y$ with $Y(x) = \frac{q(x)}{p(x)}$. Given that $f(y) = -\log(y)$ is a convex function, we have:

$$f(E(Y)) \leq E(f(Y))$$

Therefore:

$$\begin{aligned}
-\log \sum_i \left( p_i \frac{q_i}{p_i} \right) &\leq \sum_i p_i \left( -\log \frac{q_i}{p_i} \right) \\
-\log 1 &\leq \sum_i p_i \log \frac{p_i}{q_i} \\
0 &\leq \sum_i p_i \log \frac{p_i}{q_i}
\end{aligned}$$

If we use $\log_2$, the RHS can be called the Kullback–Leibler divergence, or the relative entropy of $p$ with respect to $q$:

$$D_{\text{KL}}(p \| q) \equiv \sum_i p_i \log_2 \frac{p_i}{q_i} \geq 0$$

This inequality is known as Gibbs’ Inequality.
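
The derivation can be checked numerically as well. The sketch below uses two made-up distributions and mirrors the steps above: $E(Y) = \sum_i p_i \frac{q_i}{p_i} = 1$, so the LHS is $-\log_2 1 = 0$:

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])   # "true" distribution of X (made-up example)
q = np.array([0.2, 0.3, 0.5])   # another distribution over the same outcomes

y = q / p                       # Y(x) = q(x) / p(x)
lhs = -np.log2(np.dot(p, y))    # -log2(E(Y)) = -log2(1) = 0
rhs = np.dot(p, -np.log2(y))    # E(-log2(Y)) = sum_i p_i log2(p_i / q_i)
assert lhs <= rhs               # Gibbs' Inequality: D_KL(p || q) >= 0
```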

We can rearrange further:

$$D_{\text{KL}}(p \| q) \equiv \sum_i p_i \log_2 \frac{p_i}{q_i} = \sum_i p_i \log_2 p_i - \sum_i p_i \log_2 q_i = -H(p) + H(p,q) \geq 0$$
  • $H(p)$ is the entropy of distribution $p$
  • $H(p,q)$ is the cross entropy of distributions $p$ and $q$
  • $H(p) \leq H(p,q)$: the information entropy of a distribution $p$ is less than or equal to its cross entropy with any other distribution $q$ (see the sketch after this list)
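
A minimal sketch, reusing the made-up `p` and `q` from above, to confirm the decomposition $D_{\text{KL}}(p \| q) = -H(p) + H(p,q)$ numerically:

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])   # same made-up distributions as before
q = np.array([0.2, 0.3, 0.5])

H_p  = -np.dot(p, np.log2(p))      # entropy H(p)
H_pq = -np.dot(p, np.log2(q))      # cross entropy H(p, q)
D_kl =  np.dot(p, np.log2(p / q))  # D_KL(p || q)

assert np.isclose(D_kl, -H_p + H_pq)   # D_KL(p || q) = -H(p) + H(p, q)
assert H_p <= H_pq                     # hence H(p) <= H(p, q)
```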

Interpretations of $D_{\text{KL}}(p \| q)$

  • In the context of machine learning, $D_{\text{KL}}(p \| q)$ is often called the information gain achieved if $p$ is used instead of $q$ (this is why it’s also called the relative entropy of $p$ with respect to $q$).
  • In the context of coding theory, $D_{\text{KL}}(p \| q)$ can be construed as the expected number of extra bits required to code samples from $p$ using a code optimized for $q$ rather than the code optimized for $p$.
  • In the context of Bayesian inference, $D_{\text{KL}}(p \| q)$ is the amount of information lost when $q$ is used to approximate $p$.
  • Simply put, $D_{\text{KL}}(p \| q)$ measures how "close" two distributions $p$ and $q$ are (see the sketch after this list):
    • If $p = q$, then $D_{\text{KL}}(p \| q) = 0$
    • The more $p$ and $q$ differ, the larger $D_{\text{KL}}(p \| q)$ becomes
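
A small sketch to illustrate this behavior, with a hypothetical helper `kl` (not from any library) and made-up distributions:

```python
import numpy as np

def kl(p, q):
    # D_KL(p || q) in bits; assumes every entry of p and q is strictly positive
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log2(p / q)))

p = [0.5, 0.3, 0.2]                # made-up "true" distribution
print(kl(p, p))                    # 0.0          -- identical distributions
print(kl(p, [0.4, 0.3, 0.3]))      # small value  -- q close to p
print(kl(p, [0.05, 0.05, 0.9]))    # larger value -- q far from p
```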
