Convex Functions / Jensen’s Inequality / Jensen’s Inequality on Expectations / Gibbs’ Inequality / Entropy
First, two points up front:
- Jensen's Inequality is a property of convex functions. Convex functions are themselves defined via Jensen's Inequality, so the two are essentially the same thing.
- Expanding Jensen's Inequality in different ways in different fields yields new, field-specific inequalities, so Jensen's Inequality can be seen as a master template.
    - The basic recipe: the field has some convex function; expand it via Jensen's Inequality to conclude that domain concept A is less than or equal to domain concept B.
1. Convex Function / Jensen's Inequality
Let $f: X \to \mathbb{R}$ be a function defined on a convex set $X$.

- Definition of convex functions
    - $f$ is called convex if it satisfies Jensen's Inequality.
- Jensen's Inequality
    - $f(\lambda x_1 + (1-\lambda) x_2) \leq \lambda f(x_1) + (1-\lambda) f(x_2)$ for all $x_1, x_2 \in X$ and all $\lambda \in [0, 1]$.
- Definition of concave functions
    - $f$ is said to be concave if $-f$ is convex.
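As a quick sanity check of the definition, the sketch below samples convex combinations and tests the defining inequality numerically (a minimal NumPy sketch; `satisfies_jensen` is a name invented here for illustration):

```python
import numpy as np

# A function f is convex iff for all x1, x2 and all t in [0, 1]:
#   f(t*x1 + (1-t)*x2) <= t*f(x1) + (1-t)*f(x2)
# Here we spot-check that inequality on a grid of t values.
def satisfies_jensen(f, x1, x2, n=1001):
    t = np.linspace(0.0, 1.0, n)
    lhs = f(t * x1 + (1 - t) * x2)       # f of the convex combination
    rhs = t * f(x1) + (1 - t) * f(x2)    # convex combination of f values
    return bool(np.all(lhs <= rhs + 1e-12))  # small tolerance for rounding

print(satisfies_jensen(np.square, -3.0, 5.0))            # True:  x^2 is convex
print(satisfies_jensen(np.log, 0.5, 4.0))                # False: log is concave
print(satisfies_jensen(lambda x: -np.log(x), 0.5, 4.0))  # True:  -log is convex
```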
2. Jensen's Inequality on Expectations
If $X$ is a random variable and $f$ is a convex function, then

$$f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]$$

For a discrete $X$ taking values $x_i$ with probabilities $p_i$, the LHS is essentially $f\left(\sum_i p_i x_i\right)$, i.e. $f$ applied to a convex combination of the $x_i$, so the finite form of Jensen's Inequality gives $f\left(\sum_i p_i x_i\right) \leq \sum_i p_i f(x_i) = \mathbb{E}[f(X)]$.
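Here is a minimal numeric illustration of the expectation form (assuming NumPy; the values and the choice of $f = \exp$ are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)

# A discrete random variable X: values x_i with probabilities p_i.
x = np.array([-2.0, 0.5, 1.0, 3.0])
p = rng.dirichlet(np.ones(4))   # random probability vector, sums to 1

f = np.exp                      # exp is convex

lhs = f(np.sum(p * x))          # f(E[X]) = f(sum_i p_i x_i)
rhs = np.sum(p * f(x))          # E[f(X)] = sum_i p_i f(x_i)

print(lhs <= rhs)               # True, by Jensen's Inequality
```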
3. Gibbs' Inequality
Let $P = \{p_1, \dots, p_n\}$ and $Q = \{q_1, \dots, q_n\}$ be two probability distributions. Since $\log$ is concave, Jensen's Inequality on expectations (with the inequality flipped) gives

$$\sum_{i} p_i \log \frac{q_i}{p_i} \leq \log \left( \sum_{i} p_i \cdot \frac{q_i}{p_i} \right) = \log \left( \sum_{i} q_i \right) = \log 1 = 0$$

Expanding the LHS, $\sum_{i} p_i \log q_i - \sum_{i} p_i \log p_i \leq 0$. Therefore:

$$-\sum_{i} p_i \log p_i \leq -\sum_{i} p_i \log q_i$$

The same holds if we use $\log_2$ instead of the natural logarithm:

$$-\sum_{i} p_i \log_2 p_i \leq -\sum_{i} p_i \log_2 q_i$$

This inequality is called Gibbs' Inequality.
Rearranging once more, define

$$D_{KL}(P \| Q) = \sum_{i} p_i \log \frac{p_i}{q_i} = H(P, Q) - H(P) \geq 0$$

where $H(P) = -\sum_{i} p_i \log p_i$ is the entropy of distribution $P$, and $H(P, Q) = -\sum_{i} p_i \log q_i$ is the cross entropy of distributions $P$ and $Q$. In other words, the information entropy of a distribution is less than or equal to its cross entropy with any other distribution: $H(P) \leq H(P, Q)$.
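All three quantities are straightforward to verify numerically. A minimal sketch, assuming NumPy and using $\log_2$ so everything is in bits:

```python
import numpy as np

rng = np.random.default_rng(42)

# Two random distributions on the same support (Dirichlet samples
# are strictly positive, so the logs below are well defined).
p = rng.dirichlet(np.ones(5))
q = rng.dirichlet(np.ones(5))

H_p = -np.sum(p * np.log2(p))     # H(P),    entropy of P
H_pq = -np.sum(p * np.log2(q))    # H(P, Q), cross entropy
kl = np.sum(p * np.log2(p / q))   # D_KL(P || Q)

print(H_p <= H_pq)                 # True: Gibbs' Inequality
print(np.isclose(kl, H_pq - H_p))  # True: D_KL = H(P,Q) - H(P)
print(kl >= 0)                     # True
```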
Interpretations of $D_{KL}(P \| Q)$
- In the context of machine learning, $D_{KL}(P \| Q)$ is often called the information gain achieved if $P$ is used instead of $Q$ (this is why it's also called the relative entropy of $P$ with respect to $Q$).
- In the context of coding theory, $D_{KL}(P \| Q)$ can be construed as measuring the expected number of extra bits required to code samples from $P$ using a code optimized for $Q$ rather than the code optimized for $P$.
- In the context of Bayesian inference, $D_{KL}(P \| Q)$ is the amount of information lost when $Q$ is used to approximate $P$.
- Put simply, $D_{KL}(P \| Q)$ measures how "close" the two distributions $P$ and $Q$ are (see the sketch after this list):
    - If $P \neq Q$: the more $P$ and $Q$ differ, the larger $D_{KL}(P \| Q)$.
    - If $P = Q$: $D_{KL}(P \| Q) = 0$.
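A small illustration of this "closeness" reading (the `kl` helper below is defined just for this sketch, not taken from a library):

```python
import numpy as np

def kl(p, q):
    """D_KL(P || Q) in bits, for strictly positive discrete distributions."""
    p, q = np.asarray(p), np.asarray(q)
    return np.sum(p * np.log2(p / q))

p = [0.5, 0.5]
print(kl(p, [0.5, 0.5]))   # 0.0         : Q == P
print(kl(p, [0.6, 0.4]))   # ~0.029 bits : Q slightly off
print(kl(p, [0.9, 0.1]))   # ~0.737 bits : Q far from P
```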