Terminology Recap: Generative Models / Discriminative Models / Frequentist Machine Learning / Bayesian Machine Learning / Supervised Learning / Unsupervised Learning / Linear Regression / Naive Bayes Classifier
1. Generative vs Discriminative
- Generative Model
  - tries to learn the joint distribution $P(x, y)$, whether modeled directly or factored as $P(y)\,P(x \mid y)$ (both routes are viable)
  - Explicitly models the distribution of both the features and the corresponding labels (classes)
  - Aims to explain the generation of all data
  - Example techniques:
    - Naive Bayes Classifier
    - Hidden Markov Models (HMM)
    - Gaussian Mixture Models (GMM)
    - Multinomial Mixture Models
- Discriminative Model
  - tries to learn $P(y \mid x)$
  - Aims to predict relevant data
  - Example techniques:
    - nearest neighbors
    - logistic regression
    - linear regression
      - Yes, linear regression really is a discriminative model
      - I think this is exactly where the name "discriminative" falls short: in regression there are no classes to discriminate
    - Conditional Random Fields (CRFs)
      - Logistic Regression is the simplest CRF
    - SVMs
    - perceptrons
2. Frequentist vs Bayesian
Section 5.6 Bayesian Statistics of Deep Learning says:
> As discussed in section 5.4.1, the frequentist perspective is that the true parameter value $\theta$ is fixed but unknown, while the point estimate $\hat{\theta}$ is a random variable on account of it being a function of the dataset (which is seen as random).
>
> The Bayesian perspective on statistics is quite different. The Bayesian uses probability to reflect degrees of certainty of states of knowledge. The dataset is directly observed and so is not random. On the other hand, the true parameter $\theta$ is unknown or uncertain and thus is represented as a random variable.
However, from the other materials I have found, and from Deep Learning's own later section Example: Bayesian Linear Regression, I do not see Bayesian machine learning treating the (training) dataset as observed when modeling. So in my view the biggest difference between frequentist and Bayesian machine learning is:
- Frequentist
  - the true, unknown parameter $\theta$ is a value
  - so in frequentist machine learning, $\theta$ is not modeled probabilistically
- Bayesian
  - the true, unknown parameter $\theta$ is a random variable
    - this $\theta$ can be a latent variable
  - so in Bayesian machine learning, $\theta$ is modeled probabilistically
In concrete terms, the usual recipes are:
- Frequentist
  - Write out the expression for $P(y \mid x; \theta)$
  - Make a point estimate $\hat{\theta}$ such that, e.g., $\hat{\theta} = \operatorname{arg\,max}_{\theta} \prod_{i} P(y_i \mid x_i; \theta)$ (MLE)
  - Predict on test data: $P(y^* \mid x^*; \hat{\theta})$
- Bayesian
  - Transform via Bayes' rule: $P(\theta \mid X, Y) = \frac{P(Y \mid X, \theta)\,P(\theta)}{P(Y \mid X)}$
    - Or use $P(\theta \mid X, Y) \propto P(Y \mid X, \theta)\,P(\theta)$ to do MAP
      - Note: doing MAP makes this look a lot like the frequentist recipe, but the defining Bayesian trait is really the transformation, i.e. introducing the prior $P(\theta)$ via Bayes' rule
  - Predict on test data: $P(y^* \mid x^*, X, Y) = \int P(y^* \mid x^*, \theta)\,P(\theta \mid X, Y)\,\mathrm{d}\theta$
    - This yields a full distribution over $y^*$
Note the $P$ here:

- This $P$ can stand for either $P(y \mid x)$ or $P(x, y)$, entirely depending on what you need
- In other words: whether you go frequentist or Bayesian, the form of $P$ is what determines whether you end up with a generative or a discriminative model
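To make the two recipes concrete, here is a minimal sketch in Python, assuming a Bernoulli coin-flip model with a conjugate Beta(2, 2) prior (the model, prior, and numbers are illustrative choices, not from any of the sources above):

```python
# A minimal sketch: estimating a coin's bias theta from flips,
# to contrast the frequentist and Bayesian recipes above.
import numpy as np

rng = np.random.default_rng(0)
flips = rng.binomial(1, 0.7, size=20)   # 20 flips of a coin with true theta = 0.7
n, heads = len(flips), int(flips.sum())

# --- Frequentist: point estimate ---
# Write out P(flips; theta) and pick the theta_hat that maximizes it (MLE).
theta_hat = heads / n
# Prediction for a new flip uses the single point estimate:
p_heads_freq = theta_hat

# --- Bayesian: posterior over theta ---
# Transform via Bayes' rule with a Beta(a, b) prior; Beta is conjugate to the
# Bernoulli likelihood, so the posterior is Beta(a + heads, b + tails).
a, b = 2.0, 2.0                          # prior hyperparameters (an assumption)
post_a, post_b = a + heads, b + (n - heads)
# MAP (mode of the posterior) -- looks frequentist, but the prior is the tell:
theta_map = (post_a - 1) / (post_a + post_b - 2)
# Prediction integrates over the posterior: P(y* = 1 | data) = E[theta | data],
# which for a Beta posterior is just its mean.
p_heads_bayes = post_a / (post_a + post_b)

print(f"MLE: {theta_hat:.3f}  MAP: {theta_map:.3f}  "
      f"posterior predictive: {p_heads_bayes:.3f}")
```

With only 20 flips, the Beta(2, 2) prior visibly pulls the MAP and posterior-predictive numbers toward 0.5; as $n$ grows, all three estimates converge.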
3. Generative vs Discriminative, Frequentist vs Bayesian
So these two taxonomies do not conflict, and we can perfectly well draw a $2 \times 2$ table:
|                | Frequentist           | Bayesian              |
|----------------|-----------------------|-----------------------|
| Discriminative | $P(y \mid x; \theta)$ | $P(y \mid x, \theta)$ |
| Generative     | $P(x, y; \theta)$     | $P(x, y \mid \theta)$ |
- Items to the right of the semicolon (;) are not modeled probabilistically
- Note the notation:
  - $P(y \mid x; \theta)$ means "distribution of $y$, conditioned on $x$ and $\theta$"
  - $P(x, y; \theta)$ means "joint distribution of $x$ and $y$"
4. Unsupervised vs Supervised
Clearly, both generative and discriminative models fall under supervised learning, because both involve the label $y$.
Unsupervised learning, then, can be understood simply as learning $P(x)$.

First of all, density estimation is learning $P(x)$ by definition. Beyond that:
- clustering can be seen as learning the cluster assignment $z$ for each $x$, e.g. via a mixture model $P(x) = \sum_z P(z)\,P(x \mid z)$
- PCA can be seen as learning a low-dimensional representation $z$ of $x$
- embedding can be seen as learning a mapping $x \mapsto z$ into a representation space
Back to the frequentist vs Bayesian discussion: we can just as well let $P(x)$ carry a parameter, writing $P(x; \theta)$ in the frequentist case or $P(x \mid \theta)$ in the Bayesian case, so unsupervised learning slots into the same scheme.
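As a minimal sketch of this view, assuming the simplest possible model family (a single univariate Gaussian, fitted frequentist-style), density estimation by MLE looks like:

```python
# Unsupervised learning as learning P(x): no labels y anywhere.
# Model family: a single Gaussian, so we learn P(x; mu, sigma) by MLE.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=1000)   # unlabeled data

# MLE for a Gaussian: sample mean and sample standard deviation.
mu_hat = x.mean()
sigma_hat = x.std()

def p_x(x_new):
    """Density estimate P(x_new; mu_hat, sigma_hat)."""
    return np.exp(-(x_new - mu_hat) ** 2 / (2 * sigma_hat ** 2)) / (
        sigma_hat * np.sqrt(2 * np.pi)
    )

print(f"mu_hat={mu_hat:.2f}  sigma_hat={sigma_hat:.2f}  P(5.0)={p_x(5.0):.3f}")
```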
5. Frequentist Discriminative Example: Linear Regression
- MLE is equivalent to minimizing the KL divergence $D_{\mathrm{KL}}(\hat{p}_{\text{data}} \,\|\, p_{\text{model}})$
- MLE is equivalent to minimizing the cross-entropy between the empirical distribution $\hat{p}_{\text{data}}$ and the model distribution $p_{\text{model}}$
- When $p_{\text{model}}(y \mid x; \theta)$ is Gaussian, MLE is equivalent to minimizing MSE; in other words, MSE is the cross-entropy between the empirical distribution and a Gaussian model.
Linear regression itself never says it needs an assumption of Gaussian distributions, but notice that we normally fit linear regression by minimizing $\mathrm{MSE} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2$.
We can imagine that with an infinitely large training set, we might see several training examples with the same input value $x$ but different values of $y$.
By linear regression's assumption, $p(y \mid x) = \mathcal{N}(y;\, \hat{y}(x; w),\, \sigma^2)$, where the mean $\hat{y}(x; w) = w^\top x$ is our model's prediction.

From this we can derive the conditional log-likelihood

$$\sum_{i=1}^{m} \log p(y_i \mid x_i; w) = -m \log \sigma - \frac{m}{2} \log (2\pi) - \sum_{i=1}^{m} \frac{(\hat{y}_i - y_i)^2}{2\sigma^2},$$

and maximizing it over $w$ yields exactly the same $\hat{w}$ as minimizing the MSE.

- You can also collect all the $y_i$ together, in which case $\mathbf{y}$ is a single multivariate Gaussian: $\mathbf{y} \sim \mathcal{N}(X w,\, \sigma^2 I)$
- Reference: Maximum Likelihood Estimation For Regression
- Does this example show that any algorithm that minimizes MSE can be given a corresponding Gaussian-model interpretation? The sketch below at least confirms the linear-regression case.
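The sketch checks the equivalence on synthetic data, assuming the noise scale $\sigma$ is known; the closed-form least-squares solution and the numerically maximized Gaussian likelihood should agree up to optimizer tolerance:

```python
# Check numerically: the w minimizing MSE equals the w maximizing
# the Gaussian log-likelihood sum_i log N(y_i; x_i^T w, sigma^2).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
m = 200
X = np.column_stack([np.ones(m), rng.normal(size=m)])  # design matrix, bias column
w_true = np.array([1.0, 3.0])
y = X @ w_true + rng.normal(scale=0.5, size=m)         # y ~ N(Xw, sigma^2 I)

# MSE route: closed-form least squares, w = (X^T X)^{-1} X^T y.
w_mse = np.linalg.solve(X.T @ X, X.T @ y)

# MLE route: minimize the negative Gaussian log-likelihood numerically
# (sigma held fixed; its value does not affect the argmax over w).
sigma = 0.5
def neg_log_lik(w):
    resid = y - X @ w
    return 0.5 * np.sum(resid ** 2) / sigma ** 2 + m * np.log(sigma * np.sqrt(2 * np.pi))

w_mle = minimize(neg_log_lik, x0=np.zeros(2)).x
print(w_mse, w_mle)   # the two estimates agree (up to optimizer tolerance)
```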
Finally, a word on prediction:
- For a new test data point $x^*$, prediction is simple: $\hat{y}^* = \hat{w}^\top x^*$. But where does this formula come from?
- Since our assumption is $y \sim \mathcal{N}(w^\top x, \sigma^2)$, I think the prediction should be understood as $\hat{y}^* = \mathbb{E}[y^* \mid x^*; \hat{w}] = \hat{w}^\top x^*$, i.e. the mean of the predictive Gaussian (see the sketch below)
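A minimal sketch of this reading, with placeholder values standing in for the fitted $\hat{w}$ and the assumed noise scale $\sigma$; the Gaussian view gives not just a point prediction but also an interval:

```python
# Under y ~ N(w^T x, sigma^2), the point prediction for a new x*
# is the mean of the predictive Gaussian, w_hat^T x*.
import numpy as np

w_hat = np.array([1.0, 3.0])    # fitted weights (placeholder values)
sigma = 0.5                     # assumed noise scale (placeholder value)
x_star = np.array([1.0, 2.0])   # new test point (first entry = bias term)

y_star_hat = w_hat @ x_star     # E[y* | x*; w_hat] -- the usual "simple" prediction
# The same Gaussian assumption hands us error bars for free,
# e.g. a 95% interval around the predictive mean:
lo, hi = y_star_hat - 1.96 * sigma, y_star_hat + 1.96 * sigma
print(f"y* = {y_star_hat:.2f}, 95% interval ({lo:.2f}, {hi:.2f})")
```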
6. Bayesian Generative Example: Naive Bayes Classifier
References:
- Michael Collins: The Naive Bayes Model, Maximum-Likelihood Estimation, and the EM Algorithm
- Yao’s Blog: Naive Bayes classifier
In short, apply Bayes' rule:

$$P(y \mid x) = \frac{P(x \mid y)\,P(y)}{P(x)} \propto P(x \mid y)\,P(y) = P(y) \prod_j P(x_j \mid y),$$

where the last step is the naive conditional-independence assumption across features. At this point you treat $P(y)$ as the prior and $P(x \mid y)$ as the likelihood.

Then MAP finishes the job: $\hat{y} = \operatorname{arg\,max}_{y} P(y) \prod_j P(x_j \mid y)$.
- Note that Michael Collins: The Naive Bayes Model, Maximum-Likelihood Estimation, and the EM Algorithm arrives at MLE after a round of manipulation; I don't think it's necessary to take that detour.
As for how to estimate $P(y)$ and the $P(x_j \mid y)$ terms, see the references above.
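Here is a minimal sketch of the recipe on toy data, assuming binary (Bernoulli) features and Laplace smoothing, which is one common choice for estimating the conditional probabilities rather than the only one:

```python
# Naive Bayes as a Bayesian-generative recipe: treat P(y) as the prior,
# P(x | y) as the likelihood (naive per-feature independence), predict by MAP.
import numpy as np

# Toy training data: 6 examples x 3 binary word-presence features, 2 classes.
X = np.array([[1, 1, 0], [1, 0, 0], [1, 1, 1],
              [0, 0, 1], [0, 1, 1], [0, 0, 1]])
y = np.array([0, 0, 0, 1, 1, 1])

classes = np.unique(y)
prior = np.array([(y == c).mean() for c in classes])            # P(y)
# P(x_j = 1 | y = c), with Laplace smoothing to avoid zero probabilities.
cond = np.array([(X[y == c].sum(axis=0) + 1) / ((y == c).sum() + 2)
                 for c in classes])

def predict(x_new):
    # MAP: argmax_y  log P(y) + sum_j log P(x_j | y)
    log_post = np.log(prior) + (
        x_new * np.log(cond) + (1 - x_new) * np.log(1 - cond)
    ).sum(axis=1)
    return classes[np.argmax(log_post)]

print(predict(np.array([1, 1, 0])))   # -> 0
print(predict(np.array([0, 0, 1])))   # -> 1
```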