# Introduction to Linear Discriminant Analysis

参考自 Multi-label Linear Discriminant Analysis

Linear discriminant analysis (LDA) is a well-known method for dimensionality reduction.

Given a data set with $ n $ samples $ \lbrace x^{(i)}, y^{(i)} \rbrace^n_{i=1} $ and $ K $ classes, where $ x^{(i)} \in \mathbb{R}^p $ and $ y^{(i)} \in \lbrace0, 1\rbrace^K $ ($ K $ 维的 0-1 vector). $ y^{(i)}_k = 1 $ if $ x^{(i)} $ belongs to the $k$^th class, and 0 otherwise.

Let input data be partitioned into $ K $ groups as $ \lbrace \pi_k \rbrace^K_{k=1} $, where $ \pi_k $ denotes the group of the $k$^th class with $ n_k $ data points. Classical LDA deals with single-label problems, where data partitions are mutually exclusive, i.e., $ \pi_i \cap \pi_j = \varnothing $ if $ i \neq j $, and $ \sum^{K}_{k=1} n_k = n $.

We write $ X = [ x^{(1)},\cdots,x^{(n)} ]^T $ and

\[Y = [y^{(1)},\cdots,y^{(n)}]^T = [y_{(1)},\cdots,y_{(K)}]\]where $ y_{(k)} \in {0, 1}^n $ is the class-wise label indication vector for the $k^{th}$ class.

简单理一下：

- # of features = $ p $
- # of samples = $ n $
- $ x^{(i)} $ is a $ p \times 1 $ vector
- $ X $ is a $ n \times p $ matrix
- $ y^{(i)} $ is a $ K \times 1 $ vector
- $ y_{(i)} $ is a $ n \times 1 $ vector
- $ Y $ is a $ n \times K $ matrix

Classical LDA seeks a linear transformation $ G \in \mathbb{R}^{p \times r} $ that maps $ x^{(i)} $ in the high $p$-dimensional space to $ q^{(i)} \in \mathbb{R}^{r} $ in a lower $r$-dimensional ($r < p $) space by $ q^{(i)} = G^T x^{(i)} $. In classical LDA, the * between-class*,

*, and*

**within-class**

**total-class***scatter matrices*are defined as follows:

where $ m_k = \frac{1}{n_k} \sum_{x^{(i)} \in \pi_k}{x^{(i)}} $ is the class mean (class centroid) of the $k$^th class, $ m = \frac{1}{n} \sum_{i=1}^{n}{x^{(i)}} $ is the global mean (global centroid), and $ S_t = S_b + S_w $.

The optimal $ G $ is chosen such that the * between-class* distance is maximize whilst the

*distance is minimized in the low-dimensional projected space, which leads to the standard LDA optimization objective as follows:*

**within-class**In linear algebra, the trace (迹) of an $ n \times n $ square matrix $ A $ is defined to be the sum of the elements on the main diagonal (the diagonal from the upper left to the lower right) of $ A $, i.e.,

\[\operatorname{tr}(A) = a_{11} + a_{22} + \dots + a_{nn} = \sum_{i=1}^{n} a_{ii}\]
## Comments