
Adapted from *Multi-label Linear Discriminant Analysis*.


Linear discriminant analysis (LDA) is a well-known method for dimensionality reduction.

Given a data set with $n$ samples $\{x^{(i)}, y^{(i)}\}_{i=1}^{n}$ and $K$ classes, where $x^{(i)} \in \mathbb{R}^p$ and $y^{(i)} \in \{0, 1\}^K$ (a $K$-dimensional 0-1 vector): $y_k^{(i)} = 1$ if $x^{(i)}$ belongs to the $k$-th class, and $0$ otherwise.

Let the input data be partitioned into $K$ groups as $\{\pi_k\}_{k=1}^{K}$, where $\pi_k$ denotes the group of the $k$-th class with $n_k$ data points. Classical LDA deals with single-label problems, where the data partitions are mutually exclusive, i.e., $\pi_i \cap \pi_j = \emptyset$ if $i \neq j$, and $\sum_{k=1}^{K} n_k = n$.

We write $X = [x^{(1)}, \ldots, x^{(n)}]^T$ and

$$Y = [y^{(1)}, \ldots, y^{(n)}]^T = [y_{(1)}, \ldots, y_{(K)}]$$

where $y_{(k)} \in \{0, 1\}^n$ is the class-wise label indicator vector for the $k$-th class.

A quick recap of the dimensions (a code sketch follows the list):

- \# of features = $p$
- \# of samples = $n$
- $x^{(i)}$ is a $p \times 1$ vector
- $X$ is an $n \times p$ matrix
- $y^{(i)}$ is a $K \times 1$ vector
- $y_{(k)}$ is an $n \times 1$ vector
- $Y$ is an $n \times K$ matrix
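
To make the setup concrete, here is a minimal NumPy sketch (toy sizes and variable names are my own, not from the paper) that builds $X$ and $Y$ for a single-label problem and checks the shapes above:

```python
# Minimal sketch of the data setup; sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)

n, p, K = 12, 4, 3                        # samples, features, classes
X = rng.standard_normal((n, p))           # X: n x p, one sample x^(i) per row
labels = np.repeat(np.arange(K), n // K)  # single-label: one class per sample

# Y: n x K. Row i is the 0-1 vector y^(i); column k is the class-wise
# indicator vector y_(k) for class k.
Y = np.zeros((n, K))
Y[np.arange(n), labels] = 1

n_k = Y.sum(axis=0)                       # class sizes n_k
assert n_k.sum() == n                     # partitions are mutually exclusive
print(X.shape, Y.shape, n_k)              # (12, 4) (12, 3) [4. 4. 4.]
```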

Classical LDA seeks a linear transformation $G \in \mathbb{R}^{p \times r}$ that maps $x^{(i)}$ in the high $p$-dimensional space to $q^{(i)} \in \mathbb{R}^r$ in a lower $r$-dimensional ($r < p$) space by $q^{(i)} = G^T x^{(i)}$. In classical LDA, the between-class, within-class, and total-class scatter matrices are defined as follows:

$$S_b = \sum_{k=1}^{K} n_k (m_k - m)(m_k - m)^T$$

$$S_w = \sum_{k=1}^{K} \sum_{x^{(i)} \in \pi_k} (x^{(i)} - m_k)(x^{(i)} - m_k)^T$$

$$S_t = \sum_{i=1}^{n} (x^{(i)} - m)(x^{(i)} - m)^T$$

where $m_k = \frac{1}{n_k} \sum_{x^{(i)} \in \pi_k} x^{(i)}$ is the class mean (class centroid) of the $k$-th class, $m = \frac{1}{n} \sum_{i=1}^{n} x^{(i)}$ is the global mean (global centroid), and $S_t = S_b + S_w$.
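
Continuing the sketch above, the three scatter matrices can be computed directly from `X`, `Y`, and `n_k`; the final assert verifies the identity $S_t = S_b + S_w$:

```python
# Sketch of the scatter matrices, continuing the toy setup above.
m = X.mean(axis=0)                     # global mean m, shape (p,)
m_k = (Y.T @ X) / n_k[:, None]         # class means m_k, K x p

# Between-class scatter: sum_k n_k (m_k - m)(m_k - m)^T
D = m_k - m                            # K x p
S_b = (D * n_k[:, None]).T @ D

# Within-class scatter: sum_k sum_{x in pi_k} (x - m_k)(x - m_k)^T
C = X - Y @ m_k                        # each sample minus its class mean
S_w = C.T @ C

# Total scatter: sum_i (x - m)(x - m)^T, and S_t = S_b + S_w
S_t = (X - m).T @ (X - m)
assert np.allclose(S_t, S_b + S_w)
```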

The optimal $G$ is chosen such that the between-class distance is maximized whilst the within-class distance is minimized in the low-dimensional projected space, which leads to the standard LDA optimization objective as follows:

$$J(G) = \operatorname{tr}\left(\frac{G^T S_b G}{G^T S_w G}\right), \qquad G^* = \arg\max_G J(G)$$
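
A standard way to optimize this (a sketch of the usual eigen-decomposition approach, not necessarily the exact solver used in the paper) is the generalized eigenproblem $S_b g = \lambda S_w g$: the columns of $G$ are the eigenvectors with the $r$ largest eigenvalues. Continuing the sketch:

```python
# Sketch: maximize the trace ratio via the generalized eigenproblem
# S_b g = lambda * S_w g. Requires S_w to be nonsingular; when it is not
# (e.g., p > n), a small ridge S_w + eps * I is a common fix.
from scipy.linalg import eigh

r = K - 1                        # rank(S_b) <= K - 1 for single-label data
eigvals, eigvecs = eigh(S_b, S_w)  # eigenvalues in ascending order
G = eigvecs[:, ::-1][:, :r]        # p x r: top-r eigenvectors as columns
Q = X @ G                          # rows are the projections q^(i) = G^T x^(i)
print(G.shape, Q.shape)            # (4, 2) (12, 2)
```

Since $\operatorname{rank}(S_b) \le K - 1$ for single-label data, classical LDA yields at most $K - 1$ meaningful projection directions, which is why $r = K - 1$ is used here.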

In linear algebra, the trace of an $n \times n$ square matrix $A$ is defined to be the sum of the elements on its main diagonal (the diagonal from the upper left to the lower right), i.e.,

$$\operatorname{tr}(A) = a_{11} + a_{22} + \cdots + a_{nn} = \sum_{i=1}^{n} a_{ii}$$
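
For example, with $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$, $\operatorname{tr}(A) = 1 + 4 = 5$; the off-diagonal entries $2$ and $3$ do not contribute.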
