
The inconsistency among the unlabeled examples:

$$
F_u(\mathbf{y}^u, S) = \sum_{i,j=1}^{n_u} S_{i,j}^{uu} \exp(y_i^u - y_j^u) \overset{\text{by symmetry}}{=} \frac{1}{2}\sum_{i,j=1}^{n_u} S_{j,i}^{uu} \exp(y_j^u - y_i^u) + \frac{1}{2}\sum_{i,j=1}^{n_u} S_{i,j}^{uu} \exp(y_i^u - y_j^u)
$$

The inconsistency between labeled and unlabeled examples:

$$
F_l(\mathbf{y}, S) = \sum_{i=1}^{n_l}\sum_{j=1}^{n_u} S_{i,j}^{lu} \exp(-2 y_i^l y_j^u)
$$

Objective function:

$$
F(\mathbf{y}, S) = F_l(\mathbf{y}, S) + C F_u(\mathbf{y}^u, S) \tag{1}
$$

The optimal pseudo-labels $\mathbf{y}^u$ can be found by minimizing $F$; formally,

$$
\min_{\mathbf{y}^u} F(\mathbf{y}, S) \tag{2}
$$

This is a convex optimization problem and can be solved efficiently by numerical methods, none of which involve the learning algorithm $A$ at all.
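For instance, problem (2) can be attacked with plain gradient descent on the pseudo-labels alone, with no classifier in sight. A minimal sketch, assuming numpy and small randomly generated similarity matrices (all names here are illustrative, not from any particular implementation):

```python
import numpy as np

def F(yu, yl, S_lu, S_uu, C=1.0):
    """Objective (1): F(y, S) = F_l(y, S) + C * F_u(y^u, S)."""
    Fl = np.sum(S_lu * np.exp(-2.0 * np.outer(yl, yu)))
    Fu = np.sum(S_uu * np.exp(yu[:, None] - yu[None, :]))
    return Fl + C * Fu

def grad_F(yu, yl, S_lu, S_uu, C=1.0):
    """Gradient of F with respect to the pseudo-labels y^u."""
    gl = np.sum(S_lu * (-2.0 * yl[:, None]) * np.exp(-2.0 * np.outer(yl, yu)), axis=0)
    E = np.exp(yu[:, None] - yu[None, :])   # E[i, j] = exp(y_i^u - y_j^u)
    gu = (S_uu * E).sum(axis=1) - (S_uu * E).sum(axis=0)
    return gl + C * gu

rng = np.random.default_rng(0)
yl = np.array([1.0, -1.0, 1.0])                               # labeled targets
S_lu = rng.uniform(size=(3, 5))                               # labeled-unlabeled similarity
S_uu = rng.uniform(size=(5, 5)); S_uu = (S_uu + S_uu.T) / 2   # symmetric similarity
yu = np.zeros(5)
for _ in range(1000):                                          # plain gradient descent
    yu -= 0.01 * grad_F(yu, yl, S_lu, S_uu)
```

The point is only that $F$ decreases step by step; nothing in the loop knows about $A$.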

Now we do want to involve our learning algorithm $A$: we would like to use the idea behind problem (2) to improve $A$. In other words, we want to cast problem (2) in a machine learning scenario while still finding the optimal $\mathbf{y}^u$.

Suppose you solve problem (2) by gradient descent: at every iteration you update $\mathbf{y}^u$ to obtain a smaller $F$. In the machine learning scenario, what you update instead is a classifier $H$, which predicts pseudo-labels $\mathbf{y}^u$ that yield a smaller $F$.

That is to say, we substitute $\mathbf{y} = [\mathbf{y}^l; \mathbf{y}^u]$ with $\mathbf{y} = [\mathbf{y}^l; H(\mathbf{x}^u)]$, or even $\mathbf{y} = H(\mathbf{x})$ s.t. $H(\mathbf{x}^l) = \mathbf{y}^l$.

We further expand our machine learning scenario to involve an ensemble of classifiers:

$$
H^{(T)}(x) = \sum_{t=1}^{T} \alpha_t h^{(t)}(x)
$$
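In code, the ensemble is just a weighted vote over weak learners. A tiny illustration (the weak learners below are hypothetical stubs that threshold one feature each):

```python
def H(x, weak_learners, alphas):
    """H^{(T)}(x) = sum_t alpha_t * h^{(t)}(x), with each h^{(t)}(x) in {-1, +1}."""
    return sum(a * h(x) for h, a in zip(weak_learners, alphas))

# two stub weak learners: sign of the first feature, sign of the second
hs = [lambda x: 1 if x[0] > 0 else -1,
      lambda x: 1 if x[1] > 0 else -1]
score = H((2.0, -3.0), hs, [0.5, 0.2])   # 0.5*(+1) + 0.2*(-1) = 0.3
```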

At the $(T+1)$-th iteration, our goal is to find:

$$
h^{(T+1)}(x), \alpha_{T+1} = \mathop{\arg\min}_{h^{(T+1)}(x),\, \alpha_{T+1}} F\left(H^{(T+1)}(x), S\right) \quad \text{s.t. } h^{(T+1)}(x_i) = y_i^l,\ i = 1, \dots, n_l \tag{3}
$$

To simplify the notation, define $H^{(T)}(x_j) \equiv H_j$, $h^{(T+1)}(x_j) \equiv h_j$ and $\alpha_{T+1} \equiv \alpha$.

Thus

$$
H^{(T+1)}(x_i) = H_i + \alpha h_i
$$

Expand $F(\mathbf{y}, S)$ by substituting $\mathbf{y}^u$ with $H(\mathbf{x}^u)$:

$$
\begin{aligned}
F_l(\mathbf{y}, S) &= \sum_{i=1}^{n_l}\sum_{j=1}^{n_u} S_{i,j}^{lu} \exp\left[-2 y_i^l (H_j + \alpha h_j)\right] \\
F_u(\mathbf{y}^u, S) &= \sum_{i,j=1}^{n_u} S_{i,j}^{uu} \exp(H_i + \alpha h_i - H_j - \alpha h_j) = \sum_{i,j=1}^{n_u} S_{i,j}^{uu} \exp(H_i - H_j) \exp\left[\alpha (h_i - h_j)\right]
\end{aligned}
$$

Now the problem becomes:

$$
\begin{aligned}
\min_{h(x),\, \alpha} F(\mathbf{y}, S) &= F_l(\mathbf{y}, S) + C F_u(\mathbf{y}^u, S) \\
&= \sum_{i=1}^{n_l}\sum_{j=1}^{n_u} S_{i,j}^{lu} \exp\left[-2 y_i^l (H_j + \alpha h_j)\right] + C \sum_{i,j=1}^{n_u} S_{i,j}^{uu} \exp(H_i - H_j) \exp\left[\alpha (h_i - h_j)\right] \\
&\quad \text{s.t. } h_i = y_i^l,\ i = 1, \dots, n_l
\end{aligned} \tag{4}
$$

Problem (4) involves products of $\alpha$ and $h_i$, making it nonlinear and hence difficult to optimize. We apply the Bound-Optimization approach below to solve it.

We first further expand $F_l(\mathbf{y}, S)$:

$$
\begin{aligned}
F_l(\mathbf{y}, S) &= \sum_{i=1}^{n_l}\sum_{j=1}^{n_u} S_{i,j}^{lu} \exp\left[-2 y_i^l (H_j + \alpha h_j)\right] \\
&= \sum_{i=1}^{n_l}\sum_{j=1}^{n_u} S_{i,j}^{lu} \left\{ I(y_i^l, 1) \exp\left[-2 (H_j + \alpha h_j)\right] + I(y_i^l, -1) \exp\left[2 (H_j + \alpha h_j)\right] \right\} \\
&= \sum_{j=1}^{n_u} \exp(-2\alpha h_j) \sum_{i=1}^{n_l} S_{i,j}^{lu} I(y_i^l, 1) \exp(-2 H_j) + \sum_{j=1}^{n_u} \exp(2\alpha h_j) \sum_{i=1}^{n_l} S_{i,j}^{lu} I(y_i^l, -1) \exp(2 H_j)
\end{aligned}
$$

Then we find an upper bound $\bar{F}_u(\mathbf{y}^u, S)$ of $F_u(\mathbf{y}^u, S)$:

$$
\begin{aligned}
\exp\left[\alpha (h_i - h_j)\right] &\le \frac{1}{2}\left[\exp(2\alpha h_i) + \exp(-2\alpha h_j)\right] \\
F_u(\mathbf{y}^u, S) \le \bar{F}_u(\mathbf{y}^u, S) &= \sum_{i,j=1}^{n_u} \frac{1}{2} S_{i,j}^{uu} \exp(H_i - H_j) \exp(2\alpha h_i) + \sum_{i,j=1}^{n_u} \frac{1}{2} S_{i,j}^{uu} \exp(H_i - H_j) \exp(-2\alpha h_j)
\end{aligned}
$$
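The first inequality is just AM-GM: $e^{a-b} = \sqrt{e^{2a}\,e^{-2b}} \le \frac{1}{2}(e^{2a} + e^{-2b})$ with $a = \alpha h_i$, $b = \alpha h_j$. A quick brute-force check over $h_i, h_j \in \{-1, +1\}$ and a few values of $\alpha$:

```python
import itertools, math

# Verify exp[a*(hi - hj)] <= (exp(2*a*hi) + exp(-2*a*hj)) / 2
# for all sign combinations; the bound is tight when hi != hj.
for hi, hj in itertools.product([-1, 1], repeat=2):
    for alpha in [0.1, 0.5, 1.0, 2.0]:
        lhs = math.exp(alpha * (hi - hj))
        rhs = 0.5 * (math.exp(2 * alpha * hi) + math.exp(-2 * alpha * hj))
        assert lhs <= rhs + 1e-12
```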

Flip $i$ and $j$ in the first term of $\bar{F}_u(\mathbf{y}^u, S)$, using the symmetry of $S$:

$$
\begin{aligned}
\bar{F}_u(\mathbf{y}^u, S) &= \sum_{i,j=1}^{n_u} \frac{1}{2} S_{i,j}^{uu} \exp(H_j - H_i) \exp(2\alpha h_j) + \sum_{i,j=1}^{n_u} \frac{1}{2} S_{i,j}^{uu} \exp(H_i - H_j) \exp(-2\alpha h_j) \\
&= \sum_{j=1}^{n_u} \exp(-2\alpha h_j) \sum_{i=1}^{n_u} \frac{1}{2} S_{i,j}^{uu} \exp(H_i - H_j) + \sum_{j=1}^{n_u} \exp(2\alpha h_j) \sum_{i=1}^{n_u} \frac{1}{2} S_{i,j}^{uu} \exp(H_j - H_i)
\end{aligned}
$$

$$
\begin{aligned}
F(\mathbf{y}, S) = F_l(\mathbf{y}, S) + C F_u(\mathbf{y}^u, S) &\le F_1(\mathbf{y}, S) = F_l(\mathbf{y}, S) + C \bar{F}_u(\mathbf{y}^u, S) \\
&= \sum_{j=1}^{n_u} \exp(-2\alpha h_j) \left[ \sum_{i=1}^{n_l} S_{i,j}^{lu} I(y_i^l, 1) \exp(-2 H_j) + \sum_{i=1}^{n_u} \frac{C}{2} S_{i,j}^{uu} \exp(H_i - H_j) \right] \\
&\quad + \sum_{j=1}^{n_u} \exp(2\alpha h_j) \left[ \sum_{i=1}^{n_l} S_{i,j}^{lu} I(y_i^l, -1) \exp(2 H_j) + \sum_{i=1}^{n_u} \frac{C}{2} S_{i,j}^{uu} \exp(H_j - H_i) \right]
\end{aligned}
$$

Define:

$$
\begin{aligned}
p_j &= \sum_{i=1}^{n_l} S_{i,j}^{lu} I(y_i^l, 1) \exp(-2 H_j) + \sum_{i=1}^{n_u} \frac{C}{2} S_{i,j}^{uu} \exp(H_i - H_j) \\
q_j &= \sum_{i=1}^{n_l} S_{i,j}^{lu} I(y_i^l, -1) \exp(2 H_j) + \sum_{i=1}^{n_u} \frac{C}{2} S_{i,j}^{uu} \exp(H_j - H_i)
\end{aligned}
$$

Note that when calculating $p_j$ and $q_j$, $j$ is fixed, and both are functions of the current ensemble predictions $H$.

$p_j$ and $q_j$ can be interpreted as the confidence in classifying the unlabeled example $x_j$ into the positive class and the negative class, respectively.
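A vectorized computation of $p_j$ and $q_j$ might look like this (a sketch, assuming numpy; `H` holds the current ensemble scores on the unlabeled points, and all names are illustrative):

```python
import numpy as np

def confidences(H, yl, S_lu, S_uu, C=1.0):
    """Compute p_j and q_j from the definitions above (vectorized)."""
    pos = (yl == 1).astype(float)            # indicator I(y_i^l, 1)
    neg = (yl == -1).astype(float)           # indicator I(y_i^l, -1)
    E = np.exp(H[:, None] - H[None, :])      # E[i, j] = exp(H_i - H_j)
    p = (pos @ S_lu) * np.exp(-2.0 * H) + (C / 2.0) * (S_uu * E).sum(axis=0)
    q = (neg @ S_lu) * np.exp(2.0 * H) + (C / 2.0) * (S_uu * E.T).sum(axis=0)
    return p, q

rng = np.random.default_rng(1)
yl = np.array([1, -1, 1])
S_lu = rng.uniform(size=(3, 4))
S_uu = rng.uniform(size=(4, 4)); S_uu = (S_uu + S_uu.T) / 2
H = rng.normal(size=4)                       # current ensemble scores on x^u
p, q = confidences(H, yl, S_lu, S_uu)
```

Every term in both sums is non-negative, so $p_j \ge 0$ and $q_j \ge 0$, which is what lets them act as class confidences.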

Claim 1: Problem (4) is equivalent to

$$
\min_{h(x),\, \alpha} F_1(\mathbf{y}, S) = \sum_{j=1}^{n_u} \exp(-2\alpha h_j) \, p_j + \sum_{j=1}^{n_u} \exp(2\alpha h_j) \, q_j \tag{5}
$$

The expression in (5) is still difficult to optimize since the weight $\alpha$ and the classifier $h$ are coupled. We simplify the problem further using an upper bound of $F_1$.

$$
\begin{aligned}
1 + x &\le \exp(x) \\
\Rightarrow \exp(\gamma x) &\le \exp(\gamma) + \exp(-\gamma) - 1 + \gamma x, \quad x \in \{-1, 1\} \\
\Rightarrow \exp(-2\alpha h_j) &\le \exp(2\alpha) + \exp(-2\alpha) - 1 - 2\alpha h_j \\
\exp(2\alpha h_j) &\le \exp(2\alpha) + \exp(-2\alpha) - 1 + 2\alpha h_j \\
\Rightarrow F_1(\mathbf{y}, S) \le F_2(\mathbf{y}, S) &= \sum_{j=1}^{n_u} \left(e^{2\alpha} + e^{-2\alpha} - 1\right)(p_j + q_j) - \sum_{j=1}^{n_u} 2\alpha h_j (p_j - q_j)
\end{aligned}
$$

Claim 2: Problem (5) is equivalent to

$$
\min_{h(x),\, \alpha} F_2(\mathbf{y}, S) = \sum_{j=1}^{n_u} \left(e^{2\alpha} + e^{-2\alpha} - 1\right)(p_j + q_j) - \sum_{j=1}^{n_u} 2\alpha h_j (p_j - q_j) \tag{6}
$$
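Problem (6) decouples nicely: the first term does not involve $h$ at all, so for any $\alpha > 0$ the linear term is minimized by $h_j = \mathrm{sign}(p_j - q_j)$, after which $F_2$ is a one-dimensional function of $\alpha$. A numerical sketch under these assumptions (the grid search over $\alpha$ is my own illustrative choice, not anything derived above):

```python
import numpy as np

def solve_step(p, q, alphas=np.linspace(0.0, 3.0, 301)):
    """Minimize F_2 in (6): pick h_j = sign(p_j - q_j), then grid-search alpha."""
    h = np.where(p >= q, 1.0, -1.0)          # minimizes -2*alpha*h_j*(p_j - q_j)
    def F2(a):
        return ((np.exp(2 * a) + np.exp(-2 * a) - 1) * (p + q).sum()
                - 2 * a * (h * (p - q)).sum())
    alpha = min(alphas, key=F2)              # 1-D minimization over the grid
    return h, alpha

p = np.array([3.0, 1.0, 0.5])
q = np.array([1.0, 2.0, 0.5])
h, alpha = solve_step(p, q)
```

The chosen $h$ labels each unlabeled point with whichever class confidence is larger, which matches the interpretation of $p_j$ and $q_j$ above.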
