
The inconsistency among the unlabeled examples:

$$
F_u(\mathbf{y}^u, S) = \sum_{i,j=1}^{n_u} S_{i,j}^{uu} \exp(y_i^u - y_j^u) \overset{\text{by symmetry}}{=} \frac{1}{2}\sum_{i,j=1}^{n_u} S_{j,i}^{uu} \exp(y_j^u - y_i^u) + \frac{1}{2}\sum_{i,j=1}^{n_u} S_{i,j}^{uu} \exp(y_i^u - y_j^u)
$$

The inconsistency between labeled and unlabeled examples:

$$
F_l(\mathbf{y}, S) = \sum_{i=1}^{n_l}\sum_{j=1}^{n_u} S_{i,j}^{lu} \exp(-2 y_i^l y_j^u)
$$

Objective function:

$$
F(\mathbf{y}, S) = F_l(\mathbf{y}, S) + C F_u(\mathbf{y}^u, S) \tag{1}
$$

The optimal pseudo-labels $\mathbf{y}^u$ can be found by minimizing $F$; formally,

$$
\min_{\mathbf{y}^u} F(\mathbf{y}, S) \tag{2}
$$

This is a convex optimization problem and can be solved efficiently by numerical methods, none of which involve the learning algorithm $A$ at all.
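For instance, problem (2) can be attacked with plain gradient descent on the pseudo-labels alone, with no classifier in sight. A minimal sketch, assuming numpy and small randomly generated similarity matrices (all names here are illustrative, not from any particular implementation):

```python
import numpy as np

def F(yu, yl, S_lu, S_uu, C=1.0):
    """Objective (1): F(y, S) = F_l(y, S) + C * F_u(y^u, S)."""
    Fl = np.sum(S_lu * np.exp(-2.0 * np.outer(yl, yu)))
    Fu = np.sum(S_uu * np.exp(yu[:, None] - yu[None, :]))
    return Fl + C * Fu

def grad_F(yu, yl, S_lu, S_uu, C=1.0):
    """Gradient of F with respect to the pseudo-labels y^u."""
    gl = np.sum(S_lu * (-2.0 * yl[:, None]) * np.exp(-2.0 * np.outer(yl, yu)), axis=0)
    E = np.exp(yu[:, None] - yu[None, :])   # E[i, j] = exp(y_i^u - y_j^u)
    gu = (S_uu * E).sum(axis=1) - (S_uu * E).sum(axis=0)
    return gl + C * gu

rng = np.random.default_rng(0)
yl = np.array([1.0, -1.0, 1.0])                               # labeled targets
S_lu = rng.uniform(size=(3, 5))                               # labeled-unlabeled similarity
S_uu = rng.uniform(size=(5, 5)); S_uu = (S_uu + S_uu.T) / 2   # symmetric similarity
yu = np.zeros(5)
for _ in range(1000):                                          # plain gradient descent
    yu -= 0.01 * grad_F(yu, yl, S_lu, S_uu)
```

The point is only that $F$ decreases step by step; nothing in the loop knows about $A$.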

Now we do want to involve our learning algorithm $A$: we would like to use the idea behind problem (2) to improve $A$. In other words, we want to cast problem (2) in a machine learning scenario while still finding the optimal $\mathbf{y}^u$.

Suppose you solve problem (2) by gradient descent: at every iteration you update $\mathbf{y}^u$ to obtain a smaller $F$. In the machine learning scenario, what you update instead is a classifier $H$, which predicts pseudo-labels $\mathbf{y}^u$ that yield a smaller $F$.

That is to say, we substitute $\mathbf{y} = [\mathbf{y}^l; \mathbf{y}^u]$ with $\mathbf{y} = [\mathbf{y}^l; H(\mathbf{x}^u)]$, or even $\mathbf{y} = H(\mathbf{x})$ s.t. $H(\mathbf{x}^l) = \mathbf{y}^l$.

We further expand our machine learning scenario to involve an ensemble of classifiers:

$$
H^{(T)}(x) = \sum_{t=1}^{T} \alpha_t h^{(t)}(x)
$$
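In code, the ensemble is just a weighted vote over weak learners. A tiny illustration (the weak learners below are hypothetical stubs that threshold one feature each):

```python
def H(x, weak_learners, alphas):
    """H^{(T)}(x) = sum_t alpha_t * h^{(t)}(x), with each h^{(t)}(x) in {-1, +1}."""
    return sum(a * h(x) for h, a in zip(weak_learners, alphas))

# two stub weak learners: sign of the first feature, sign of the second
hs = [lambda x: 1 if x[0] > 0 else -1,
      lambda x: 1 if x[1] > 0 else -1]
score = H((2.0, -3.0), hs, [0.5, 0.2])   # 0.5*(+1) + 0.2*(-1) = 0.3
```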

At the $(T+1)$-th iteration, our goal is to find:

$$
h^{(T+1)}(x), \alpha_{T+1} = \mathop{\arg\min}_{h^{(T+1)}(x),\, \alpha_{T+1}} F\left(H^{(T+1)}(x), S\right) \quad \text{s.t. } h^{(T+1)}(x_i) = y_i^l,\ i = 1, \dots, n_l \tag{3}
$$

To simplify the notation, define $H^{(T)}(x_j) \equiv H_j$, $h^{(T+1)}(x_j) \equiv h_j$ and $\alpha_{T+1} \equiv \alpha$.

Thus

$$
H^{(T+1)}(x_i) = H_i + \alpha h_i
$$

Expand $F(\mathbf{y}, S)$ by substituting $\mathbf{y}^u$ with $H(\mathbf{x}^u)$:

$$
\begin{aligned}
F_l(\mathbf{y}, S) &= \sum_{i=1}^{n_l}\sum_{j=1}^{n_u} S_{i,j}^{lu} \exp\left[-2 y_i^l (H_j + \alpha h_j)\right] \\
F_u(\mathbf{y}^u, S) &= \sum_{i,j=1}^{n_u} S_{i,j}^{uu} \exp(H_i + \alpha h_i - H_j - \alpha h_j) = \sum_{i,j=1}^{n_u} S_{i,j}^{uu} \exp(H_i - H_j) \exp\left[\alpha (h_i - h_j)\right]
\end{aligned}
$$

Now the problem becomes:

$$
\begin{aligned}
\min_{h(x),\, \alpha} F(\mathbf{y}, S) &= F_l(\mathbf{y}, S) + C F_u(\mathbf{y}^u, S) \\
&= \sum_{i=1}^{n_l}\sum_{j=1}^{n_u} S_{i,j}^{lu} \exp\left[-2 y_i^l (H_j + \alpha h_j)\right] + C \sum_{i,j=1}^{n_u} S_{i,j}^{uu} \exp(H_i - H_j) \exp\left[\alpha (h_i - h_j)\right] \\
&\quad \text{s.t. } h_i = y_i^l,\ i = 1, \dots, n_l
\end{aligned} \tag{4}
$$

Problem (4) involves products of $\alpha$ and $h_i$, making it nonlinear and hence difficult to optimize. We apply the Bound-Optimization approach below to solve it.

We first further expand $F_l(\mathbf{y}, S)$:

$$
\begin{aligned}
F_l(\mathbf{y}, S) &= \sum_{i=1}^{n_l}\sum_{j=1}^{n_u} S_{i,j}^{lu} \exp\left[-2 y_i^l (H_j + \alpha h_j)\right] \\
&= \sum_{i=1}^{n_l}\sum_{j=1}^{n_u} S_{i,j}^{lu} \left\{ I(y_i^l, 1) \exp\left[-2 (H_j + \alpha h_j)\right] + I(y_i^l, -1) \exp\left[2 (H_j + \alpha h_j)\right] \right\} \\
&= \sum_{j=1}^{n_u} \exp(-2\alpha h_j) \sum_{i=1}^{n_l} S_{i,j}^{lu} I(y_i^l, 1) \exp(-2 H_j) + \sum_{j=1}^{n_u} \exp(2\alpha h_j) \sum_{i=1}^{n_l} S_{i,j}^{lu} I(y_i^l, -1) \exp(2 H_j)
\end{aligned}
$$

Then we find an upper bound $\bar{F}_u(\mathbf{y}^u, S)$ of $F_u(\mathbf{y}^u, S)$:

$$
\begin{aligned}
\exp\left[\alpha (h_i - h_j)\right] &\le \frac{1}{2}\left[\exp(2\alpha h_i) + \exp(-2\alpha h_j)\right] \\
F_u(\mathbf{y}^u, S) \le \bar{F}_u(\mathbf{y}^u, S) &= \sum_{i,j=1}^{n_u} \frac{1}{2} S_{i,j}^{uu} \exp(H_i - H_j) \exp(2\alpha h_i) + \sum_{i,j=1}^{n_u} \frac{1}{2} S_{i,j}^{uu} \exp(H_i - H_j) \exp(-2\alpha h_j)
\end{aligned}
$$
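The first inequality is just AM-GM: $e^{a-b} = \sqrt{e^{2a}\,e^{-2b}} \le \frac{1}{2}(e^{2a} + e^{-2b})$ with $a = \alpha h_i$, $b = \alpha h_j$. A quick brute-force check over $h_i, h_j \in \{-1, +1\}$ and a few values of $\alpha$:

```python
import itertools, math

# Verify exp[a*(hi - hj)] <= (exp(2*a*hi) + exp(-2*a*hj)) / 2
# for all sign combinations; the bound is tight when hi != hj.
for hi, hj in itertools.product([-1, 1], repeat=2):
    for alpha in [0.1, 0.5, 1.0, 2.0]:
        lhs = math.exp(alpha * (hi - hj))
        rhs = 0.5 * (math.exp(2 * alpha * hi) + math.exp(-2 * alpha * hj))
        assert lhs <= rhs + 1e-12
```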

Flip $i$ and $j$ in the first term of $\bar{F}_u(\mathbf{y}^u, S)$, using the symmetry of $S$:

$$
\begin{aligned}
\bar{F}_u(\mathbf{y}^u, S) &= \sum_{i,j=1}^{n_u} \frac{1}{2} S_{i,j}^{uu} \exp(H_j - H_i) \exp(2\alpha h_j) + \sum_{i,j=1}^{n_u} \frac{1}{2} S_{i,j}^{uu} \exp(H_i - H_j) \exp(-2\alpha h_j) \\
&= \sum_{j=1}^{n_u} \exp(-2\alpha h_j) \sum_{i=1}^{n_u} \frac{1}{2} S_{i,j}^{uu} \exp(H_i - H_j) + \sum_{j=1}^{n_u} \exp(2\alpha h_j) \sum_{i=1}^{n_u} \frac{1}{2} S_{i,j}^{uu} \exp(H_j - H_i)
\end{aligned}
$$

$$
\begin{aligned}
F(\mathbf{y}, S) = F_l(\mathbf{y}, S) + C F_u(\mathbf{y}^u, S) &\le F_1(\mathbf{y}, S) = F_l(\mathbf{y}, S) + C \bar{F}_u(\mathbf{y}^u, S) \\
&= \sum_{j=1}^{n_u} \exp(-2\alpha h_j) \left[ \sum_{i=1}^{n_l} S_{i,j}^{lu} I(y_i^l, 1) \exp(-2 H_j) + \sum_{i=1}^{n_u} \frac{C}{2} S_{i,j}^{uu} \exp(H_i - H_j) \right] \\
&\quad + \sum_{j=1}^{n_u} \exp(2\alpha h_j) \left[ \sum_{i=1}^{n_l} S_{i,j}^{lu} I(y_i^l, -1) \exp(2 H_j) + \sum_{i=1}^{n_u} \frac{C}{2} S_{i,j}^{uu} \exp(H_j - H_i) \right]
\end{aligned}
$$

Define:

$$
\begin{aligned}
p_j &= \sum_{i=1}^{n_l} S_{i,j}^{lu} I(y_i^l, 1) \exp(-2 H_j) + \sum_{i=1}^{n_u} \frac{C}{2} S_{i,j}^{uu} \exp(H_i - H_j) \\
q_j &= \sum_{i=1}^{n_l} S_{i,j}^{lu} I(y_i^l, -1) \exp(2 H_j) + \sum_{i=1}^{n_u} \frac{C}{2} S_{i,j}^{uu} \exp(H_j - H_i)
\end{aligned}
$$

Note that when calculating $p_j$ and $q_j$, $j$ is fixed, and both are functions of the current ensemble predictions $H$.

$p_j$ and $q_j$ can be interpreted as the confidence in classifying the unlabeled example $x_j$ into the positive class and the negative class, respectively.
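A vectorized computation of $p_j$ and $q_j$ might look like this (a sketch, assuming numpy; `H` holds the current ensemble scores on the unlabeled points, and all names are illustrative):

```python
import numpy as np

def confidences(H, yl, S_lu, S_uu, C=1.0):
    """Compute p_j and q_j from the definitions above (vectorized)."""
    pos = (yl == 1).astype(float)            # indicator I(y_i^l, 1)
    neg = (yl == -1).astype(float)           # indicator I(y_i^l, -1)
    E = np.exp(H[:, None] - H[None, :])      # E[i, j] = exp(H_i - H_j)
    p = (pos @ S_lu) * np.exp(-2.0 * H) + (C / 2.0) * (S_uu * E).sum(axis=0)
    q = (neg @ S_lu) * np.exp(2.0 * H) + (C / 2.0) * (S_uu * E.T).sum(axis=0)
    return p, q

rng = np.random.default_rng(1)
yl = np.array([1, -1, 1])
S_lu = rng.uniform(size=(3, 4))
S_uu = rng.uniform(size=(4, 4)); S_uu = (S_uu + S_uu.T) / 2
H = rng.normal(size=4)                       # current ensemble scores on x^u
p, q = confidences(H, yl, S_lu, S_uu)
```

Every term in both sums is non-negative, so $p_j \ge 0$ and $q_j \ge 0$, which is what lets them act as class confidences.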

Claim 1: Problem (4) is equivalent to

$$
\min_{h(x),\, \alpha} F_1(\mathbf{y}, S) = \sum_{j=1}^{n_u} \exp(-2\alpha h_j) \, p_j + \sum_{j=1}^{n_u} \exp(2\alpha h_j) \, q_j \tag{5}
$$

The expression in (5) is still difficult to optimize since the weight $\alpha$ and the classifier $h$ are coupled. We simplify the problem further using an upper bound of $F_1$.

$$
\begin{aligned}
1 + x &\le \exp(x) \\
\Rightarrow \exp(\gamma x) &\le \exp(\gamma) + \exp(-\gamma) - 1 + \gamma x, \quad x \in \{-1, 1\} \\
\Rightarrow \exp(-2\alpha h_j) &\le \exp(2\alpha) + \exp(-2\alpha) - 1 - 2\alpha h_j \\
\exp(2\alpha h_j) &\le \exp(2\alpha) + \exp(-2\alpha) - 1 + 2\alpha h_j \\
\Rightarrow F_1(\mathbf{y}, S) \le F_2(\mathbf{y}, S) &= \sum_{j=1}^{n_u} \left(e^{2\alpha} + e^{-2\alpha} - 1\right)(p_j + q_j) - \sum_{j=1}^{n_u} 2\alpha h_j (p_j - q_j)
\end{aligned}
$$

Claim 2: Problem (5) is equivalent to

$$
\min_{h(x),\, \alpha} F_2(\mathbf{y}, S) = \sum_{j=1}^{n_u} \left(e^{2\alpha} + e^{-2\alpha} - 1\right)(p_j + q_j) - \sum_{j=1}^{n_u} 2\alpha h_j (p_j - q_j) \tag{6}
$$
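Problem (6) decouples nicely: the first term does not involve $h$ at all, so for any $\alpha > 0$ the linear term is minimized by $h_j = \mathrm{sign}(p_j - q_j)$, after which $F_2$ is a one-dimensional function of $\alpha$. A numerical sketch under these assumptions (the grid search over $\alpha$ is my own illustrative choice, not anything derived above):

```python
import numpy as np

def solve_step(p, q, alphas=np.linspace(0.0, 3.0, 301)):
    """Minimize F_2 in (6): pick h_j = sign(p_j - q_j), then grid-search alpha."""
    h = np.where(p >= q, 1.0, -1.0)          # minimizes -2*alpha*h_j*(p_j - q_j)
    def F2(a):
        return ((np.exp(2 * a) + np.exp(-2 * a) - 1) * (p + q).sum()
                - 2 * a * (h * (p - q)).sum())
    alpha = min(alphas, key=F2)              # 1-D minimization over the grid
    return h, alpha

p = np.array([3.0, 1.0, 0.5])
q = np.array([1.0, 2.0, 0.5])
h, alpha = solve_step(p, q)
```

The chosen $h$ labels each unlabeled point with whichever class confidence is larger, which matches the interpretation of $p_j$ and $q_j$ above.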
