Kernel SVM Primal -> Dual Derivation

Metronome · Jul 26, 2024

I am trying to derive the dual form of a support vector machine from the primal form (see the top of the section "Kernel SVM"). I proceeded assuming the

ξ

variables should be included in what the primal minimizes over, as I believe

C

is a hyperparameter but the

ξ

variables are parameters to be tweaked by the classifier (further evidenced by formulations from other sources which also minimize over

ξ

). I decided to first try the special case with

2

features and

2

training data points to see everything unpacked, and then maybe do it again using more general linear algebraic notation. So the primal I started with is...

\min\limits_{w_1, w_2, b, ξ_1, ξ_2}\ w_1 ^2 + w_2 ^2 + Cξ_1 + Cξ_2 \\ s.t.\ \ \ y_1 (w_1 {x_{feature 1}}_{_{point 1}} + w_2 {x_{feature 2}}_{_{point 1}} + b) \geq 1 - ξ_1 \\ y_2 (w_1 {x_{feature 1}}_{_{point 2}} + w_2 {x_{feature 2}}_{_{point 2}} + b) \geq 1 - ξ_2 \\ ξ_1 \geq 0 \\ ξ_2 \geq 0

...where everything is a scalar (no vectors). The Lagrangian should be...

\mathcal L = w_1 ^2 + w_2 ^2 + Cξ_1 + Cξ_2 - \alpha_1 (y_1 (w_1 {x_{feature 1}}_{_{point 1}} + w_2 {x_{feature 2}}_{_{point 1}} + b) - 1 + ξ_1)\ -\ \alpha_2 (y_2 (w_1 {x_{feature 1}}_{_{point 2}} + w_2 {x_{feature 2}}_{_{point 2}} + b) - 1 + ξ_2)

Copying this step, the original problem gets replaced by

\max\limits_{\alpha_1 \geq 0, \alpha_2 \geq 0}\ [\min\limits_{w_1, w_2, b, ξ_1, ξ_2} \mathcal L]

. To solve the inner minimization, I set the gradient of the Lagrangian to the zero vector and get the system...

w_1 = \frac{\alpha_1 y_1 {x_{feature 1}}_{_{point 1}} + \alpha_2 y_2 {x_{feature 1}}_{_{point 2}}}{2} \\ w_2 = \frac{\alpha_1 y_1 {x_{feature 2}}_{_{point 1}} + \alpha_2 y_2 {x_{feature 2}}_{_{point 2}}}{2} \\ \alpha_1 y_1 = -\alpha_2 y_2 \\ \alpha_1 = C \\ \alpha_2 = C

...but there is not enough information in this system to substitute

b

,

ξ_1

, or

ξ_2

out of the Lagrangian. How can I proceed? Was it incorrect to put

ξ_1

and

ξ_2

in the minimization in the primal? I don't see how treating them as constants would fix this.

Secondly, did I write the Lagrangian correctly? The main thing I'm unsure about is whether I need to subtract another

ξ_1

and

ξ_2

from the Lagrangian to account for the nonnegativity constraints of the primal, but I went with my hunch that these constraints are simply not included in the Lagrangian representation.

Kernel SVM Primal -> Dual Derivation

Metronome

Junior Member