Kernel SVM Primal -> Dual Derivation

Metronome

Junior Member
Joined
Jun 12, 2018
Messages
153
I am trying to derive the dual form of a support vector machine from the primal form (see the top of the section "Kernel SVM"). I proceeded assuming the ξξ variables should be included in what the primal minimizes over, as I believe CC is a hyperparameter but the ξξ variables are parameters to be tweaked by the classifier (further evidenced by formulations from other sources which also minimize over ξξ). I decided to first try the special case with 22 features and 22 training data points to see everything unpacked, and then maybe do it again using more general linear algebraic notation. So the primal I started with is...minw1,w2,b,ξ1,ξ2 w12+w22+Cξ1+Cξ2s.t.   y1(w1xfeature1point1+w2xfeature2point1+b)1ξ1y2(w1xfeature1point2+w2xfeature2point2+b)1ξ2ξ10ξ20\min\limits_{w_1, w_2, b, ξ_1, ξ_2}\ w_1 ^2 + w_2 ^2 + Cξ_1 + Cξ_2 \\ s.t.\ \ \ y_1 (w_1 {x_{feature 1}}_{_{point 1}} + w_2 {x_{feature 2}}_{_{point 1}} + b) \geq 1 - ξ_1 \\ y_2 (w_1 {x_{feature 1}}_{_{point 2}} + w_2 {x_{feature 2}}_{_{point 2}} + b) \geq 1 - ξ_2 \\ ξ_1 \geq 0 \\ ξ_2 \geq 0...where everything is a scalar (no vectors). The Lagrangian should be...L=w12+w22+Cξ1+Cξ2α1(y1(w1xfeature1point1+w2xfeature2point1+b)1+ξ1)  α2(y2(w1xfeature1point2+w2xfeature2point2+b)1+ξ2)\mathcal L = w_1 ^2 + w_2 ^2 + Cξ_1 + Cξ_2 - \alpha_1 (y_1 (w_1 {x_{feature 1}}_{_{point 1}} + w_2 {x_{feature 2}}_{_{point 1}} + b) - 1 + ξ_1)\ -\ \alpha_2 (y_2 (w_1 {x_{feature 1}}_{_{point 2}} + w_2 {x_{feature 2}}_{_{point 2}} + b) - 1 + ξ_2)Copying this step, the original problem gets replaced by maxα10,α20 [minw1,w2,b,ξ1,ξ2L]\max\limits_{\alpha_1 \geq 0, \alpha_2 \geq 0}\ [\min\limits_{w_1, w_2, b, ξ_1, ξ_2} \mathcal L]. To solve the inner minimization, I set the gradient of the Lagrangian to the zero vector and get the system...w1=α1y1xfeature1point1+α2y2xfeature1point22w2=α1y1xfeature2point1+α2y2xfeature2point22α1y1=α2y2α1=Cα2=Cw_1 = \frac{\alpha_1 y_1 {x_{feature 1}}_{_{point 1}} + \alpha_2 y_2 {x_{feature 1}}_{_{point 2}}}{2} \\ w_2 = \frac{\alpha_1 y_1 {x_{feature 2}}_{_{point 1}} + \alpha_2 y_2 {x_{feature 2}}_{_{point 2}}}{2} \\ \alpha_1 y_1 = -\alpha_2 y_2 \\ \alpha_1 = C \\ \alpha_2 = C...but there is not enough information in this system to substitute bb, ξ1ξ_1, or ξ2ξ_2 out of the Lagrangian. How can I proceed? Was it incorrect to put ξ1ξ_1 and ξ2ξ_2 in the minimization in the primal? I don't see how treating them as constants would fix this.

Secondly, did I write the Lagrangian correctly? The main thing I'm unsure about is whether I need to subtract another ξ1ξ_1 and ξ2ξ_2 from the Lagrangian to account for the nonnegativity constraints of the primal, but I went with my hunch that these constraints are simply not included in the Lagrangian representation.
 
Top