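I am trying to derive the dual form of a support vector machine from the primal form (see the top of the section "Kernel SVM"). I assumed the $\xi$ variables should be included in what the primal minimizes over. I believe $C$ is a hyperparameter, but the $\xi$ variables are parameters to be tuned by the classifier (further supported by formulations from other sources, which also minimize over $\xi$).

I decided to first try the special case with 2 features and 2 training points to see everything unpacked, and then maybe redo it in more general linear-algebra notation. So the primal I started with is

$$
\begin{aligned}
\min_{w_1,\, w_2,\, b,\, \xi_1,\, \xi_2}\quad & w_1^2 + w_2^2 + C\xi_1 + C\xi_2\\
\text{s.t.}\quad & y_1\left(w_1 x_{\text{feature1}}^{\text{point1}} + w_2 x_{\text{feature2}}^{\text{point1}} + b\right) \ge 1 - \xi_1\\
& y_2\left(w_1 x_{\text{feature1}}^{\text{point2}} + w_2 x_{\text{feature2}}^{\text{point2}} + b\right) \ge 1 - \xi_2\\
& \xi_1 \ge 0\\
& \xi_2 \ge 0
\end{aligned}
$$

where everything is a scalar (no vectors). The Lagrangian should be

$$
\begin{aligned}
L = {} & w_1^2 + w_2^2 + C\xi_1 + C\xi_2\\
& - \alpha_1\left(y_1\left(w_1 x_{\text{feature1}}^{\text{point1}} + w_2 x_{\text{feature2}}^{\text{point1}} + b\right) - 1 + \xi_1\right)\\
& - \alpha_2\left(y_2\left(w_1 x_{\text{feature1}}^{\text{point2}} + w_2 x_{\text{feature2}}^{\text{point2}} + b\right) - 1 + \xi_2\right)
\end{aligned}
$$

Copying this step, the original problem gets replaced by

$$
\max_{\alpha_1 \ge 0,\ \alpha_2 \ge 0} \left[\, \min_{w_1,\, w_2,\, b,\, \xi_1,\, \xi_2} L \,\right].
$$

To solve the inner minimization, I set the gradient of the Lagrangian to the zero vector and get the system

$$
\begin{aligned}
w_1 &= \frac{\alpha_1 y_1 x_{\text{feature1}}^{\text{point1}} + \alpha_2 y_2 x_{\text{feature1}}^{\text{point2}}}{2}\\
w_2 &= \frac{\alpha_1 y_1 x_{\text{feature2}}^{\text{point1}} + \alpha_2 y_2 x_{\text{feature2}}^{\text{point2}}}{2}\\
\alpha_1 y_1 &= -\alpha_2 y_2\\
\alpha_1 &= C\\
\alpha_2 &= C
\end{aligned}
$$

but there is not enough information in this system to substitute $b$, $\xi_1$, or $\xi_2$ out of the Lagrangian. How can I proceed? Was it incorrect to put $\xi_1$ and $\xi_2$ in the minimization in the primal? I don't see how treating them as constants would fix this.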
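To double-check the stationarity system above, here is a minimal numeric sketch in plain Python that differentiates the Lagrangian as written by central finite differences. The data values `X`, `y`, `C` and the helper names `lagrangian`, `primal_gradient`, `stationary_w` are hypothetical choices for illustration only. It picks $\alpha_1 = \alpha_2 = C$ with $y_1 = -y_2$ (so $\alpha_1 y_1 = -\alpha_2 y_2$ also holds), computes $w_1, w_2$ from the first two equations, and then evaluates the gradient at several different values of $(b, \xi_1, \xi_2)$.

```python
# Hypothetical toy data: 2 points, 2 features (values made up for illustration).
X = [[1.0, 2.0],   # point 1: (feature1, feature2)
     [3.0, -1.0]]  # point 2: (feature1, feature2)
y = [1.0, -1.0]    # labels, chosen so that y_1 = -y_2
C = 0.5            # hypothetical regularization hyperparameter


def lagrangian(p, a):
    """L from the question; p = (w1, w2, b, xi1, xi2), a = (alpha1, alpha2)."""
    w1, w2, b, xi1, xi2 = p
    a1, a2 = a
    c1 = y[0] * (w1 * X[0][0] + w2 * X[0][1] + b) - 1 + xi1
    c2 = y[1] * (w1 * X[1][0] + w2 * X[1][1] + b) - 1 + xi2
    return w1**2 + w2**2 + C * xi1 + C * xi2 - a1 * c1 - a2 * c2


def primal_gradient(p, a, h=1e-6):
    """Central-difference gradient of L w.r.t. the primal variables (alphas held fixed)."""
    g = []
    for i in range(len(p)):
        up, dn = list(p), list(p)
        up[i] += h
        dn[i] -= h
        g.append((lagrangian(up, a) - lagrangian(dn, a)) / (2 * h))
    return g


def stationary_w(a):
    """w1, w2 from the first two equations of the stationarity system."""
    a1, a2 = a
    w1 = (a1 * y[0] * X[0][0] + a2 * y[1] * X[1][0]) / 2
    w2 = (a1 * y[0] * X[0][1] + a2 * y[1] * X[1][1]) / 2
    return w1, w2


alpha = (C, C)  # alpha_1 = alpha_2 = C; alpha_1*y_1 = -alpha_2*y_2 since y_1 = -y_2
w1, w2 = stationary_w(alpha)

# The gradient should vanish (up to rounding) no matter which b, xi1, xi2 we pick.
for b, xi1, xi2 in [(0.0, 0.0, 0.0), (3.0, 2.0, 7.0), (-5.0, 0.1, 0.4)]:
    g = primal_gradient([w1, w2, b, xi1, xi2], alpha)
    print((b, xi1, xi2), [round(v, 6) for v in g])
```

Because $L$ is linear in $b$, $\xi_1$ and $\xi_2$, their partial derivatives ($-(\alpha_1 y_1 + \alpha_2 y_2)$ and $C - \alpha_i$) contain no $b$ or $\xi$ at all, so in this sketch the gradient vanishes for every $(b, \xi_1, \xi_2)$ tried. That matches the gap in the system described in the question.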
Secondly, did I write the Lagrangian correctly? The main thing I'm unsure about is whether I need to subtract additional $\xi_1$ and $\xi_2$ terms from the Lagrangian to account for the nonnegativity constraints $\xi_1 \ge 0$ and $\xi_2 \ge 0$ of the primal. I went with my hunch that these constraints are simply not included in the Lagrangian.
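For comparison, the variant in question can be written out explicitly. Assuming the extra terms would each carry their own multiplier (the names $\mu_1, \mu_2$ are hypothetical here, not from the original derivation), it would read

$$
L' = L - \mu_1 \xi_1 - \mu_2 \xi_2, \qquad \mu_1 \ge 0,\ \mu_2 \ge 0,
$$

so that the $\xi$ components of the gradient would become

$$
\frac{\partial L'}{\partial \xi_i} = C - \alpha_i - \mu_i, \qquad i = 1, 2,
$$

in place of the $C - \alpha_i$ that yields $\alpha_i = C$ in the system derived from $L$.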