**(1) Maximize the margin**

SVM is easy to understand graphically: we just need to find a separating hyperplane that maximizes the margin. See the graph below:

**(2) How to calculate/find the max Margin**

Assuming the hard-margin case for simplicity of the math, the separating hyperplane can be expressed as:

w * x + b = 0, where we can always rescale w and b so that:

w * x + b = +1 and w * x + b = -1 on the support vectors (the data points closest to the plane).
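The distance between the two planes w * x + b = +1 and w * x + b = -1 can be derived in one line (a worked step):

```latex
% x_- lies on the plane w . x + b = -1; step a distance d along the
% unit normal w / ||w|| and require that we land on w . x + b = +1:
\[
w \cdot \Bigl( x_- + d\,\frac{w}{\lVert w \rVert} \Bigr) + b
  = \underbrace{(w \cdot x_- + b)}_{=\,-1} + d\,\lVert w \rVert = +1
  \quad\Longrightarrow\quad d = \frac{2}{\lVert w \rVert}.
\]
```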

The distance between these two planes is 2 / ||w||, so maximizing the margin means we want to solve:

min (over w, b): 1/2 * ||w||^2 ( the 1/2 and the square are just for mathematical convenience )

s.t.: y(j) * ( w * x(j) + b ) >= 1 for every j.
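As a sanity check, the constraints and the margin width can be verified numerically. A minimal sketch with NumPy, using a made-up toy dataset and a hand-picked candidate (w, b) (all values here are illustrative, not from the text):

```python
import numpy as np

# Toy linearly separable data (hypothetical example): two points per class in 2D.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# A candidate separating hyperplane w * x + b = 0.
w = np.array([0.25, 0.25])
b = 0.0

# Check the hard-margin constraints y(j) * (w * x(j) + b) >= 1 for all j.
margins = y * (X @ w + b)
feasible = bool(np.all(margins >= 1.0 - 1e-9))

# Margin width between the planes w * x + b = +1 and w * x + b = -1.
width = 2.0 / np.linalg.norm(w)

print(feasible)  # True: every point satisfies the constraint
print(width)
```

Any feasible (w, b) gives some margin width; the SVM solution is the feasible pair whose 2 / ||w|| is largest.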

**(3) How to solve a minimization with constraints**

The standard technique for this kind of constrained optimization is the method of Lagrange multipliers. The Lagrange function is:

L( w, b, a ) = 1/2 * ||w||^2 - SUM(j) a(j) * ( y(j) * ( w * x(j) + b ) - 1 )

where a(j) >=0 for any j.

The so-called primal problem is: min(w,b) [ max(a) L( w, b, a ) ]

( This can be proved equivalent to the original problem. The intuition: the constraints are hard to handle directly, but the Lagrange function removes them. If any constraint is violated, the inner max over a blows up to +infinity, so the outer min over w, b is forced back into the feasible region, where the inner max just reproduces 1/2 * ||w||^2. )

Normally we prefer to work with the so-called dual problem: max(a) [ min(w,b) L( w, b, a ) ] ( notice we exchanged the order of max and min ),

which is a lower bound on the primal (weak duality); under certain conditions (satisfied here, since the problem is convex and strictly feasible), the duality gap is zero,

and the dual is easier to solve.

**(4) Now let’s look at the dual problem**:

To evaluate max(a) [ min(w,b) L( w, b, a ) ],

let's forget the max(a) first: assume a is fixed, so the inner problem becomes a minimization over w and b. To minimize:

L( w, b, a ) = 1/2 * ||w||^2 - SUM(j) a(j) * ( y(j) * ( w * x(j) + b ) - 1 )

we set the partial derivatives to zero:

dL/dw = w - SUM(j) a(j) y(j) x(j) = 0, i.e. w = SUM(j) a(j) y(j) x(j)

dL/db = - SUM(j) a(j) y(j) = 0, i.e. SUM(j) a(j) y(j) = 0

Substituting these conditions back into L and maximizing over a, we get:

max(a) [ 1/2 * ||w||^2 - SUM(j) a(j) * ( y(j) * ( w * x(j) + b ) - 1 ) ]

= max(a) [ SUM(i) a(i) - 1/2 * SUM(i,j) a(i) a(j) y(i) y(j) ( x(i) * x(j) ) ]

s.t.: SUM(i) a(i) y(i) = 0, and a(i) >= 0 for every i,

where w = SUM(j) a(j) y(j) x(j), and b = y(k) - w * x(k) for any support vector k (any k with a(k) > 0).
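The substitution behind the last step, written out term by term (same symbols as above):

```latex
\begin{aligned}
&\text{With } w = \textstyle\sum_j a_j y_j x_j \text{ and } \sum_j a_j y_j = 0:\\
&\tfrac12\lVert w\rVert^2 = \tfrac12\sum_{i,j} a_i a_j y_i y_j\,(x_i \cdot x_j),\\
&\sum_j a_j y_j\,(w \cdot x_j) = \sum_{i,j} a_i a_j y_i y_j\,(x_i \cdot x_j),
\qquad b\sum_j a_j y_j = 0,\\
&\Longrightarrow\; L = \sum_i a_i \;-\; \tfrac12\sum_{i,j} a_i a_j y_i y_j\,(x_i \cdot x_j).
\end{aligned}
```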

The problem above is now a standard quadratic program, so it can be solved with any off-the-shelf QP solver.
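Since the dual is a small QP, a generic solver is enough to see the whole pipeline work. A minimal sketch using scipy.optimize.minimize (SLSQP); the two-point dataset is a made-up illustration, chosen so the exact answer works out by hand to a = (1/4, 1/4), w = (1/2, 1/2), b = 0:

```python
import numpy as np
from scipy.optimize import minimize

# Toy data (hypothetical example): one point per class.
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])
n = len(y)

# Q(i,j) = y(i) y(j) (x(i) . x(j)), the matrix in the dual objective.
Q = (y[:, None] * y[None, :]) * (X @ X.T)

# Dual objective, negated because scipy minimizes:
# maximize SUM a(i) - 1/2 a^T Q a   s.t.  SUM a(i) y(i) = 0,  a >= 0.
def neg_dual(a):
    return -(a.sum() - 0.5 * a @ Q @ a)

cons = ({'type': 'eq', 'fun': lambda a: a @ y},)
bounds = [(0.0, None)] * n
res = minimize(neg_dual, x0=np.full(n, 0.1), bounds=bounds,
               constraints=cons, method='SLSQP')

a = res.x
# Recover w = SUM a(j) y(j) x(j), and b from a support vector k with a(k) > 0.
w = (a * y) @ X
k = int(np.argmax(a))
b = y[k] - w @ X[k]

print(w)  # expected near [0.5, 0.5] for this toy set
print(b)  # expected near 0
```

With w and b recovered, new points are classified by the sign of w * x + b, which completes the hard-margin training loop end to end.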

**References:**

https://en.wikipedia.org/wiki/Support_vector_machine

http://www.cs.cmu.edu/~guestrin/Class/10701-S07/Slides/kernels.pdf

http://cs229.stanford.edu/notes/cs229-notes3.pdf