Sep 3

# How SVM (Support Vector Machine) algorithm works

Hello, I will explain how the SVM algorithm works. This video covers the support vector machine for linearly separable binary sets.

Suppose we have two features, x1 and x2, and we want to classify all of these elements. You can see that we have two classes: the circles and the rectangles.
The goal of the SVM is to design a hyperplane (here we draw this green line as the hyperplane) that classifies all the training vectors into two classes. Here we show two different hyperplanes, both of which classify all the instances in this feature set correctly. But the best choice will be the hyperplane that leaves the maximum margin from both classes. The margin is the distance between the hyperplane and the elements closest to it. In the case of the red hyperplane we have this distance, which is the margin, represented by z1. In the case of the green hyperplane we have the margin that we call z2. We can clearly see that z2 is greater than z1. So the margin is larger for the green hyperplane, and therefore the best choice is the green hyperplane.
Suppose we have this hyperplane. The hyperplane is defined by one equation, which we can state as

g(x) = w^T x + w0

We have a vector of weights w plus a bias term w0, and we scale the hyperplane so that this equation delivers values greater than or equal to 1 for all input vectors that belong to class 1 (in this case the rectangles), and values smaller than or equal to -1 for all vectors that belong to class 2 (the circles). So for the elements closest to the hyperplane, the modulus of g(x) is exactly 1.

From geometry we know that the distance between a point x and the hyperplane is computed by

|g(x)| / ||w||

So the total margin, which is composed of the distance to the closest element on each side, is

2 / ||w||
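As a quick sanity check (my own sketch in Python, not part of the video), we can compute the point-to-hyperplane distance and the total margin directly; the weights used below are the ones derived in the worked example later in the video:

```python
import numpy as np

def g(x, w, w0):
    """Decision function of the hyperplane: g(x) = w . x + w0."""
    return np.dot(w, x) + w0

def distance_to_hyperplane(x, w, w0):
    """Geometric distance from point x to the hyperplane g(x) = 0."""
    return abs(g(x, w, w0)) / np.linalg.norm(w)

# hyperplane found in the worked example: w = (2/5, 4/5), w0 = -11/5
w = np.array([2 / 5, 4 / 5])
w0 = -11 / 5

# distance of the closest point of each class (both equal 1/||w||, about 1.118)
print(distance_to_hyperplane(np.array([1, 1]), w, w0))
print(distance_to_hyperplane(np.array([2, 3]), w, w0))

# total margin 2/||w||, about 2.236
print(2 / np.linalg.norm(w))
```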
The aim is to minimize ||w||, because the total margin is 2/||w||: when we minimize the norm of this weight vector we will have the biggest margin splitting the two classes, which maximizes the separability.

Minimizing ||w|| is a nonlinear optimization task, which can be solved using the Karush-Kuhn-Tucker (KKT) conditions with Lagrange multipliers. The main equations state that the weight vector is the solution of the sum

w = sum_i lambda_i y_i x_i

and we also have this other rule:

sum_i lambda_i y_i = 0

where the lambda_i are the Lagrange multipliers and the y_i are the class labels (+1 or -1). So when we solve these equations, minimizing the weight vector, we maximize the margin between the two classes, which maximizes their separability.
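In practice we rarely solve the KKT system by hand; a quadratic-programming solver does it for us. A minimal sketch with scikit-learn's SVC (my own illustration, not shown in the video): a linear kernel with a very large C approximates the hard-margin problem above, and on the example points used later in the video (assuming, as the figure suggests, that (1,1) and (2,0) are circles and (2,3) is a rectangle) it recovers the same w and w0:

```python
import numpy as np
from sklearn.svm import SVC

# the three example points: circles labeled -1, rectangle labeled +1
X = np.array([[1, 1], [2, 0], [2, 3]])
y = np.array([-1, -1, 1])

# a very large C approximates the hard-margin SVM
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

print(clf.coef_)             # approximately [[0.4, 0.8]], i.e. w = (2/5, 4/5)
print(clf.intercept_)        # approximately [-2.2],       i.e. w0 = -11/5
print(clf.support_vectors_)  # the support vectors (1,1) and (2,3)
```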
Here we show a simple example. Suppose we have these two features, x1 and x2, and we have these three points. We want to design, or find, the best hyperplane that divides the two classes.

From the graph we can clearly see that the weight vector w, which is normal to the division line, points along the line that connects the two closest points, (1,1) and (2,3); the division line itself is perpendicular to that segment. So we can define the weight vector as this point minus the other point, (2,3) - (1,1) = (1,2), scaled by a constant a: w = (a, 2a). Now we can create the hyperplane equations considering this weight vector; we must discover the value of a.

Since we have this weight vector, we can substitute the two points into g(x) = w^T x + w0. When we evaluate g at the input vector (1,1), we know it must deliver the value -1, because this point belongs to the circle class:

a*1 + 2a*1 + w0 = -1, that is, 3a + w0 = -1

When we use the second point (2,3), we know the function must deliver the value 1:

a*2 + 2a*3 + w0 = 1, that is, 8a + w0 = 1

Given these two equations, we can isolate w0 in the second one: w0 = 1 - 8a. Putting this into the first equation gives 3a + 1 - 8a = -1, so 5a = 2 and we reach the value of a, which is 2/5. Substituting a back, we also discover w0 = 1 - 8*(2/5) = -11/5. And since the weight vector is (a, 2a), we can substitute the value of a and we will deliver the weight vector w = (2/5, 4/5).
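The same two constraint equations can also be solved mechanically. A small sketch (my own, not from the video) that solves the linear system for a and w0 with NumPy:

```python
import numpy as np

# constraints from the two closest points, with w = (a, 2a):
#   g(1, 1) = 3a + w0 = -1
#   g(2, 3) = 8a + w0 = +1
A = np.array([[3.0, 1.0],
              [8.0, 1.0]])
b = np.array([-1.0, 1.0])

a, w0 = np.linalg.solve(A, b)
w = np.array([a, 2 * a])

print(a)   # 0.4  (= 2/5)
print(w0)  # -2.2 (= -11/5)
print(w)   # [0.4 0.8]
```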
In this case, the points (1,1) and (2,3) are called the support vectors, because they are the elements that compose the weight vector w = (2/5, 4/5). Substituting the values of w (2/5 and 4/5) and also the value w0 = -11/5, we deliver the final equation that defines this green hyperplane:

g(x) = (2/5) x1 + (4/5) x2 - 11/5

Multiplying by 5/2, which rescales g but does not change the decision boundary g(x) = 0, we can also write it as

x1 + 2 x2 - 5.5

And this hyperplane classifies the elements using support vector machines. These are some references that we have used. So this is how the SVM algorithm works.
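To close the loop, a small sketch (my own illustration) that classifies the example points with the final hyperplane, assuming, as in the worked example, that the circle class is the negative one. The sign of g(x) gives the class; rescaling g by 5/2 changes no signs, so (2/5)x1 + (4/5)x2 - 11/5 and x1 + 2x2 - 5.5 classify identically:

```python
def classify(x1, x2):
    """Class from the sign of g(x) = x1 + 2*x2 - 5.5 (the scaled hyperplane)."""
    g = x1 + 2 * x2 - 5.5
    return "rectangle" if g > 0 else "circle"

print(classify(1, 1))  # circle    (g = -2.5)
print(classify(2, 3))  # rectangle (g = +2.5)
print(classify(2, 0))  # circle    (g = -3.5)
```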

• Wassauf Khalid says:

ultra slow speed

• Tob Ias says:

Thanks, I wonder what you do when the two classes overlap.

• Saeed Nusri says:

Great video! Thanks for the explanation.

• Douglas Monteiro says:

are you Brazilian? you accent sounds a lot like the Brazilian one…

• Jugs Ma马家杰 says:

wow, the explanation in this video made me really clear about SVM.

• Satyo Wicaksana says:

what do we do when there is more than 1 support vector for each class?

• 5iLikePie5 says:

I think there is a mistake (not sure though): when you are getting the weight vector (a,2a), isnt it supposed to be a perpendicular vector to the one you get? The one u get(a,2a) is in direction of the line that goes through the 2 points and the line that separates (I guess the line that the weight vector is supposed to be on) is perpendicular to this one.
i.e. should the weight vector be -[a, 0.5*a] ? (derived from simple math like: y = mx , normal to this would be y = -mx)

• Kuka Tech says:

Thales, I liked the tutorial, but I think you would explain that better in portuguese. Thank you anyway!

• Michał Dobrzański says:

deez distance ftw!

• Siva prasanth says:

Sir you are the best.. Even a student with no knowledge of machine learning can understand this complex machine learning algorithm if he sees this video.

• 김대한 says:

Awesome!

Excellent Explanation! Thanks a lot!

Thank you very much sir

• lemus Aly says:

how embarrassing

• foo bar 167 says:

w = ( (2,3) + (1,1) ) / 2 = (1.5, 2) — is the middle point between two points (2,3) and (1,1)
You could check this fact if you're take a ruler and measure coordinates of w manually.
Also I don't understand where constant "a" is came from (4:38 – "…so we have a constant 'a'…" – this is not an explanation).
And what is the reason to multiply (2,3) on (a,2a) => 2a+6a+w=1? I don't understand the logic behind this multiplication.

• Douglas Silva says:

You are brazilian, aren't you?

• Ulil Latifah says:

wow

• Felipe Coutinho says:

You are good at this. Keep making videos, your videos are the best.

• nofreewill says:

As he speaks he almost gets drown.
And as I listen to him and hearing him drowning, I feel that neither can I breathe .

• nofreewill says:

3:53 But I don't want to separate the chocolates, I want 'em all.

• sundar raman s says:

Excellent Demo on SVM basics

• ForKSapien says:

Best tutorial I've seen. Helps me a lot. Thank you very much!!

• Boris Dessimond says:

Very clear and quick. Thanks!

• Tsunami! :o says:

Could you name your variables please? The vector w at 3:05 is being multiplied (dot prosuct) with the x vector. Why is w a vector and not a scalar?
I'm a confused boï ya'know!

• 祝晓 says:

I can't understand it.

• Biranchi Narayan Nayak says:

This is the best video tutorial on SVM.

• RAMASUBRAMANIAN RAVI says:

Useful vedio thank u

• Quintin Lohuizen says:

You really gotta speak louder… My ears aren't that good and the volume of my laptop is already on its max….

• Zeyd Boukhers says:

It is a perfect explanation.

• Loop loop says:

Do I need to learn advance linear algebra 1st ?

• jansi rani says:

you elucidate very well i understood SVM concept very clear.. Thank you so much

• D Alexander says:

Did not understand much.Could have made bit more simpler. Where are the equations coming from?

• BEING SPIRITUAL says:

Speed 4x needed

What are the differences between SVM method and Naive Bayes method.

• Aziz Saouli says:

Anyone have svm code in matlab ?????

• zamine81 says:

thanks for the explanations, i would to know what is the highest number of classes or labels can be resolved by or used in SVM ? because in every introduction i saw only binary classification (two classes).

• Kamal Sehairi says:

Thank you Sir,
Simple and straightforward
If you can't explain it simply, you don't understand it well enough
I think you understand it very well and you presented it in very simple way

Is it the optimal hyperplane?

• Ashish Jha says:

Best video on SVM!

• says:

I got you are brazillian (like me) in the first minute… Thanks for sharing your knowledge!

• ndiayej100 says:

Hands down THE best SVM explanation I have ever seen!!! Thank you, Sir!

• Mr Schwszlsky says:

But how you sure us that realy work in much data??

• johnfy.k hikc says:

Hey, there's one of these about LOGISTIC REGRESSION?

• Deep Net says:

Thank you. Great Video.

• Amimul Ummah Baiqi says:

Dear Mr. Thales Sehn Korting
This video very usefull to me to do my thesis, but I want to ask you about the value of z1 and z2, why you said in this video Z2 is higher then Z1?
Thankyou sir

• ultra says:

@6:27 "support vectors" are defined as (2/5, 4/5). I was always under the impression that the 'support vectors' are the ones used to make the hyperplane (i.e. (1,1) and (2,3). Please advise.

• Canal doFuba says:

wonderful

• Mark Misin says:

Thank you for the video! Very well done! I am a bit confused about how you went from g(x)=2/5*x1+4/5*x2-11/5 to g(x)=x1+2*x2-5.5 in the end. You can only get that form if you multiply both sides of the equation by 5/2, but then it would be 5/2*g(x)=x1+2*x2-5.5

• Joan Perez Guallar says:

I was starting to drift off, dude speak a bit faster

• Akin O. says:

Excellent work sir!

• Talita Anthonio says:

5a = 2, how did you get to that? Shouldn't it be -2?

• Akin O. says:

Had to come back to revise my original comment.

This is the best tutorial on SVMs that I have ever come across (I have been through quite a few).

Really.

• Oliver Longhi says:

Can you make a video to explain how to use svm to solve non linearly separable problems as well?

• Hani YOUSFI says:

Thank you so much.. So clear and simple.

• Umair Hussain says:

Thanks alot Sir!

• birinhos says:

What is the "T" in g(x)= w^"T" …. ? Transpose ? Transformed ?

• Bala Kiswe says:

watch video with 1.5X speed….

• naouel ouafek says:

the best of the best thank you !

• mohit duklan says:

1.25 speed is nice . normal is sooo funnyyyy 😀

• Sebastian F. says:

1:38 why higher margin z1 than z2 means it is better hyperplane?

• Joe Siu says:

speak quieter please you're too loud

• Liu daniel says:

the total margin formula is unclear, more details should be provided.

• Elisa Warner says:

How did he go from g(x) = (2/5)(x_1)+(4/5)(x_2)-(11/5) from g(x)=x_1 + 2x_2 – 5.5? He multiplied by (5/2), but why? Doesn't that affect the result?

• Weisi Zhan says:

The example is so illustrating and finally makes it clear what 'support vector' means. Thanks for sharing 🙂

• Steven Pauly says:

why a, 2a–> why the 2a ??

• Hari Sankara says:

Simple and Best way to explain. Thx alot.

• Snehal Jaipurkar says:

Some basics cleared

Hello, I would like to use your examples in my paper. Is it okay to do so, and cite you? If so, do you have a paper or webpage I can reference? Thank you.

• Louis Boursier says:

Thank you, it was very clear despite it can easily be made complex. Well done!

Dear Sir, it was an excellent explanation. Thank you very very much…………….

• hu jiawei says:

sloppy explanation

• Omar Abd says:

thank you very much, nice explanation

• arjun hegde says:

In this example, we know the points (1,1) and (2,3).. .How will we know what are the closes points in real world problem?

YOU SAVED ME, THANK YOU!! thankss a lott

thanks

• Aridane Alamo says:

Thanks! very clear and very understandable english for non-english viewers too.

• umesh sai says:

Hi Thales. Your videos are awesome! Could you also cover the Kernel Trick and Gaussian Kernels in another video. The way you explain with the visualizations is good and I think this would benefit many. THANK YOU!

• Ali says:

But how does the algorithm works when you have more than 2 features? So for example your dataset consists of x1, x2, x3 and x4 and you want to predict Y, like the iris dataset for example.

• dirtysock says:

Really prime explanation, many thanks mane!

• peyman morassai says:

you are the best Thales, thank you so much…

• Zbynek Bazanowski says:

You are presenting two versions of g(x). Neither of them yields the real distance of the points from the plane. The real distance is 1.12 and -1.12 for (2, 3) and (1, 1), respectively. The g(x) giving the real distance is x_1/sqrt(5) + x_2*2/sqrt(5) – 11/2/sqrt(5).

• Aayush Poudel says:

Excellent tutorial, Mr Thales Sehn Körting
! Thanks very much.

• supergo1108 says:

Very clear! good video thanks!

• Zenville Erasmus says:

Nicely done, sir!

Incredibly painful to listen to, but the best explanation I've found. Audio quality could use a lot of improvement

• Henrique Bueno says:

Your pronunciation is very good. Any non-native english can understand.

• Tavvs Alves says:

Excellent video.

• Jacopo Solari says:

should mention that SVM can be used for regression as well..

• Ronit ganguly says:

Best lullaby on SVMs

• Murilo Guimaraes says:

Great explanation! You must be Brazilian by your accent right??

• Simarjeet Gill says:

Great representation… Much appreciated :3

• Elsio Antunes says:

What a Brazilian accent, eh Tales?

• Ioanna K says:

Fantastic video – thank you so much for explaining this so well.

• Ali Acar says:

Thank you for this excellent resource.
at 03:10, is it true that "minimizing this term will maximize the separability" ?
or it should be as "maximizing this term will maximize the separability"?

• Qu Shouxin says:

Thanks for the presentation. One problem is that in the given example, the weight vector takes the same direction of the line connecting the two points of (1,1) and (2,3). This is OK if only these two points are to be separated. In case if there are more points, the method shown in this example does not work in general. For example, if another point (2,0) (also shown in the example) belongs to the same set of (2,3), then this method fails to separate the two sets.

• Sichao Jia says:

how to make this kind of presentation slides? what software?

• Mangireesh Potnis says:

I love this explanation….. thank you so much <3