# How SVM (Support Vector Machine) algorithm works

Hello, I will explain how the SVM algorithm works.

This video will explain the support vector machine for linearly separable binary sets.

Suppose we have two features, x1 and x2, and we want to classify all these elements.

You can see that we have the class circle and the class rectangle.

So the goal of the SVM is to design a hyperplane (here we draw this green line as the hyperplane) that classifies all the training vectors into two classes.

Here we show two different hyperplanes, both of which classify all the instances in this feature set correctly.

But the best choice will be the hyperplane that leaves the maximum margin from both classes.

The margin is the distance between the hyperplane and the elements closest to it.

In the case of the red hyperplane we have this distance, so this is the margin, which we represent by z1.

And in the case of the green hyperplane we have the margin that we call z2.

We can clearly see that the value of z2 is greater than z1.

So the margin is higher in the case of the green hyperplane, and in this case the best choice will be the green hyperplane.

Suppose we have this hyperplane. This hyperplane is defined by one equation, which we can state as g(x) = w·x + ω0, where w is a vector of weights and ω0 is a bias term.

This equation will deliver values greater than or equal to 1 for all the input vectors that belong to class 1, in this case the circles.

And we also scale this hyperplane so that it will deliver values smaller than or equal to -1 for all the vectors that belong to class number 2, the rectangles.
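As a minimal sketch of this decision rule (using the weight values w = (2/5, 4/5) and ω0 = -11/5 that the worked example later in the video arrives at):

```python
# Sketch of the SVM decision function g(x) = w.x + w0.
# The weights below come from the worked example later in the video:
# w = (2/5, 4/5), w0 = -11/5.

def g(x, w=(0.4, 0.8), w0=-2.2):
    """Value of the scaled hyperplane equation at point x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

def classify(x):
    """Class 1 on the positive side of the hyperplane, class 2 on the negative side."""
    return 1 if g(x) >= 0 else 2

print(round(g((2, 3)), 6))   # 1.0  (closest point with label +1)
print(round(g((1, 1)), 6))   # -1.0 (closest point with label -1)
print(classify((3, 3)))      # 1
```

The support vectors sit exactly where g(x) is +1 or -1; every other training point lies further from the boundary.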

We can say that the distance from the hyperplane to the closest elements is at least 1; at those closest points the modulus of g(x) is exactly 1.

From geometry we know that the distance between a point and a hyperplane is computed as |g(x)| / ||w||.

So the total margin, which is composed of this distance on both sides of the hyperplane, is 2 / ||w||.

And the aim is that minimizing ||w|| will maximize the separability: when we minimize this weight vector, we will have the biggest margin splitting the two classes.

Minimizing this weight vector is a nonlinear optimization task, which can be solved through the Karush-Kuhn-Tucker (KKT) conditions, using Lagrange multipliers.

The main equations state that w will be the solution of the sum w = Σ_i λ_i y_i x_i, and we also have the other rule, Σ_i λ_i y_i = 0.

So when we solve these equations, trying to minimize this w vector, we will maximize the margin between the two classes, which maximizes the separability of the two classes.
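As a small numeric check (a sketch: the multiplier value λ = 2/5 for both support vectors is back-computed by me from the worked example that follows, not stated in the video):

```python
import math
from fractions import Fraction

# Checking the margin 2/||w|| and the two KKT sums for the worked example.
x_pos, y_pos = (2, 3), +1   # support vector with label +1
x_neg, y_neg = (1, 1), -1   # support vector with label -1
lam = Fraction(2, 5)        # back-computed multiplier, same for both points

# Stationarity: w = sum_i lambda_i * y_i * x_i
w = tuple(lam * y_pos * p + lam * y_neg * n for p, n in zip(x_pos, x_neg))
print(w)                           # w = (2/5, 4/5)

# The other rule: sum_i lambda_i * y_i = 0
print(lam * y_pos + lam * y_neg)   # 0

# Total margin 2 / ||w||
margin = 2 / math.sqrt(w[0] ** 2 + w[1] ** 2)
print(margin)                      # ≈ 2.236, i.e. sqrt(5)
```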

Here we show a simple example. Suppose we have these two features, x1 and x2, and we have these three values.

We want to design, or to find, the best hyperplane that divides these two classes.

From the graph we can see clearly that the best division line is the perpendicular bisector of the segment that connects the two closest values, (1, 1) and (2, 3).

So we can define the weight vector as this point minus the other point; we have a constant a and 2 times this constant a, that is, w = (a, 2a).

Now we can solve for this weight vector and create the hyperplane equation considering this weight vector.
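A one-line check of that direction (a sketch; the point coordinates (1, 1) and (2, 3) are the ones read off the graph in the video):

```python
# The weight vector points along the difference of the two closest points:
# (2, 3) - (1, 1) = (1, 2), hence w = (a, 2a) for some constant a > 0.
p_neg = (1, 1)   # closest point of one class
p_pos = (2, 3)   # closest point of the other class

direction = tuple(b - a for a, b in zip(p_neg, p_pos))
print(direction)  # (1, 2)
```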

We must discover the value of this a. Since we have the weight vector w = (a, 2a), we can substitute the values of these two points into the hyperplane equation.

When we evaluate the equation g at the input vector (1, 1), we know that it must deliver the value -1, because this point belongs to the class circle: a + 2a + ω0 = -1.

When we use the second point, (2, 3), we apply the function and we know that it must deliver the value 1, so we substitute it into the equation as well: 2a + 6a + ω0 = 1.

Given these two equations, we can isolate ω0 in the second equation, and we will have ω0 = 1 - 8a.

Using this value, we put ω0 into the first equation: 3a + 1 - 8a = -1, so 5a = 2, and we reach the value of a, which is 2/5.

Now that we have discovered the value of a, we substitute it back and also discover the value of ω0: ω0 = 1 - 8 · (2/5) = -11/5.

And since we know that the weight vector is (a, 2a), we can substitute the value of a and obtain the weight vector w = (2/5, 4/5).

In this case, the points (1, 1) and (2, 3) are called the support vectors, because they alone determine the weight values 2/5 and 4/5.

Substituting the values of w (2/5 and 4/5) and the value of ω0, we obtain g(x) = (2/5)x1 + (4/5)x2 - 11/5. Multiplying by the positive constant 5/2, which rescales g but does not change where it is zero, delivers the final equation that defines this green hyperplane: x1 + 2x2 - 5.5 = 0.
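The whole derivation can be replayed with exact arithmetic (a sketch using Python's fractions module; the variable names are mine):

```python
from fractions import Fraction

# Constraints from the two support vectors, with w = (a, 2a):
#   g(1, 1) = 3a + w0 = -1
#   g(2, 3) = 8a + w0 = +1
# Subtracting the first from the second eliminates w0: 5a = 2.
a = Fraction(1 - (-1), 8 - 3)
w0 = 1 - 8 * a
w = (a, 2 * a)
print(a, w0, w)           # 2/5 -11/5 (Fraction(2, 5), Fraction(4, 5))

# The decision function hits -1 and +1 exactly on the support vectors:
def g(x1, x2):
    return w[0] * x1 + w[1] * x2 + w0

print(g(1, 1), g(2, 3))   # -1 1

# Scaling g by 5/2 changes its values but not its zero set, giving the
# boundary in the video's final form: x1 + 2*x2 - 5.5 = 0.
scaled = tuple(Fraction(5, 2) * c for c in (w[0], w[1], w0))
print(scaled)             # (Fraction(1, 1), Fraction(2, 1), Fraction(-11, 2))
```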

And this hyperplane classifies the elements using support vector machines.

These are some references that we have used. So this is how the SVM algorithm works.

ultra slow speed

what horrible voice…I had a headache

Thanks, I wonder what you do when the two classes overlap.

Great video! Thanks for the explanation.

are you Brazilian? your accent sounds a lot like the Brazilian one…

wow, the explanation in this video made me really clear about SVM.

what do we do when there is more than 1 support vector for each class?

I think there is a mistake (not sure though): when you are getting the weight vector (a,2a), isnt it supposed to be a perpendicular vector to the one you get? The one u get(a,2a) is in direction of the line that goes through the 2 points and the line that separates (I guess the line that the weight vector is supposed to be on) is perpendicular to this one.

i.e. should the weight vector be -[a, 0.5*a] ? (derived from simple math like: y = mx , normal to this would be y = -mx)

Thales, I liked the tutorial, but I think you would explain that better in portuguese. Thank you anyway!

deez distance ftw!

Sir you are the best.. Even a student with no knowledge of machine learning can understand this complex machine learning algorithm if he sees this video.

Awesome!

Excellent Explanation! Thanks a lot!

Thank you very much sir

how embarrassing

w = ( (2,3) + (1,1) ) / 2 = (1.5, 2) — this is the middle point between the two points (2,3) and (1,1).

You could check this fact if you take a ruler and measure the coordinates of w manually.

Also I don't understand where the constant "a" comes from (4:38 – "…so we have a constant 'a'…" – this is not an explanation).

And what is the reason to multiply (2,3) by (a,2a) => 2a+6a+w=1? I don't understand the logic behind this multiplication.

You are brazilian, aren't you?

wow

This video is very helpful.

You are good at this. Keep making videos, your videos are the best.

As he speaks he almost drowns.

And as I listen to him and hear him drowning, I feel that I can't breathe either.

3:53 But I don't want to separate the chocolates, I want 'em all.

Excellent Demo on SVM basics

Best tutorial I've seen. Helps me a lot. Thank you very much!!

Very clear and quick. Thanks!

Could you name your variables please? The vector w at 3:05 is being multiplied (dot prosuct) with the x vector. Why is w a vector and not a scalar?

I'm a confused boï ya'know!

I can't understand this.

This is the best video tutorial on SVM.

Useful video thank u

You really gotta speak louder… My ears aren't that good and the volume of my laptop is already on its max….

It is a perfect explanation.

Do I need to learn advance linear algebra 1st ?

you elucidate very well i understood SVM concept very clear.. Thank you so much

Did not understand much. Could have been made a bit simpler. Where are the equations coming from?

Speed 4x needed

What are the differences between SVM method and Naive Bayes method.

Anyone have svm code in matlab ?????

thanks for the explanations, I would like to know what is the highest number of classes or labels that can be resolved by or used in SVM? because in every introduction I saw only binary classification (two classes).

Thank you Sir,

Simple and straightforward

If you can't explain it simply, you don't understand it well enough

I think you understand it very well and you presented it in very simple way

Is it the optimal hyperplane?

Best video on SVM!

I got you are brazillian (like me) in the first minute… Thanks for sharing your knowledge!

Hands down THE best SVM explanation I have ever seen!!! Thank you, Sir!

But how can you assure us that it really works on a lot of data??

Hey, there's one of these about LOGISTIC REGRESSION?

Thank you. Great Video.

Dear Mr. Thales Sehn Korting

This video is very useful to me for my thesis, but I want to ask you about the values of z1 and z2: why did you say in this video that z2 is higher than z1?

I will be waiting for your response,

Thank you sir

@6:27 "support vectors" are defined as (2/5, 4/5). I was always under the impression that the 'support vectors' are the ones used to make the hyperplane (i.e. (1,1) and (2,3). Please advise.

wonderful

Thank you for the video! Very well done! I am a bit confused about how you went from g(x)=2/5*x1+4/5*x2-11/5 to g(x)=x1+2*x2-5.5 in the end. You can only get that form if you multiply both sides of the equation by 5/2, but then it would be 5/2*g(x)=x1+2*x2-5.5

I was starting to drift off, dude speak a bit faster

Excellent work sir!

5a = 2, how did you get to that? Shouldn't it be -2?

Had to come back to revise my original comment.

This is the best tutorial on SVMs that I have ever come across (I have been through quite a few). Really.

Can you make a video to explain how to use svm to solve non linearly separable problems as well?

Thank you so much.. So clear and simple.

Thanks alot Sir!

love your accent btw :))

What is the "T" in g(x)= w^"T" …. ? Transpose ? Transformed ?

watch video with 1.5X speed….

the best of the best thank you !

1.25 speed is nice . normal is sooo funnyyyy 😀

1:38 why higher margin z1 than z2 means it is better hyperplane?

speak quieter please you're too loud

the total margin formula is unclear; more details should be provided.

How did he go from g(x) = (2/5)(x_1)+(4/5)(x_2)-(11/5) from g(x)=x_1 + 2x_2 – 5.5? He multiplied by (5/2), but why? Doesn't that affect the result?

The example is so illustrating and finally makes it clear what 'support vector' means. Thanks for sharing 🙂

why a, 2a–> why the 2a ??

Simple and Best way to explain. Thx alot.

Some basics cleared

Hello, I would like to use your examples in my paper. Is it okay to do so, and cite you? If so, do you have a paper or webpage I can reference? Thank you.

Thank you, it was very clear despite it can easily be made complex. Well done!

Dear Sir, it was an excellent explanation. Thank you very very much…………….

sloppy explanation

thank you very much, nice explanation

In this example, we know the points (1,1) and (2,3).. .How will we know what are the closes points in real world problem?

YOU SAVED ME, THANK YOU!! thankss a lott

thanks

Thanks! very clear and very understandable english for non-english viewers too.

Hi Thales. Your videos are awesome! Could you also cover the Kernel Trick and Gaussian Kernels in another video. The way you explain with the visualizations is good and I think this would benefit many. THANK YOU!

But how does the algorithm work when you have more than 2 features? So for example your dataset consists of x1, x2, x3 and x4 and you want to predict Y, like the iris dataset for example.

Really prime explanation, many thanks mane!

you are the best Thales, thank you so much…

You are presenting two versions of g(x). Neither of them yields the real distance of the points from the plane. The real distance is 1.12 and -1.12 for (2, 3) and (1, 1), respectively. The g(x) giving the real distance is x_1/sqrt(5) + x_2*2/sqrt(5) – 11/2/sqrt(5).

Excellent tutorial, Mr Thales Sehn Körting

! Thanks very much.

Very clear! good video thanks!

Nicely done, sir!

Incredibly painful to listen to, but the best explanation I've found. Audio quality could use a lot of improvement

Your pronunciation is very good. Any non-native english can understand.

Excellent video.

should mention that SVM can be used for regression as well..

Best lullaby on SVMs

Great explanation! You must be Brazilian by your accent right??

Great representation… Much appreciated :3

What a Brazilian accent, eh Thales?

Nice. Could you please update the link for the original presentation?

Fantastic video – thank you so much for explaining this so well.

Thank you for this excellent resource.

at 03:10, is it true that "minimizing this term will maximize the separability" ?

or it should be as "maximizing this term will maximize the separability"?

Thanks for the presentation. One problem is that in the given example, the weight vector takes the same direction of the line connecting the two points of (1,1) and (2,3). This is OK if only these two points are to be separated. In case if there are more points, the method shown in this example does not work in general. For example, if another point (2,0) (also shown in the example) belongs to the same set of (2,3), then this method fails to separate the two sets.

how to make this kind of presentation slides? what software?

I love this explanation….. thank you so much <3