# How kNN algorithm works

Hello

My name is Thales Sehn Körting and I will present very breafly how the kNN algorithm

works kNN means k nearest neighbors

It’s a very simple algorithm, and given N training vectors, suppose we have all these

‘a’ and ‘o’ letters as training vectors in this bidimensional feature space, the kNN

algorithm identifies the k nearest neighbors of ‘c’

‘c’ is another feature vector that we want to estimate its class

In this case it identifies the nearest neighbors regardless of labels

So, suppose this example we have k equal to 3, and we have the classes ‘a’ and ‘o’

And the aim of the algorithm is to find the class for ‘c’

If k is 3 we have to find the 3 nearest neighbors of ‘c’

So, we can see that in this case the 3 nearest neighbors of ‘c’ are these 3 elements here

We have 1 nearest neighbor of class ‘a’, we have 2 elements of the class ‘o’ which are

near to ‘c’ We have 2 votes for ‘o’ and 1 vote for ‘a’

In this case, the class of the element ‘c’ is going to be ‘o’

This is very simple how the algorithm k nearest neighbors works

Now, this is a special case of the kNN algorithm, is that when k is equal to 1

So, we must try to find the nearest neighbor of the element that will define the class

And to represent this feature space, each training vector will define a region in this

feature space here And a property that we have is that each region

is defined by this equation We have a distance between each element x

and x_i, that have to be smaller than the same distance for each other element

In this case it will define a Voronoi partition of the space, and can be defined, for example,

this element ‘c’ and these elements ‘b’, ‘e’ and ‘a’ will define these regions, very specific

regions This is a property of the kNN algorithm when

k is equal to 1 We define regions 1, 2, 3 and 4, based on

the nearest neighbor rule Each element that is inside this area will

be classified as ‘a’, as well as each element inside this area will be classified as ‘c’

And the same for the region 2 and region 3, for classes ‘e’ and ‘b’ as well

Now I have just some remarks about the kNN We have to chose and odd value of k if you

have a 2-class problem This happens because when we have a 2-class

and if we set k equal to 2, for example, we can have a tie

What will be the class? The majority class inside the nearest neighbors?

So, we have always to set odd values for a 2-class problem

And also the value of k must not be a multiple of the number of classes, it is also to avoid

ties And we have to remember that the main drawback

of this algorithm is the complexity in searching the nearest neighbors for each sample

The complexity is a problem because we have lots of elements, in the case of a big dataset

we will have lots of elements And we will have to search the distance between

each element to the element that we want to classify

So, for a large dataset, this can be a problem Anyhow, this kNN algorithm produces good results

So, this is the reference I have used to prepare this presentation

Thanks for your attention, and this is very breafly how the kNN algorithm works

Thank you so much!

Awesome explaination!! So far this is the best explaination on KNN!! Thank you for sharing !

Good lecture…Expecting more from you

Awesome explanation. Quick, simple, to the point. Exactly what I was needing to do my homework 🙂 Thanks!

Very useful! Also, does anyone else think he sounds like the guy from the END OF THE WORLD swf? 😀

Thank you for this video! It's very easy to understand.

Thank you for this video!

Thank you for making this video. It is informative and to the point, as well as being shorter than MIT OpenCourseWare's 49 minute lecture. Subbed and definitely will be watching your other videos to learn more machine learning topics!

Best of the Best ,explain so clear without complex math formula !

Thanks a lot. Understood completely.

hey thales, i wonder what does the x and xi represents from region 1?

?what is the distance function used to find the nearest

Hi Thales,

Thank you for your tutorial. It was very good instruction to KNN. It may be not a good place to ask technical questions. But do you mind give a direction of the problem I am having now I am suing KNN to distinguish the numbers. 0 – 9. And the question would be 1. How much training set will be needed normally for each number? 2. I am using the OpenCV to train the number image in the same font. Then for each training item in training set, is it ok to have same font but different szie of image to train ? how each item should be slightly different to draw the circle of one number ?

Many thanks.

For the first time, I know what KNN is all about! Thanks to you!

Thanks a lot!

Hey. The video is quite informative.

However, I was wondering how each training vector defines a region in space when k=1.

Now is very simple 🙂

Congratulations for the presentation, by your accent you are Brazilian are't you? I will keep following your videos, really enjoying the material. Just as a feedback, you mentioned distances so it would be interesting to explain how to calculate the distance and the type of distance measurements can be used :). Cheers from Florianópolis.

nice sir, this is so helpfull for my research 🙂

Very good explanation. Thanks for describing it in a simple way.

Voronoi partition is clearly explain and visually straightforward. Thanks for sharing such a good presentation

please mr.Thales Sehn Körting l not understand how to k-nearest neighbor smoother work because knn different to kernel estimation by bandwidth .how to knn work? in nonparametric regression

What is this c is all about ???? Can you give some example please, is it something like the c is …X men movie .. we try to identify in which class or category this movie falls into such as romantic or Action movie ??? The romantic and action movies are already classified by the KNN model.

very helpful, thanks a lot

Very clear explanation. Thanks. In particular, mentioning the potential ties was very helpful.

Simple. Concise. Thanks for sharing.

How do this algorithm works with missing values??

This is a question that could be in my test :/

1:13 – k=# (# = how many neighbours to check, before classifying)

1:29 – visually depicted how it works

3:48 – Remarks (no examples – no clear to understand)

Great explaination… <thank You

I studied kNN in school (multiple years ago), and worked on a team project (in school) where we were trying to predict home prices based on several numerical predictors. The problem is I don't recall how we did it. I have saved the write-up which says we used: the original listing price, # of bathrooms, is the house in zipcode A B or C, and was it built after 1980. The write-up also says we had a "K of 3" and a mean prediction error of $16,460. Can you connect the dots for me here? How do you predict home prices using kNN? would every price point be it own classification. Regression made more sense to me for this task.

In your example, when you find out that 'C' element belongs to 'O' class, if you would like to clasify another element 'D' , is 'C' taken in account to predict the label for 'D' or just the original samples are used to clasify it?

Thanks.

why k is < 6?

Great explanation – clear and simple – thanks much!

Thanks for the video

explained very well

good explanation with short time…😃

what do you do if your KNN has K+ points of equal distances?

Thank you very much

Clear and simple, thanks.

very good short explanations for beginners in field of machine learning………good job

clear and simple explanation Helped me a lot in supervised classification. thank you!

Thanks, clear and perfect!.

Thank you! Great video

Your videos were clear and understandable by everyone thanks for putting in your efforts on this 🙂

thank you

Really good, Keep Doing….

can i use k=2?

So what about when k=1 and we have 2 classes A and B. We want to to identify a new element x in one of this classes and the distances are equal. Then the algorithm in which class puts the element x?

Thank you very much for your video.It is very help full to me.But i have a some confusion,that is when k is multiples of classes only the problem is finding the element belonging class.Is that right?

THANK YOU SOOOOOOOOOOO MUCH YOU 100% deserve a sub from me

Really useful, thank you so much!

nice video, thank you very much

thanks fr ur explanation

thanks so much for this video, Can I know what is the software that you are using for the presentation?

😉 thank you. it helps me a lot!

drink some coffee man

Thank you so much

the graphics make it easy to understand

Aren't different metrics, like Manhattan distance, EMD, or euclid helping with the drawback of kNN algorithm? At least that's what I used in my bachelor's thesis in Facial Pattern Recognition via LBP + kNN classification (+k means)

thank u very much. it is really useful

Short and good explanation, even better to speed 2x 🙂

here a nice related post i found on the same topic http://www.mien.in/2017/09/25/implementing-knn-from-scratch-in-python/

To the point, excellent video

thank you so much!

Thank you.

Short, simple and straightforward

Amazing!!

Thanks a lot sir

Thank you!

thanks for the great work..could u make a video on selecting apt K Values for differemt data sets and how the complexity of calc nearest neighbours can be minimised?I'll be highly obliged.

You Brasilian? I am too! thanks for the video

how do we choose an optimum k count? is there specific method to choose k?

Thank you for making this video. Could you please make another video to explain how to use KNN in solving regression problem?

Possible image knn code in Java

Thank you very much!

Thank you very much for your explanation

Muito bem explicado

Thank you, sir!! 🙂

Great explanation! Thanks a lot for it 🙂

2X speed

is good

Thank You! You made it so easy. I now use your explanation to teach others.

I do not work like that buddy

100

Great work. Keep it up. I have been trying to learn and write about kNN. Would be great if you can check it out and comment. In the post, I built a model and iterated it until i found an optimal test accuracy for prediction: https://the-ml-blogs.blogspot.com/2018/12/4-deeper-into-knn.html

Great video! And I love your accent!

As simple as that? I was studying KNN from Introduction To Statistical Learning and I felt it so much tough to understand.

I am trying to learn Data Science, I have completed like 4 courses of Datacamp's path to Data Scientist With Python. I am feeling like the path is less concept oriented and more based on how to use python for data science.

I am totally confused with how to learn Data Science, I have 20 semester end holidays left. I want to use the time. I have tried the book Introduction to statistical learning, the book applies the methods using R and I am already learning python(using Datacamp).

I have tried few statistics courses which were complete theory. I have tried Andrew NG and everything went above my head.

I am in search for a course that includes both, the practical as well as the theoretical part.

Any Singaporeans here?

Nice video, thanks! But I do have a question: in k=1 case why do we have to do Voronoi partition? Instead can't we choose the class of the new sample by the shortest distance to representative points of the training dataset classes? I guess it is the same at the end of the day but isn't it easier to calculate a sample's shortest distances than calculating for the whole Euclidean Space?

Good explanation

soft voice！！！

so, which class will be determined for 'c'?

Short and simple, thanks a lot!

You should have explained how KNN is trained first. How the centroid approach works. Without training explained, you are testing the data point.

Such a nice and straightforward explanation on k-NN algorithm! thanks a lot

I have never posted a comment but wanted to thank you for the great explanation!

Thanks but I have a question: what is meant by class?

DO EVERY ALGORITHM WHICH EXISTS

Great video but I have one unanswered question:

Given we have three classes A B C and k = 5. Let's assume the nearest neighbours are A A B B C, how is determined whether A or B should be returned as a result?

In knn algorithm which feature extraction is used?

Bow or n-gram

How do you decide the number of k?

how does this work irL?

Nice tutorial. Thanks