9 Sep

# How kNN algorithm works

Hello. My name is Thales Sehn Körting, and I will present very briefly how the kNN algorithm works. kNN means k nearest neighbors.

It's a very simple algorithm. Given N training vectors (suppose we have all these 'a' and 'o' letters as training vectors in this two-dimensional feature space), the kNN algorithm identifies the k nearest neighbors of 'c', where 'c' is another feature vector whose class we want to estimate. The algorithm identifies the nearest neighbors regardless of their labels.

So, suppose in this example we have k equal to 3, and the classes are 'a' and 'o'. The aim of the algorithm is to find the class for 'c'. If k is 3, we have to find the 3 nearest neighbors of 'c'. We can see that in this case the 3 nearest neighbors of 'c' are these 3 elements here: 1 neighbor of class 'a' and 2 elements of class 'o' near 'c'. We have 2 votes for 'o' and 1 vote for 'a', so the class of the element 'c' is going to be 'o'. This is, very simply, how the k nearest neighbors algorithm works.
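The voting procedure just described can be sketched in a few lines of Python. The coordinates, the query point, and the `knn_classify` helper are illustrative assumptions, not code from the talk, and Euclidean distance is assumed:

```python
from collections import Counter
import math

def knn_classify(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training vectors."""
    # Sort the (vector, label) pairs by Euclidean distance to the query
    # and keep the k closest ones.
    neighbors = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels of the k neighbors and return the winning label.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# The 'a'/'o' example from the talk, with made-up coordinates:
training = [((1, 1), 'a'), ((2, 1), 'a'), ((6, 5), 'o'), ((7, 5), 'o'), ((6, 6), 'o')]
print(knn_classify(training, (4.5, 3), k=3))  # prints 'o' (2 votes for 'o', 1 for 'a')
```

`math.dist` requires Python 3.8 or later; on older versions the Euclidean distance can be written out by hand.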
Now, there is a special case of the kNN algorithm: when k is equal to 1. Here we simply find the single nearest neighbor of the element, and that neighbor defines the class.

To represent this, each training vector defines a region in the feature space. The property is that the region of a training vector x_i contains every element x whose distance to x_i is smaller than its distance to any other training vector x_j. This defines a Voronoi partition of the space: for example, the elements 'a', 'b', 'c' and 'e' each define these very specific regions. This is a property of the kNN algorithm when k is equal to 1. We define regions 1, 2, 3 and 4 based on the nearest neighbor rule: each element that falls inside the 'a' region will be classified as 'a', each element inside the 'c' region will be classified as 'c', and the same for regions 2 and 3, with classes 'e' and 'b'.
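A minimal sketch of this nearest neighbor rule, assuming Euclidean distance and made-up positions for the 'a', 'b', 'c' and 'e' prototypes (the real slide coordinates are not given in the transcript):

```python
import math

# With k = 1, each training vector x_i owns the region of all points x with
# dist(x, x_i) <= dist(x, x_j) for every other training vector x_j:
# a Voronoi cell. Classifying a point means finding the cell it falls in.
def nearest_prototype(x, prototypes):
    return min(prototypes, key=lambda item: math.dist(item[0], x))[1]

# Hypothetical positions for the four prototypes from the slides:
prototypes = [((0, 0), 'a'), ((4, 0), 'b'), ((0, 4), 'c'), ((4, 4), 'e')]
print(nearest_prototype((1, 1), prototypes))  # prints 'a'
```

Note that the Voronoi cells never need to be computed explicitly: taking the argmin over distances implicitly answers which cell the point lies in.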
Now I have just some remarks about kNN. We have to choose an odd value of k if we have a 2-class problem. This is because, with 2 classes, if we set k equal to 2, for example, we can have a tie: one vote for each class. What would the class be then, if we decide by majority among the nearest neighbors? So we should always set an odd value of k for a 2-class problem. The value of k must also not be a multiple of the number of classes, again to avoid ties.

And we have to remember that the main drawback of this algorithm is the complexity of searching for the nearest neighbors of each sample.
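Returning to the tie remark above, a tiny sketch shows why an even k is risky with two classes (the neighbor labels here are hypothetical):

```python
from collections import Counter

# Hypothetical vote among the k = 2 nearest neighbors in a 2-class problem:
votes = Counter(['a', 'o'])  # one neighbor of each class
(top, top_count), (_, runner_up_count) = votes.most_common(2)
print(top_count == runner_up_count)  # prints True: a tie, majority cannot decide
```

With k odd (and not a multiple of the number of classes), a strict majority among the neighbors is guaranteed in the 2-class case.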
The complexity is a problem because, in the case of a big dataset, we have lots of elements, and we have to compute the distance from each of them to the element we want to classify. So, for a large dataset, this can be a problem. Even so, the kNN algorithm produces good results.
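The brute-force search described above performs one distance computation per training element for every query. A small sketch of that cost, with an arbitrarily chosen dataset size:

```python
import math
import random

random.seed(0)
N = 100_000  # arbitrary dataset size for illustration
dataset = [(random.random(), random.random()) for _ in range(N)]
query = (0.5, 0.5)

# Brute-force nearest neighbor search: one distance computation per training
# element, i.e. O(N) work for every single sample we want to classify.
distances = [math.dist(point, query) for point in dataset]
nearest_index = min(range(N), key=lambda i: distances[i])
print(len(distances))  # prints 100000: distance evaluations for a single query
```

In practice this linear scan is commonly mitigated with spatial indexes such as k-d trees or ball trees, though those are not covered in the talk.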
So, this is the reference I have used to prepare this presentation. Thanks for your attention, and this is, very briefly, how the kNN algorithm works.

• tara cai says:

Thank you so much!

• Suman Dhondalay says:

Awesome explanation!! So far this is the best explanation of KNN!! Thank you for sharing!

Good lecture… expecting more from you.

• Filipe Gonçalves says:

Awesome explanation. Quick, simple, to the point. Exactly what I was needing to do my homework 🙂 Thanks!

• Ray Mendoza says:

Very useful!  Also, does anyone else think he sounds like the guy from the END OF THE WORLD swf? 😀

• Vishal Katariya says:

Thank you for this video! It's very easy to understand.

• Md Nahiduzzaman Rose says:

Thank you for this video!

• Henry Li says:

Thank you for making this video. It is informative and to the point, as well as being shorter than MIT OpenCourseWare's 49 minute lecture. Subbed and definitely will be watching your other videos to learn more machine learning topics!

• SUI LEUNG Mak says:

Best of the best, explained so clearly without complex math formulas!

• Ansuman Mahapatra says:

Thanks a lot. Understood completely.

• Kenji Fnu says:

Hey Thales, I wonder what the x and x_i represent in region 1?

• Wathiq N says:

What is the distance function used to find the nearest neighbors?

• Junhui Park says:

Hi Thales,
Thank you for your tutorial. It was a very good introduction to KNN. This may not be a good place to ask technical questions, but would you mind giving some direction on the problem I am having now? I am using KNN to distinguish the numbers 0 – 9. My questions are: 1. How much training data is normally needed for each number? 2. I am using OpenCV to train on number images in the same font. For each item in the training set, is it OK to have the same font but a different image size? And how much should each item differ, for example in how the circle of a number is drawn?
Many thanks.

• Rachayita Giri says:

For the first time, I know what KNN is all about! Thanks to you!

• Leandro Simões says:

Thanks a lot!

• Sudarshini Tyagi says:

Hey. The video is quite informative.
However, I was wondering how each training vector defines a region in space when k=1.

• Claudia Oliveira says:

Now it is very simple 🙂

• Joni Hoppen says:

Congratulations on the presentation; judging by your accent you are Brazilian, aren't you? I will keep following your videos, really enjoying the material. Just as feedback: you mentioned distances, so it would be interesting to explain how to calculate the distance and what types of distance measurements can be used :). Cheers from Florianópolis.

• Johannes Clinton says:

Nice, sir, this is so helpful for my research 🙂

• Joao Pedro de Carvalho Magalhaes says:

Very good explanation. Thanks for describing it in a simple way.

• ComputerPhysics Lab says:

The Voronoi partition is clearly explained and visually straightforward. Thanks for sharing such a good presentation.

Please, Mr. Thales Sehn Körting, I do not understand how the k-nearest neighbor smoother works, because kNN differs from kernel estimation with a bandwidth. How does kNN work in nonparametric regression?

• Raja Ram says:

What is this 'c' all about? Can you give some example, please? Is it something like 'c' is an X-Men movie, and we try to identify which class or category the movie falls into, such as romantic or action? The romantic and action movies have already been classified by the KNN model.

• hombreazu1 says:

Very clear explanation. Thanks. In particular, mentioning the potential ties was very helpful.

• pierre laurent says:

Simple. Concise. Thanks for sharing.

• StxExodux says:

How does this algorithm work with missing values??

This is a question that could be in my test :/

• George Batalinski says:

1:13 – k=# (# = how many neighbours to check, before classifying)
1:29 – visually depicted how it works
3:48 – Remarks (no examples – not clear to understand)

• Angel Victoria says:

Great explanation… thank you!

• brnjnsvld says:

I studied kNN in school (multiple years ago), and worked on a team project (in school) where we were trying to predict home prices based on several numerical predictors. The problem is I don't recall how we did it. I have saved the write-up, which says we used: the original listing price, # of bathrooms, whether the house is in zipcode A, B or C, and whether it was built after 1980. The write-up also says we had a "K of 3" and a mean prediction error of \$16,460. Can you connect the dots for me here? How do you predict home prices using kNN? Would every price point be its own classification? Regression made more sense to me for this task.

• Diego Fernando Coba Cruz says:

In your example, when you find out that the 'C' element belongs to the 'O' class, if you would like to classify another element 'D', is 'C' taken into account to predict the label for 'D', or are just the original samples used to classify it?

Thanks.

• kunal kotiyal says:

why k is < 6?

• Epic Gamer says:

Great explanation – clear and simple – thanks much!

• armand hoxha says:

Thanks for the video

• Mike Smith says:

explained very well

Good explanation in a short time… 😃

• not you says:

what do you do if your KNN has K+ points of equal distances?

• Abdulrhman Alomari says:

Thank you very much

• Stone says:

Clear and simple, thanks.

• Isaack Kamanga says:

Very good short explanation for beginners in the field of machine learning… good job.

• Bobby Aipanga says:

Clear and simple explanation. Helped me a lot in supervised classification. Thank you!

• Guillem Rico says:

Thanks, clear and perfect!

• Bruce says:

Thank you! Great video

• shankar pratapa says:

Your videos were clear and understandable by everyone thanks for putting in your efforts on this 🙂

• Lays Veloso says:

thank you

• thasin '96 says:

Really good, keep going…

• Faiz Noeris says:

can i use k=2?

• Ion Cimpeanu says:

So what about when k=1 and we have 2 classes, A and B? We want to identify a new element x as one of these classes, and the distances are equal. Which class does the algorithm put the element x in?

• Vasantha Ch says:

Thank you very much for your video. It is very helpful to me. But I have some confusion: is the problem of finding the class an element belongs to only there when k is a multiple of the number of classes? Is that right?

• Abdulsalam Bande says:

THANK YOU SOOOOOOOOOOO MUCH YOU 100% deserve a sub from me

• Elena Codin says:

Really useful, thank you so much!

• Gilang Akbar says:

nice video, thank you very much

• A J says:

Thanks for your explanation.

• Mahmoud Magdy says:

thanks so much for this video, Can I know what is the software that you are using for the presentation?

• Dian Pratiwi says:

😉 thank you. it helps me a lot!

• MJG says:

drink some coffee man

• Sơn Tùng Nguyễn says:

Thank you so much

• Lewis Alberg says:

the graphics make it easy to understand

• Angry Jesus says:

Don't different metrics, like Manhattan distance, EMD, or Euclidean, help with the drawback of the kNN algorithm? At least that's what I used in my bachelor's thesis on Facial Pattern Recognition via LBP + kNN classification (+ k-means).

• mohammed Khalaf says:

Thank you very much, it is really useful.

• Wending Peng says:

Short and good explanation, even better to speed 2x 🙂

• Munawwar Hussain Shelia says:

Here is a nice related post I found on the same topic: http://www.mien.in/2017/09/25/implementing-knn-from-scratch-in-python/

• M Shahzaib says:

To the point, excellent video

• Juan Sebastian Wagner says:

thank you so much!

• Ahmet Cevahir ÇINAR says:

Thank you.

• Kamal Sehairi says:

Short, simple and straightforward
Amazing!!
Thanks a lot sir

• Phil:) says:

Thank you!

• Anmol Agrawal says:

Thanks for the great work. Could you make a video on selecting apt k values for different data sets, and on how the complexity of calculating nearest neighbours can be minimised? I'll be highly obliged.

• rafael ziotto says:

You're Brazilian? I am too! Thanks for the video.

• Elmar Qarayev says:

How do we choose an optimum k count? Is there a specific method to choose k?

• Steve Ji says:

Thank you for making this video. Could you please make another video to explain how to use KNN in solving regression problem?

• Ffzc Dbnc says:

Possible image knn code in Java

• Jona says:

Thank you very much!

• Marco Serena says:

Thank you very much for your explanation

• Aayush Saini says:

Thank you, sir!! 🙂

• Matheus Storck says:

Great explanation! Thanks a lot for it 🙂

• Changyu Lin says:

2X speed
is good

• Shrutika Sinha says:

Thank You! You made it so easy. I now use your explanation to teach others.

• Ke MO says:

It does not work like that, buddy.

• Anwesa Roy says:

Great work. Keep it up. I have been trying to learn and write about kNN. It would be great if you could check it out and comment. In the post, I built a model and iterated it until I found an optimal test accuracy for prediction: https://the-ml-blogs.blogspot.com/2018/12/4-deeper-into-knn.html

• Ranjitha Nayak says:

Great video! And I love your accent!

• namit saxena says:

As simple as that? I was studying KNN from Introduction to Statistical Learning and I found it so tough to understand.

I am trying to learn Data Science, I have completed like 4 courses of Datacamp's path to Data Scientist With Python. I am feeling like the path is less concept oriented and more based on how to use python for data science.

I am totally confused with how to learn Data Science, I have 20 semester end holidays left. I want to use the time. I have tried the book Introduction to statistical learning, the book applies the methods using R and I am already learning python(using Datacamp).

I have tried a few statistics courses, which were complete theory. I have tried Andrew Ng and everything went over my head.

I am in search for a course that includes both, the practical as well as the theoretical part.

• Sam Chen says:

Any Singaporeans here?

• Hüseyin Eken says:

Nice video, thanks! But I do have a question: in k=1 case why do we have to do Voronoi partition? Instead can't we choose the class of the new sample by the shortest distance to representative points of the training dataset classes? I guess it is the same at the end of the day but isn't it easier to calculate a sample's shortest distances than calculating for the whole Euclidean Space?

• Fatimah Mohammed says:

Good explanation

• 简锦航 says:

Soft voice!!!

• Darya says:

so, which class will be determined for 'c'?

• niosen says:

Short and simple, thanks a lot!

• Sarwar Hayatt says:

You should have explained how KNN is trained first, and how the centroid approach works. Without explaining the training, you are only testing the data point.

• 김경호 says:

Such a nice and straightforward explanation on k-NN algorithm! thanks a lot

• Ed Trujillo says:

I have never posted a comment but wanted to thank you for the great explanation!

• Fawwaz Yusran says:

Thanks but I have a question: what is meant by class?

• Grandmaster Reaper says:

DO EVERY ALGORITHM WHICH EXISTS

• Laurin says:

Great video but I have one unanswered question:

Given we have three classes A, B, C and k = 5. Let's assume the nearest neighbours are A A B B C; how is it determined whether A or B should be returned as a result?

• Ooz Zar says:

In the kNN algorithm, which feature extraction is used? BoW or n-grams?

• Carol Tamez says:

How do you decide the number of k?

• theeachuisge says:

How does this work IRL?

• Valentine Oragbakosi says:

Nice tutorial. Thanks