Hey guys, welcome to yet another interesting session by Intellipaat. Today we are going to look at logistic regression. We will start off by understanding what exactly regression is, then we will understand what logistic regression is, and finally we will implement logistic regression in R.
So, let’s get started! Let’s take this scenario, where we have three employees: Sam, who is 20 years old and earns $50,000; Bob, who is 35 years old and earns $75,000; and Matt, who is 50 years old and earns $100,000.
Now, I will introduce a new employee to you, whose age is 28, and ask you what his salary is. What would you do? You would look at the general trend between age and salary and see that as the age of the employee increases, his salary also increases. This is nothing but regression, where you are trying to understand how a person’s age affects his salary based on the historical data. Over here, “salary” is the dependent variable and “age” is the independent variable, i.e. you are trying to estimate the salary of an employee with respect to his age.
Let’s look at the second scenario:
Here, we have two students, Rachel and Ross. They appear for an exam and Rachel manages to pass, while Ross fails. Now, what if another student, Monica, takes the same test? Would she be able to clear the exam? Well, you will again look at the data provided to you and see that Rachel, being a girl, was able to pass the exam, while Ross, being a guy, failed to clear it, and on the basis of this data you’d say there is a good probability for Monica to clear the exam as well. This again is regression, where you’re finding out whether the student has cleared the exam based on their gender, and hence “result” is the dependent variable over here and “gender” is the independent variable.
So, in simple terms, regression helps you to understand the extent of the relationship between two variables.
Now that we have understood what exactly regression is, it’s time to understand logistic regression.
Logistic regression is a regression technique where the dependent variable is categorical, i.e. we determine the probability of an observation belonging to a particular category. Let’s look at an example to understand this better.
Over here, we are trying to determine the probability of rain based on independent variables such as “temperature” and “humidity”. Or in other words, we are choosing a category, namely “yes” or “no”, for the question: will it rain?
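The rain example can be sketched in R roughly as follows. This is only a minimal illustration, not the implementation used later in the session: the weather data is synthetic (simulated with made-up coefficients), the units are assumed, and the 0.5 cutoff for turning a probability into “yes” or “no” is a common convention rather than something stated here.

```r
# Synthetic weather data, generated only for illustration (not from the session).
set.seed(42)
n <- 200
temperature <- runif(n, min = 10, max = 35)   # degrees Celsius (assumed unit)
humidity    <- runif(n, min = 30, max = 100)  # relative humidity in percent (assumed)

# Assumed relationship used purely to simulate a yes/no outcome.
true_prob <- plogis(-6 - 0.05 * temperature + 0.1 * humidity)
rain <- rbinom(n, size = 1, prob = true_prob)  # 1 = rain, 0 = no rain

weather <- data.frame(temperature, humidity, rain)

# Fit a logistic regression: the dependent variable is the 0/1 category "rain".
fit <- glm(rain ~ temperature + humidity, data = weather, family = binomial)

# Estimated probability of rain for one new day (hypothetical values).
new_day <- data.frame(temperature = 22, humidity = 85)
prob_rain <- predict(fit, newdata = new_day, type = "response")

# Turn the probability into a category using an assumed 0.5 cutoff.
will_rain <- ifelse(prob_rain > 0.5, "yes", "no")
print(prob_rain)
print(will_rain)
```

The model itself produces a probability between 0 and 1; choosing the “yes” or “no” category is a separate step applied on top of that probability.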
Right, now we will head on and understand the difference between linear regression and logistic regression. In linear regression, we fit a straight line, i.e. for a given value of “x”, there is always a corresponding “y” value which falls on the line.
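To make the straight-line idea concrete, here is a small R sketch that reuses the three employees from the first scenario. It assumes a simple linear model of salary on age fitted with `lm`; the variable names are chosen here for illustration and do not come from the session.

```r
# The three employees from the first scenario.
employees <- data.frame(
  name   = c("Sam", "Bob", "Matt"),
  age    = c(20, 35, 50),
  salary = c(50000, 75000, 100000)
)

# Fit a straight line: salary is the dependent variable, age the independent one.
fit <- lm(salary ~ age, data = employees)

# For a given "x" (age 28) there is a "y" (salary) that falls on the fitted line.
new_employee <- data.frame(age = 28)
predicted_salary <- unname(predict(fit, newdata = new_employee))
print(round(predicted_salary, 2))
```

Because these three points happen to lie exactly on one line, the fitted line passes through all of them, and the estimate for the 28-year-old comes out at about $63,333.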