Regression Algorithm - Simple Linear Regression

 

Regression Algorithm

What is Regression:

Regression is a statistical method used to understand the relationship between a dependent variable and one or more independent variables. By fitting a line or curve to the data points, regression analysis allows us to predict the value of the dependent variable based on the known values of the independent variables. This technique is widely used for forecasting and predicting outcomes, identifying trends, and determining the strength and nature of relationships between variables in various fields such as finance, economics, medicine, and social sciences.

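There are two main types of linear regression: simple linear regression, which uses one independent variable, and multiple linear regression, which uses two or more.

1. Simple Linear Regression: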

Simple linear regression is a statistical method used to model the relationship between a single independent variable (predictor) and a dependent variable (outcome) by fitting a linear equation to observed data.

The linear equation has the form:

$$y = \beta_0 + \beta_1 x$$

where y is the dependent variable, x is the independent variable, β0 is the intercept, and β1 is the slope of the line. The intercept represents the expected value of y when x is zero, while the slope indicates the change in y for a one-unit change in x. This method is commonly used for prediction and forecasting, providing insights into how changes in the independent variable are associated with changes in the dependent variable.
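The following minimal Python sketch makes the roles of β0 and β1 concrete. The function name predict and the coefficient values 2.0 and 3.0 are hypothetical. They were chosen only for illustration and are not fitted to any data.

```python
def predict(beta0: float, beta1: float, x: float) -> float:
    """Return the predicted y on the line y = beta0 + beta1 * x."""
    return beta0 + beta1 * x

# Hypothetical, illustrative coefficients (not fitted to any data).
beta0, beta1 = 2.0, 3.0

# The intercept is the prediction when x is zero.
print(predict(beta0, beta1, 0))  # 2.0

# The slope is the change in the prediction for a one-unit increase in x.
print(predict(beta0, beta1, 5) - predict(beta0, beta1, 4))  # 3.0
```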

For example, suppose we want to predict a student's exam performance from the number of hours they studied. The table below has two variables: Hours Studied is the x variable and Exam Score is the y variable.

 

Hours Studied (x)    Exam Score (y)
1                    50
2                    55
3                    65
4                    70
5                    75
6                    85
7                    90

 

Simple Linear Regression

We will use simple linear regression, which fits a straight line to the data. The formula for the line (regression equation) is:

$$y = \beta_0 + \beta_1 x$$

Where:

y is the dependent variable (Exam Score)

x is the independent variable (Hours Studied)

β0 is the intercept

β1 is the slope

 

In regression, the following two concepts are especially important:

Intercept (β0): This is the value of the dependent variable y when the independent variable x is zero. It represents the baseline, or starting point, of the line. In practice it is only meaningful when x = 0 is a realistic value (for example, zero hours of study).

Slope (β1): This is the change in the dependent variable y for each one-unit increase in the independent variable x. It quantifies the direction and steepness of the relationship between x and y. A positive slope means that larger values of x are associated with larger values of y, while a negative slope means that larger values of x are associated with smaller values of y.

 

Together, β0 and β1 form the linear equation y = β0 + β1x, which is used to predict y from the value of x observed in the data. This model assumes a linear relationship between the variables and is foundational in statistical analysis for understanding and predicting outcomes based on continuous variables.

Finding the Regression Line

To find the regression line, we need to calculate the intercept (β0) and the slope (β1). These are calculated using the following formulas:

$$\beta_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$

$$\beta_0 = \frac{\sum y - \beta_1 \sum x}{n}$$

Where:

  • n is the number of data points
  • ∑xy  is the sum of the product of x and y
  • ∑x  is the sum of x values
  • ∑y  is the sum of y values
  • ∑x² is the sum of squares of x values
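The sum formulas for β1 and β0 translate almost line for line into code. Below is one possible plain-Python sketch, with no third-party libraries assumed. The function name fit_simple_linear_regression is hypothetical. The sketch is applied to the Hours Studied / Exam Score table from this post.

```python
def fit_simple_linear_regression(xs, ys):
    """Estimate (beta0, beta1) using the closed-form sum formulas."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    sum_x = sum(xs)                                # ∑x
    sum_y = sum(ys)                                # ∑y
    sum_xy = sum(x * y for x, y in zip(xs, ys))    # ∑xy
    sum_x2 = sum(x * x for x in xs)                # ∑x²

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # Every x is the same value, so no unique slope exists.
        raise ValueError("all x values are identical; slope is undefined")

    beta1 = (n * sum_xy - sum_x * sum_y) / denominator
    beta0 = (sum_y - beta1 * sum_x) / n
    return beta0, beta1

hours = [1, 2, 3, 4, 5, 6, 7]
scores = [50, 55, 65, 70, 75, 85, 90]
beta0, beta1 = fit_simple_linear_regression(hours, scores)
print(f"beta0 = {beta0:.2f}, beta1 = {beta1:.2f}")  # beta0 = 42.86, beta1 = 6.79
```

The guard on the denominator matters. If every x value is the same, the formula divides by zero, because no unique line can be fitted.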

Calculations

  1. Sum of Hours Studied (x): 1+2+3+4+5+6+7 = 28
  2. Sum of Exam Scores (y): 50+55+65+70+75+85+90 = 490
  3. Sum of Products of Hours and Scores (xy): (1×50)+(2×55)+(3×65)+(4×70)+(5×75)+(6×85)+(7×90) = 50+110+195+280+375+510+630 = 2150
  4. Sum of Squares of Hours Studied (x²): 1²+2²+3²+4²+5²+6²+7² = 1+4+9+16+25+36+49 = 140
  5. Number of data points (n): 7

Now, plug these values into the formulas:

$$\beta_1 = \frac{7(2150) - (28)(490)}{7(140) - (28)^2} = \frac{15050 - 13720}{980 - 784} = \frac{1330}{196} \approx 6.79$$

Using the unrounded slope (1330/196 × 28 = 190 exactly):

$$\beta_0 = \frac{490 - \frac{1330}{196} \times 28}{7} = \frac{490 - 190}{7} = \frac{300}{7} \approx 42.86$$

So the regression equation is:

$$y = \beta_0 + \beta_1 x$$

$$y = 42.86 + 6.79x$$

Using the Regression Formula to Predict Scores

Now you can use this formula to predict a student's exam score based on hours studied. For example, if a student studies for 5 hours:

$$y = 42.86 + 6.79 \times 5 = 42.86 + 33.95 \approx 76.8$$

This is close to the observed score of 75 for 5 hours of study in the table.

 

Machine Learning Task: 

Download the Score dataset from here (Score)

Simple Linear Regression Jupyter Notebook

