Linear Regression in Machine Learning

Gaurav Raj
6 min read · Aug 28, 2023


A demonstration and explanation of the linear regression machine learning algorithm, with a simple implementation in Python.

Linear regression is one of the most basic and widely used techniques in machine learning. It is a type of supervised learning algorithm used for predictive analysis, where we are trying to predict a continuous output variable (also known as the dependent variable) based on one or more input variables (also known as independent variables). Linear regression is called “linear” because it assumes a linear relationship between the input and output variables.

The goal of linear regression is to find the “best-fit” line that describes the relationship between the input variables and the output variable. This best-fit line is called the regression line. The regression line is a straight line that minimizes the distance between the predicted values and the actual values of the output variable. In other words, the regression line is the line that provides the most accurate predictions of the output variable given the input variables.

There are two types of linear regression: simple linear regression and multiple linear regression. Simple linear regression involves predicting the output variable based on a single input variable, while multiple linear regression involves predicting the output variable based on two or more input variables.

To find the regression line, we use a mathematical technique called least squares regression. This technique involves minimizing the sum of the squared differences between the predicted values and the actual values of the output variable. Once we find the regression line, we can use it to make predictions about the output variable for new input values.
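What "minimizing the sum of the squared differences" means can be seen with a short sketch. It uses the small example dataset from later in this post; `np.polyfit` with degree 1 performs exactly this least-squares fit, and perturbing the fitted line in any direction only increases the error:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

def sse(m, b):
    """Sum of squared differences between predicted and actual y values."""
    return float(np.sum((y - (m * x + b)) ** 2))

# np.polyfit with degree 1 computes the least-squares straight-line fit
m, b = np.polyfit(x, y, 1)
print(round(m, 2), round(b, 2))

# Any other line has a larger sum of squared errors than the fitted one
print(sse(m, b) < sse(m + 0.1, b), sse(m, b) < sse(m, b - 0.1))
```

Try changing the perturbations: no matter how you nudge the slope or intercept, the fitted line always has the smaller error, which is what "best fit" means here.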

Linear regression has a wide range of applications, including in finance, economics, biology, engineering, and many other fields. For example, in finance, linear regression can be used to predict stock prices based on factors such as market trends, company performance, and economic indicators. In biology, linear regression can be used to predict the growth of cells based on various factors such as temperature, pH level, and nutrient availability.

In conclusion, linear regression is a simple yet powerful machine-learning technique that is widely used for predictive analysis. It involves finding the best-fit line that describes the relationship between the input variables and the output variable, and it can be applied to a wide range of fields and applications.

Finding the Regression Line

The general equation for a straight line is:

y = mx + b

where:

y is the dependent variable (also known as the output variable)
x is the independent variable (also known as the input variable)
m is the slope of the line
b is the y-intercept (the point where the line intersects the y-axis)

In linear regression, we use this equation to find the regression line that best fits the data. The slope and y-intercept of the line are determined using the least squares method, which minimizes the sum of the squared differences between the predicted values and the actual values of the output variable.

Here’s an example of how this works:

Suppose we have a dataset with two variables, x and y. We want to find the best-fit line that describes the relationship between them. We can start by plotting the data on a scatter plot.

# Import the graphing library
import matplotlib.pyplot as plt
# Input data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
# Plot the data on a scatter plot graph
plt.scatter(x, y)
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
[Figure: Scatter plot of the available data]


From the scatter plot, we can see that there appears to be a positive linear relationship between x and y. To find the regression line, we need to find the slope and y-intercept of the line using the least squares method.

The slope of the line (m) can be calculated using the following formula:

m = (NΣxy − ΣxΣy) / (NΣx² − (Σx)²)

where:

N is the total number of data points
Σxy is the sum of the products of x and y
Σx and Σy are the sums of x and y, respectively
Σx² is the sum of the squares of x

Substituting the values from our example, we get:

m = ((5 * 66) − (15 * 20)) / ((5 * 55) − 15²) = 30 / 50 = 0.6

Next, we can find the y-intercept (b) of the line using the following formula:

b = (Σy — mΣx) / N

Substituting the values from our example, we get:

b = (20 − (0.6 * 15)) / 5 = 2.2
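The sums and the resulting coefficients can be checked with a few lines of plain Python, using the same five data points:

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
N = len(x)

sum_x = sum(x)                                   # Σx = 15
sum_y = sum(y)                                   # Σy = 20
sum_xy = sum(xi * yi for xi, yi in zip(x, y))    # Σxy = 66
sum_x2 = sum(xi ** 2 for xi in x)                # Σx² = 55

# Least-squares formulas for the slope and intercept
m = (N * sum_xy - sum_x * sum_y) / (N * sum_x2 - sum_x ** 2)
b = (sum_y - m * sum_x) / N
print(m, b)  # 0.6 2.2
```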

So, the equation of the regression line is:

y = 0.6x + 2.2

We can plot the regression line on the scatter plot to see how well it fits the data.

# Importing the library
import numpy as np
# Coefficients of the regression line calculated above
m = 0.6
b = 2.2
x_line = np.linspace(0, 6)
y_line = m * x_line + b
# Plot the data and regression line
plt.scatter(x, y)
plt.plot(x_line, y_line, color='red')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
[Figure: Scatter plot of the data with the regression line]

From the plot, we can see that the regression line appears to fit the data reasonably well.
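"Reasonably well" can be quantified with the coefficient of determination (R²), the fraction of the variance in y that the line explains. This sketch plugs in the least-squares coefficients m = 0.6 and b = 2.2 computed for this data:

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
m, b = 0.6, 2.2   # least-squares slope and intercept for this data

y_pred = [m * xi + b for xi in x]
y_mean = sum(y) / len(y)
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))  # residual sum of squares
ss_tot = sum((yi - y_mean) ** 2 for yi in y)               # total sum of squares
r2 = 1 - ss_res / ss_tot
print(round(r2, 2))  # 0.6
```

An R² of 0.6 means the line accounts for about 60% of the variance in y, consistent with a fit that is good but not perfect.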


We can use this line to make predictions about the value of y for new values of x.

# importing the necessary libraries
import numpy as np
from sklearn.linear_model import LinearRegression
# Input data
x = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])
# Create a linear regression model
model = LinearRegression()
# Train the model using the input data
model.fit(x, y)
# Predict the value of y for a new input value of x
new_x = np.array([[6]])
predicted_y = model.predict(new_x)
print("Predicted value of Y for X = 6: ", predicted_y)

Predicted value of Y for X = 6: [5.8]

This program first imports the necessary libraries and defines the input data as numpy arrays.

Then, it creates a LinearRegression object and fits the model using the fit method with x and y as input parameters. This trains the model on the given input data.

Finally, it uses the predict method of the LinearRegression object to predict the value of y for the new input x = 6, and the predicted value is printed to the console.
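Rather than hard-coding the slope and intercept, you can also read them off the fitted model; they should match the hand calculation above. A sketch, assuming the same training data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

model = LinearRegression().fit(x, y)
# coef_ holds one slope per input feature; intercept_ is the bias term b
print(round(float(model.coef_[0]), 2), round(float(model.intercept_), 2))
```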


Don’t forget to share if you liked this post, and if you have any questions or queries, ping me on Twitter at @thehackersbrain.


Gaurav Raj

🔐 Cybersecurity student exploring tech security. Join my journey to learn and protect the digital world together! 💻🌐