In this post, we are going to look at a Python 3 program that uses NumPy as its data-processing library, to see how (batch) linear regression with the gradient descent method is implemented.

I will explain the code part by part. At the end, I will attach a link to the complete code on GitHub, along with the dataset used in the example.

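Here x_i is the i-th data point (a vector), N is the size of the data set, η (eta) is the learning rate, and y_i is the target output for x_i. f(x) is the linear function of the regression, a weighted sum of the inputs, where d is the number of input features:

$$f(\mathbf{x}) = \sum_{j=0}^{d} w_j x_j$$

Each weight is updated with the batch gradient descent rule (the constant factor 2 from differentiating the square is absorbed into η):

$$w_j \leftarrow w_j + \eta \sum_{i=1}^{N} \left(y_i - f(\mathbf{x}_i)\right) x_{i,j}$$

The bias weight w_0 is paired with a constant input x_0 = 1, which provides the intercept. All weights, including w_0, are initialized to 0.

In this implementation, we use the Sum of Squared Errors (SSE) as the error function:

$$\mathrm{SSE} = \sum_{i=1}^{N} \left(y_i - f(\mathbf{x}_i)\right)^2$$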
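To make the notation concrete, here is a small NumPy sketch that evaluates f(x), the SSE, and one batch update on a tiny data set. The data, the learning rate η = 0.01, and the helper names `f` and `sse` are illustrative assumptions, not part of the original program.

```python
import numpy as np

# Hypothetical toy data: one feature, targets follow y = 2x.
# The first column is the constant input x0 = 1 for the bias weight w0.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([[2.0], [4.0], [6.0]])  # (N, 1) column of targets
w = np.zeros((2, 1))                 # w0 (bias) and w1, both start at 0
eta = 0.01                           # learning rate (assumed value)


def f(X, w):
    """Linear model: f(x) = sum_j w_j * x_j, evaluated for every row of X."""
    return X @ w


def sse(y, f_x):
    """Sum of Squared Errors over all N data points."""
    return float(np.sum((y - f_x) ** 2))


print(sse(y, f(X, w)))  # 56.0 with all-zero weights (4 + 16 + 36)

# One batch update: w_j <- w_j + eta * sum_i (y_i - f(x_i)) * x_ij
residual = y - f(X, w)          # (N, 1) column of y_i - f(x_i)
w = w + eta * (X.T @ residual)  # X.T @ residual sums residual_i * x_i over all points
print(w.ravel())                # approximately [0.12, 0.28]
print(sse(y, f(X, w)))          # approximately 38.984: the error went down
```

`X.T @ residual` computes the sum over all data points in a single matrix product; it gives the same numbers as the `np.sum((Y - f_x) * X, axis=0)` form used in the program.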

Instead of driving the SSE to zero, we measure the change in SSE at every iteration and compare it to a threshold that is supplied when the program is started. Once the change in SSE falls below the threshold, the program exits.
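The stopping rule can be sketched as a short loop. This is a minimal illustration, assuming the same update rule as above; the function name `fit` and the `max_iter` safety cap are hypothetical additions that the original program does not have.

```python
import numpy as np


def fit(X, y, eta, threshold, max_iter=100_000):
    """Batch gradient descent that stops once |SSE_new - SSE_old| <= threshold.

    max_iter is an assumed safety cap (not in the original program) so that a
    badly chosen learning rate cannot keep the loop running forever.
    """
    w = np.zeros((X.shape[1], 1))
    sse_old = float(np.sum((y - X @ w) ** 2))
    sse_new = sse_old
    for iteration in range(1, max_iter + 1):
        w = w + eta * (X.T @ (y - X @ w))          # batch update over all N points
        sse_new = float(np.sum((y - X @ w) ** 2))
        if abs(sse_new - sse_old) <= threshold:    # change is small enough: stop
            return w, sse_new, iteration
        sse_old = sse_new
    return w, sse_new, max_iter


X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # hypothetical toy data, x0 = 1
y = np.array([[2.0], [4.0], [6.0]])                 # y = 2x
w, final_sse, iterations = fit(X, y, eta=0.01, threshold=1e-6)
print(w.ravel(), final_sse, iterations)
```

Note that the threshold bounds the change in SSE, not the SSE itself: the loop stops once progress stalls, which on this toy data happens when the weights are already close to w_0 = 0 and w_1 = 2.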
The program takes three inputs from the command line:
1. threshold: the value that the change in error has to fall below before the algorithm terminates.
2. data: the location of the data file.
3. learningRate: the learning rate of the gradient descent approach.
Therefore, the program should be started like this:

```
python3 linearregr.py --data random.csv --learningRate 0.0001 --threshold 0.0001
```
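One last thing before we dive into the code: the program writes its results to a CSV file. Each row holds the iteration number, the current weights w_0 through w_d, and the SSE for those weights, with every value after the iteration number rounded to four decimal places.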
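A minimal sketch of how one such row can be produced with the `csv` module. The helper name `format_row`, the in-memory buffer standing in for the output file, and the weight and SSE values are all hypothetical.

```python
import csv
import io

import numpy as np


def format_row(iteration, W, sse):
    """One output row: iteration number, each weight, then the SSE (4 decimals)."""
    return ([iteration]
            + ["{0:.4f}".format(val) for val in W.ravel()]
            + ["{0:.4f}".format(sse)])


buffer = io.StringIO()  # stands in for the output file
writer = csv.writer(buffer)
writer.writerow(format_row(0, np.zeros((2, 1)), 56.0))         # initial weights
writer.writerow(format_row(1, np.array([[0.12], [0.28]]), 38.984))

for line in buffer.getvalue().splitlines():
    print(line)
# 0,0.0000,0.0000,56.0000
# 1,0.1200,0.2800,38.9840
```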


The program consists of six parts, and we are going to look at them one at a time.
The import statements
```python
import argparse  # to read inputs from the command line
import csv  # to read the input data set file
import numpy as np  # to work with the data set
```

The code execution initializer block

```python
parser = argparse.ArgumentParser()
parser.add_argument("-d", "--data", help="Data File")
parser.add_argument("-l", "--learningRate", help="Learning Rate")
parser.add_argument("-t", "--threshold", help="Threshold")
```
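To check how the command line from earlier maps onto these options, the parser can be given an explicit argument list instead of reading `sys.argv`. This is only a test harness; the file name `random.csv` comes from the example command and does not need to exist for parsing.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-d", "--data", help="Data File")
parser.add_argument("-l", "--learningRate", help="Learning Rate")
parser.add_argument("-t", "--threshold", help="Threshold")

# parse_args() normally reads sys.argv; passing a list lets us try the
# command line from earlier without a terminal
args = parser.parse_args(
    ["--data", "random.csv", "--learningRate", "0.0001", "--threshold", "0.0001"]
)
print(args.data, args.learningRate, args.threshold)  # random.csv 0.0001 0.0001
print(type(args.learningRate))  # <class 'str'>, which is why main() calls float()

# The short flags are equivalent
short = parser.parse_args(["-d", "random.csv", "-l", "0.0001", "-t", "0.0001"])
```

Because no `type` is given, every value arrives as a string. Passing `type=float` to `add_argument` would be an alternative to converting in `main()`.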

The main() function
```python
def main():
    args = parser.parse_args()
    # save respective command line inputs into variables
    file = args.data
    learningRate, threshold = float(args.learningRate), float(args.threshold)

    # read the CSV file: the last column is the target output (Y) and the
    # remaining columns are the inputs (X), prefixed with the constant x0 = 1
    with open(file) as csvFile:
        reader = csv.reader(csvFile, delimiter=',')
        X = []
        Y = []
        for row in reader:
            X.append([1.0] + row[:-1])
            Y.append(row[-1])

    # Convert data points into float and initialise weight vector with 0s.
    X = np.array(X).astype(float)
    # Y becomes an (N, 1) column so that it lines up with f_x = X @ W
    Y = np.array(Y).astype(float).reshape(-1, 1)
    # W is a (d + 1, 1) column vector so that np.dot(X, W) has shape (N, 1)
    W = np.zeros((X.shape[1], 1))

    # Calculate the predicted output value
    f_x = calculatePredicatedValue(X, W)

    # Calculate the initial SSE
    sse_old = calculateSSE(Y, f_x)

    outputFile = ('solution_' + 'learningRate_' + str(learningRate)
                  + '_threshold_' + str(threshold) + '.csv')
    with open(outputFile, 'w', newline='') as csvFile:
        # every field is a plain number, so the default quoting never adds quotes
        writer = csv.writer(csvFile, delimiter=',')
        writer.writerow([0] + ["{0:.4f}".format(val) for val in W.T[0]]
                        + ["{0:.4f}".format(sse_old)])

        gradient, W = calculateGradient(W, X, Y, f_x, learningRate)

        iteration = 1
        while True:
            f_x = calculatePredicatedValue(X, W)
            sse_new = calculateSSE(Y, f_x)

            if abs(sse_new - sse_old) > threshold:
                writer.writerow([iteration] + ["{0:.4f}".format(val) for val in W.T[0]]
                                + ["{0:.4f}".format(sse_new)])
                gradient, W = calculateGradient(W, X, Y, f_x, learningRate)
                iteration += 1
                sse_old = sse_new
            else:
                # the change in SSE fell below the threshold: stop iterating
                break
        writer.writerow([iteration] + ["{0:.4f}".format(val) for val in W.T[0]]
                        + ["{0:.4f}".format(sse_new)])
    print("Output File Name: " + outputFile)
```

The flow of the main() function is as follows:
1. Save the respective command-line inputs into variables.
2. Read the CSV file. The last column is the target output and is stored as Y; the remaining columns, prefixed with the constant x_0 = 1, are the inputs and are stored as X.
3. Convert the data points into floats and initialize the weight vector with 0s.
4. Calculate the predicted output values using the calculatePredicatedValue function.
5. Calculate the initial SSE using the calculateSSE function.
6. Open the output file in write mode and write the initial values in the format mentioned earlier in the post. Then calculate the gradient and the updated weights using the calculateGradient function. An iteration variable counts how many times the batch update runs before the change in SSE falls below the threshold. Inside the while loop, the predicted output and the new SSE are recalculated. If the absolute difference between the old SSE (from the previous iteration) and the new SSE (from the current iteration) is greater than the threshold, the current row is written, the weights are updated again, the iteration counter is incremented by 1, and the current SSE becomes the old SSE. Once the difference falls below the threshold, the loop breaks and the last values are written to the file.
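Steps 2 and 3 can be tried in isolation. This sketch reads a hypothetical three-row CSV from memory (two input features, target in the last column) and shows the shapes that the rest of the program relies on.

```python
import csv
import io

import numpy as np

# Hypothetical CSV contents: two input features, the last column is the target.
data = io.StringIO("1.0,2.0,5.0\n2.0,0.5,3.5\n3.0,1.0,6.0\n")

X, Y = [], []
for row in csv.reader(data, delimiter=','):
    X.append([1.0] + row[:-1])  # prepend x0 = 1 for the bias weight w0
    Y.append(row[-1])           # last column is the target output

X = np.array(X).astype(float)                 # shape (N, d + 1) = (3, 3)
Y = np.array(Y).astype(float).reshape(-1, 1)  # shape (N, 1) = (3, 1)
W = np.zeros((X.shape[1], 1))                 # shape (d + 1, 1) = (3, 1)
print(X.shape, Y.shape, W.shape)  # (3, 3) (3, 1) (3, 1)
```

The rows mix a Python float (`1.0`) with strings from the CSV reader, so NumPy first builds a string array; `astype(float)` then converts every entry.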
The calculatePredicatedValue() function
Here the predicted output is calculated as the dot product of the input matrix X and the weight matrix W.

```python
def calculatePredicatedValue(X, W):
    # dot product of X (inputs, shape (N, d + 1)) and W (weights, shape (d + 1, 1));
    # the result is the predicted output, shape (N, 1)
    f_x = np.dot(X, W)
    return f_x
```
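As a quick check that the dot product really computes f(x) = Σ_j w_j x_j for every row, here it is next to the explicit sum. The inputs and weights are hypothetical.

```python
import numpy as np


def calculatePredicatedValue(X, W):
    f_x = np.dot(X, W)
    return f_x


X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0]])        # N = 2 points, x0 = 1 in the first column
W = np.array([[0.5], [1.0], [2.0]])    # hypothetical weights w0, w1, w2

f_x = calculatePredicatedValue(X, W)
print(f_x.shape)   # (2, 1)
print(f_x.ravel()) # 8.5 and 14.5

# The same predictions written as the explicit sum f(x_i) = sum_j w_j * x_ij
manual = [sum(W[j, 0] * X[i, j] for j in range(X.shape[1]))
          for i in range(X.shape[0])]
```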

The calculateSSE() function
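The SSE is the sum of the squared differences between the predicted outputs and the target outputs.

```python
def calculateSSE(Y, f_x):
    # sum over all data points of (f(x_i) - y_i)^2
    sse = np.sum(np.square(f_x - Y))
    return sse
```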
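calculateSSE assumes that Y and f_x have the same shape. f_x comes from `np.dot(X, W)` and is an (N, 1) column, so if Y is stored as a flat (N,) array, NumPy broadcasting silently pairs every prediction with every target. The values below are hypothetical and only illustrate the shapes.

```python
import numpy as np


def calculateSSE(Y, f_x):
    sse = np.sum(np.square(f_x - Y))
    return sse


f_x = np.array([[1.0], [2.0], [3.0]])       # predictions, shape (N, 1)
Y_column = np.array([[1.5], [2.0], [2.0]])  # targets as an (N, 1) column
Y_flat = np.array([1.5, 2.0, 2.0])          # the same targets as a flat (N,) array

print(calculateSSE(Y_column, f_x))  # 1.25 = 0.5**2 + 0**2 + 1**2
print((f_x - Y_flat).shape)         # (3, 3): every prediction minus every target
print(calculateSSE(Y_flat, f_x))    # 6.75, which is not the SSE
```

Reshaping the targets with `.reshape(-1, 1)` right after loading them avoids this.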

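Now that we have gone through the code, let's look at running the program. Started with the command shown earlier, it writes one row per iteration to a file named solution_learningRate_<learningRate>_threshold_<threshold>.csv and prints that file name once the change in SSE falls below the threshold.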

The final program

```python
import argparse
import csv

import numpy as np


def main():
    args = parser.parse_args()
    # save respective command line inputs into variables
    file = args.data
    learningRate, threshold = float(args.learningRate), float(args.threshold)

    # read the CSV file: the last column is the target output (Y) and the
    # remaining columns are the inputs (X), prefixed with the constant x0 = 1
    with open(file) as csvFile:
        reader = csv.reader(csvFile, delimiter=',')
        X = []
        Y = []
        for row in reader:
            X.append([1.0] + row[:-1])
            Y.append(row[-1])

    # Convert data points into float and initialise weight vector with 0s.
    X = np.array(X).astype(float)
    # Y becomes an (N, 1) column so that it lines up with f_x = X @ W
    Y = np.array(Y).astype(float).reshape(-1, 1)
    # W is a (d + 1, 1) column vector so that np.dot(X, W) has shape (N, 1)
    W = np.zeros((X.shape[1], 1))

    # Calculate the predicted output value
    f_x = calculatePredicatedValue(X, W)

    # Calculate the initial SSE
    sse_old = calculateSSE(Y, f_x)

    outputFile = ('solution_' + 'learningRate_' + str(learningRate)
                  + '_threshold_' + str(threshold) + '.csv')
    with open(outputFile, 'w', newline='') as csvFile:
        # every field is a plain number, so the default quoting never adds quotes
        writer = csv.writer(csvFile, delimiter=',')
        writer.writerow([0] + ["{0:.4f}".format(val) for val in W.T[0]]
                        + ["{0:.4f}".format(sse_old)])

        gradient, W = calculateGradient(W, X, Y, f_x, learningRate)

        iteration = 1
        while True:
            f_x = calculatePredicatedValue(X, W)
            sse_new = calculateSSE(Y, f_x)

            if abs(sse_new - sse_old) > threshold:
                writer.writerow([iteration] + ["{0:.4f}".format(val) for val in W.T[0]]
                                + ["{0:.4f}".format(sse_new)])
                gradient, W = calculateGradient(W, X, Y, f_x, learningRate)
                iteration += 1
                sse_old = sse_new
            else:
                # the change in SSE fell below the threshold: stop iterating
                break
        writer.writerow([iteration] + ["{0:.4f}".format(val) for val in W.T[0]]
                        + ["{0:.4f}".format(sse_new)])
    print("Output File Name: " + outputFile)


def calculateGradient(W, X, Y, f_x, learningRate):
    # sum over all data points of (y_i - f(x_i)) * x_i, shape (d + 1,)
    gradient = (Y - f_x) * X
    gradient = np.sum(gradient, axis=0)
    # move the weights in the direction that reduces the SSE
    temp = np.array(learningRate * gradient).reshape(W.shape)
    W = W + temp
    return gradient, W


def calculateSSE(Y, f_x):
    sse = np.sum(np.square(f_x - Y))
    return sse


def calculatePredicatedValue(X, W):
    f_x = np.dot(X, W)
    return f_x


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("-d", "--data", help="Data File")
    parser.add_argument("-l", "--learningRate", help="Learning Rate")
    parser.add_argument("-t", "--threshold", help="Threshold")
    main()
```

This post walked through the mathematical concepts involved in batch linear regression using gradient descent, with the Sum of Squared Errors as the error function. Instead of trying to minimize the SSE all the way, which is not always possible (the learning rate needs tuning), we saw how to stop the regression once the change in SSE between iterations falls below a threshold value.
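The need for learning-rate tuning is easy to see on a toy problem. This sketch runs the same batch update with two learning rates on hypothetical data; the helper name `sse_history` and both rates are illustrative choices. With the larger rate the SSE grows at every step, so the change in SSE never falls below a threshold and the original while loop would never end.

```python
import numpy as np


def sse_history(X, y, eta, steps):
    """SSE after each of `steps` batch updates, starting from all-zero weights."""
    w = np.zeros((X.shape[1], 1))
    history = []
    for _ in range(steps):
        w = w + eta * (X.T @ (y - X @ w))  # same update rule as calculateGradient
        history.append(float(np.sum((y - X @ w) ** 2)))
    return history


# Hypothetical toy data: x0 = 1 column plus one feature, targets y = 2x
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([[2.0], [4.0], [6.0]])

small = sse_history(X, y, eta=0.01, steps=20)  # SSE shrinks every step
large = sse_history(X, y, eta=0.2, steps=20)   # SSE blows up
print(small[0], small[-1])
print(large[0], large[-1])
```

For this quadratic error, the update only shrinks the SSE when η is below 2 / λ_max, where λ_max is the largest eigenvalue of XᵀX (about 16.6 here, so the limit is roughly 0.12).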

This program uses NumPy to process the data. The same thing can be done in plain Python, but the matrix operations then become nested loops over the data points and features. The number of arithmetic operations per iteration stays O(N·d) either way, but interpreted Python loops are much slower than NumPy's vectorized operations, which run in compiled code. NumPy arrays are also more memory efficient than Python lists of floats, because they store the raw numbers contiguously. If you are comfortable working with pandas, you are encouraged to try implementing the same program with it.
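For comparison, here is a sketch of the gradient computed with plain nested loops next to a condensed copy of the vectorized calculateGradient. The name `gradient_loops` and the random test data are hypothetical. Both versions perform N × (d + 1) multiply-adds; the difference is where the loops run.

```python
import numpy as np


def calculateGradient(W, X, Y, f_x, learningRate):
    # condensed copy of the vectorized version from the program
    gradient = np.sum((Y - f_x) * X, axis=0)
    W = W + (learningRate * gradient).reshape(W.shape)
    return gradient, W


def gradient_loops(X, Y, f_x):
    """Plain-Python equivalent: nested loops over weights j and data points i."""
    n_points, n_weights = len(X), len(X[0])
    gradient = [0.0] * n_weights
    for j in range(n_weights):      # d + 1 weights
        for i in range(n_points):   # N data points
            gradient[j] += (Y[i][0] - f_x[i][0]) * X[i][j]
    return gradient


rng = np.random.default_rng(0)
X = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 2))])  # x0 = 1 plus 2 features
Y = rng.normal(size=(5, 1))
W = rng.normal(size=(3, 1))
f_x = X @ W

g, W_new = calculateGradient(W, X, Y, f_x, 0.1)
loops = gradient_loops(X.tolist(), Y.tolist(), f_x.tolist())
print(np.allclose(g, loops))  # True
```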


