CHAPTER 18 - Regression Using Python




18.1.0 Regression analysis

Regression analysis is one of the foundations of machine learning. Given a set of data points, we find the best-fit algebraic formula for a smooth line or curve that passes through them. That fitted line or curve can then be used to predict the y value for a new x value.

The first step is to install the sklearn module (distributed as the scikit-learn package) using the same method we used in Chapter 12 to install matplotlib. This chapter requires that both sklearn and matplotlib are installed in PyCharm.

import numpy as np
from sklearn.linear_model import LinearRegression
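
If you want to confirm that the packages installed correctly, a quick sketch like the one below prints the installed version numbers (the numbers themselves will depend on your setup).

import sklearn
import matplotlib
print('sklearn version: ', sklearn.__version__)
print('matplotlib version: ', matplotlib.__version__)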

18.2.0 Input arrays

The input data consists of x and y values. The x values are the independent (input) variable, and the y values are the dependent (output) variable. The x data starts out as a one-dimensional array, but the functions that follow expect a two-dimensional array of x values, so we convert it with .reshape((-1, 1)), where 1 is the number of columns and -1 tells NumPy to use as many rows as necessary. See the example code below. The y array remains one-dimensional.

import numpy as np
x = np.array([4, 14, 26, 35, 44, 55]).reshape((-1, 1))
y = np.array([6, 19, 14, 32, 21, 37])
print(x)
print(y)

# Output
# [[ 4]
# [14]
# [26]
# [35]
# [44]
# [55]]
# [ 6 19 14 32 21 37]

# x has shape (6, 1)
# y has shape (6,)
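
To see exactly what .reshape((-1, 1)) does, the short sketch below checks the array shape before and after the conversion; the column count is fixed at 1 and NumPy works out the row count on its own.

x1d = np.array([4, 14, 26, 35, 44, 55])
print(x1d.shape)    # (6,)  one dimensional
x2d = x1d.reshape((-1, 1))
print(x2d.shape)    # (6, 1)  six rows, one column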

18.3.0 Linear regression model

Next we create a linear regression model and fit it using the existing data. The estimated regression function has the equation y = b0 + b1*x. The goal is to find the weights b0 and b1 that give the minimum SSR, the sum of squared residuals: the sum of the squares of the differences between each observed y value and the y value predicted by the line.

model = LinearRegression().fit(x,y)
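
If you want to see the quantity being minimized, the sketch below computes the SSR by hand from the fitted model; the residuals are the differences between the observed y values and the model's predictions.

y_pred = model.predict(x)      # predicted y for each input x
residuals = y - y_pred         # observed minus predicted
ssr = np.sum(residuals**2)     # sum of squared residuals
print('SSR = ', ssr)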

18.4.0 Find the weights

We can now use the fitted model to find the weights b0 and b1. b0 is found with .intercept_, and b1 is found with .coef_.

print('b0 = ', model.intercept_)
print('b1 = ', model.coef_)
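
Note that .coef_ is an array with one entry per input column. Since we have a single x column here, b1 can be pulled out as a plain number by indexing the array, as in the sketch below.

b1_value = model.coef_[0]      # first (and only) coefficient
print('b1 as a number = ', b1_value)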

18.5.0 Predict a response

We can also use the model to predict a response. In other words, given xnew, we can find ynew. We will let xnew = 17. The value 17 must be reshaped into two-dimensional array form, just like the original x data.

xnew = np.array([17]).reshape((-1,1))
print('ynew = ', model.predict(xnew))
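
The prediction is simply the regression equation evaluated at the new x, so it can be checked by hand with the weights found above, as in the sketch below.

b0 = model.intercept_
b1 = model.coef_[0]
print('check: ', b0 + b1*17)   # should match model.predict(xnew)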

18.6.0 Run the program

Run the combined Python program below.

import numpy as np
from sklearn.linear_model import LinearRegression
x = np.array([4, 14, 26, 35, 44, 55]).reshape((-1, 1))
y = np.array([6, 19, 14, 32, 21, 37])
model = LinearRegression().fit(x,y)
print('b0 = ', model.intercept_)
print('b1 = ', model.coef_)
xnew = np.array([17]).reshape((-1,1))
ynew = model.predict(xnew)
print('ynew = ', ynew)

# Output
# b0 = 6.512267657992567
# b1 = [0.50520446]
# ynew = [15.10074349]
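
The same .predict() call accepts several new x values at once; each row of the reshaped array produces one predicted y. The sketch below shows the idea (the values in the comment are approximate, computed from the weights above).

xmany = np.array([10, 20, 30]).reshape((-1,1))
print(model.predict(xmany))    # approximately [11.56 16.62 21.67]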

18.7.0 Code for plot

We will now add the following lines of code to produce a plot.

import matplotlib.pyplot as plt
b0 = model.intercept_
b1 = model.coef_
plt.scatter(x,y,marker='o') #Plots the points
plt.plot(x,b0 + b1*x) #Plots the Linear Regression eq.

The completed program follows:

import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
x = np.array([4, 14, 26, 35, 44, 55]).reshape((-1, 1))
y = np.array([6, 19, 14, 32, 21, 37])
model = LinearRegression().fit(x,y)
print('b0 = ', model.intercept_)
print('b1 = ', model.coef_)
xnew = np.array([17]).reshape((-1,1))
ynew = model.predict(xnew)
print('ynew = ', ynew)

b0 = model.intercept_
b1 = model.coef_
plt.scatter(x,y,marker='o')
plt.plot(x,b0 + b1*x)
plt.show()

Linear regression plot.

As you can see, the output is a graph showing the scattered input points together with the fitted linear regression line. (Sometimes plots need a little help to display. Use Run->Debug.)
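
If you would like the plot to be easier to read, matplotlib's labeling functions can be added before plt.show(). The sketch below is one optional variation using the b0 and b1 values already computed.

plt.scatter(x, y, marker='o')
plt.plot(x, b0 + b1*x)
plt.xlabel('x')                       # label the horizontal axis
plt.ylabel('y')                       # label the vertical axis
plt.title('Linear regression fit')    # add a title
plt.show()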




