Regression analysis is a foundation of machine learning. Essentially, given a set of data points, we find the algebraic formula for the line or curve that best fits them. That fitted line or curve can then be used to predict the output value for a new data point.

The first step is to install the sklearn module using the same method we used in Chapter 12 to install matplotlib. This chapter requires that both sklearn and matplotlib are installed in PyCharm.

```
import numpy as np
from sklearn.linear_model import LinearRegression
```

The input data consists of x and y values. The input x is the independent variable; y is the output (dependent) variable. The x input data is given as a one-dimensional array, but the formulas that follow require x to be a two-dimensional array. We convert it with .reshape((-1, 1)), where 1 is the number of columns and -1 indicates that we use as many rows as necessary. See the example code below. The y array remains one-dimensional.

```
import numpy as np

x = np.array([4, 14, 26, 35, 44, 55]).reshape((-1, 1))
y = np.array([6, 19, 14, 32, 21, 37])

print(x)
print(y)

# Output
# [[ 4]
#  [14]
#  [26]
#  [35]
#  [44]
#  [55]]
# [ 6 19 14 32 21 37]

# x has shape (6, 1)
# y has shape (6,)
```

Next we create a linear regression model and fit it using the existing data. The estimated regression function has the equation y = b0 + b1*x. The goal is to find the weights b0 and b1 that minimize the SSR, the sum of squared residuals: the sum of the squares of the differences between each actual y value and the y value predicted by the line.

```
model = LinearRegression().fit(x,y)
```
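As a quick check of the least-squares idea (a side sketch using the same x and y data, not part of the chapter's program), we can compute the SSR of the fitted line ourselves and confirm that shifting the line makes the SSR larger:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([4, 14, 26, 35, 44, 55]).reshape((-1, 1))
y = np.array([6, 19, 14, 32, 21, 37])
model = LinearRegression().fit(x, y)

# Residuals: actual y minus the y predicted by the fitted line
residuals = y - model.predict(x)
ssr = np.sum(residuals ** 2)
print('SSR of fitted line =', ssr)

# Any other line should do worse; here we nudge the intercept up by 1
shifted = (model.intercept_ + 1.0) + model.coef_[0] * x.ravel()
ssr_shifted = np.sum((y - shifted) ** 2)
print('SSR of shifted line =', ssr_shifted)
```

No other choice of b0 and b1 can beat the fitted line's SSR; that is what "least squares" means.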

We can now use the model to find the weights b0 and b1. b0 is found with .intercept_ . b1 is found with .coef_ .

```
print('b0 = ', model.intercept_)
print('b1 = ', model.coef_)
```

We can also use the model to predict a response. In other words, given xnew, we can find ynew. We will let xnew = 17; it must be reshaped into 2D array form.

```
xnew = np.array([17]).reshape((-1, 1))
print('ynew = ', model.predict(xnew))
```
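To see that .predict() is simply evaluating y = b0 + b1*x, here is a small self-contained sketch (repeating the fit so it runs on its own) that computes the same prediction by hand:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([4, 14, 26, 35, 44, 55]).reshape((-1, 1))
y = np.array([6, 19, 14, 32, 21, 37])
model = LinearRegression().fit(x, y)

xnew = np.array([17]).reshape((-1, 1))
ynew = model.predict(xnew)

# The same value, computed directly from the regression equation
manual = model.intercept_ + model.coef_[0] * 17
print(ynew[0], manual)
```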

Run the combined Python program below.

```
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([4, 14, 26, 35, 44, 55]).reshape((-1, 1))
y = np.array([6, 19, 14, 32, 21, 37])

model = LinearRegression().fit(x, y)

print('b0 = ', model.intercept_)
print('b1 = ', model.coef_)

xnew = np.array([17]).reshape((-1, 1))
ynew = model.predict(xnew)
print('ynew = ', ynew)

# Output
# b0 =  6.512267657992567
# b1 =  [0.50520446]
# ynew =  [15.10074349]
```
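As an optional extra (not part of the chapter's program), sklearn's LinearRegression also provides a .score() method, which returns the coefficient of determination R² — the fraction of the variation in y explained by the line, where 1.0 would be a perfect fit:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([4, 14, 26, 35, 44, 55]).reshape((-1, 1))
y = np.array([6, 19, 14, 32, 21, 37])
model = LinearRegression().fit(x, y)

# R^2 for this data is roughly 0.70: a reasonable but not perfect fit
r2 = model.score(x, y)
print('R^2 =', r2)
```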

We will now add the following lines of code to produce a plot.

```
import matplotlib.pyplot as plt

b0 = model.intercept_
b1 = model.coef_

plt.scatter(x, y, marker='o')  # Plots the points
plt.plot(x, b0 + b1*x)         # Plots the linear regression line
plt.show()                     # Displays the plot
```

The completed program follows:

```
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

x = np.array([4, 14, 26, 35, 44, 55]).reshape((-1, 1))
y = np.array([6, 19, 14, 32, 21, 37])

model = LinearRegression().fit(x, y)

print('b0 = ', model.intercept_)
print('b1 = ', model.coef_)

xnew = np.array([17]).reshape((-1, 1))
ynew = model.predict(xnew)
print('ynew = ', ynew)

b0 = model.intercept_
b1 = model.coef_

plt.scatter(x, y, marker='o')
plt.plot(x, b0 + b1*x)
plt.show()
```

As you can see, the output is a graph showing the scattered input points together with the linear regression line. (Sometimes plots need a little help to display. Use Run->Debug.)
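As a final cross-check (an aside, not required by the chapter), NumPy's own np.polyfit can fit the same straight line without sklearn and should return the same weights:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([4, 14, 26, 35, 44, 55]).reshape((-1, 1))
y = np.array([6, 19, 14, 32, 21, 37])
model = LinearRegression().fit(x, y)

# np.polyfit with degree 1 returns [slope, intercept]
b1_np, b0_np = np.polyfit(x.ravel(), y, 1)
print('b0 =', b0_np, 'b1 =', b1_np)
```

Getting identical weights from two independent libraries is a good sanity check that the program is fitting the line correctly.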

SALARSEN.COM
