Python:Fitting
This document contains examples of polynomial fitting, general linear regression, and nonlinear regression. In each section, there will be example code that may come in useful for later courses. The example code is based on the existence of a file in the same directory called Cantilever.dat
that contains two columns of data - the first is an amount of mass (in kg) placed at the end of a beam and the second is a displacement, measured in inches, at the end of the beam. For EGR 103, this file is:
0.000000 0.005211
0.113510002 0.158707
0.227279999 0.31399
0.340790009 0.474619
0.455809998 0.636769
0.569320007 0.77989
0.683630005 0.936634
0.797140015 0.999986
Contents
Common Command Reference
All links below to NumPy v1.15 manual at NumPy v1.15 Manual; these commands show up in just about all the examples:
In the scripts below, common code has a regular background while code that differs from script to script will be highlighted in yellow.
Common Code
The processes for loading and manipulating the original data, calculating the statistics, and creating the graph are the same across the examples. To simplify codes, those processes should be contained in a separate file called fitting_common.py
as given below:
import numpy as np
import matplotlib.pyplot as plt
def get_data():
# Load data from Cantilever.dat
beam_data = np.loadtxt('Cantilever.dat')
# Copy data from each column into new variables
mass = beam_data[:, 0].copy()
disp = beam_data[:, 1].copy()
# Convert mass to force
force = mass * 9.81
# Convert disp to meters
disp = (disp * 2.54) / 100
fmodel = np.linspace(np.min(force), np.max(force), 100)
# Return values
return force, disp, fmodel
def calc_stats(y, yhat, to_print=1):
# Calculate statistics
st = np.sum((y - np.mean(y))**2)
sr = np.sum((y - yhat)**2)
r2 = (st - sr) / st
if to_print:
print('st: {}\nsr: {}\nr2: {}'.format(st, sr, r2))
return st, sr, r2
def make_plot(x, y, yhat, xmodel, ymodel):
plt.figure(1).clf()
fig, ax = plt.subplots(num=1)
ax.plot(x, y, 'ko', label='Data')
ax.plot(x, yhat, 'ks', label='Estimates', mfc='none')
ax.plot(xmodel, ymodel, 'k-', label='Model')
ax.grid(True)
ax.legend()
fig.tight_layout()
Polynomial Fitting
Polynomial fits are those where the dependent data is related to some set of integer powers of the independent variable. MATLAB's built-in polyfit
command can determine the coefficients of a polynomial fit.
Specific Command References
All links below to NumPy v1.15 manual at NumPy v1.15 Manual
Example Code
In the example code below, n
determines the order of the fit. Not much else would ever need to change.
1 # %% Import modules
2 import numpy as np
3 from fitting_common import *
4
5 # %% Load and manipulate data
6 x, y, xmodel = get_data()
7
8 # %% Perform calculations
9 n = 1
10 p = np.polyfit(x, y, n)
11 print(p)
12
13 # %% Generate estimates and model
14 yhat = np.polyval(p, x)
15 ymodel = np.polyval(p, xmodel)
16
17 # %% Calculate statistics
18 calc_stats(y, yhat, 1)
19
20 # %% Generate plots
21 make_plot(x, y, yhat, xmodel, ymodel)
General Linear Regression
General linear regression involves finding some set of coefficients for fits that can be written as:
where the \(c_j\) are the coefficients of the fit and the \(\phi_j\) are the specific functions of the independent variable that make up the fit.
Specific Command References
All links below to NumPy v1.15 manual at NumPy v1.15 Manual
Example Code
In the example code below, there is an example of a general linear fits of one variable. It is solving the same fit as given above, just in different way. Specifically it uses linear algebra to find the coefficients that minimize the sum of the squares of the estimate residuals for a general linear fit. In this code, variables ending in "v" explicitly need to be column vectors while variables ending in "e" can either be 1D arrays or 2D arrays (or lists).
Note - if you get an error about "TypeError: must be real number, not NoneType" then on line 27, replace None with -1.
1 # %% Import modules
2 import numpy as np
3 from fitting_common import *
4
5 # %% Load and manipulate data
6 x, y, xmodel = get_data()
7
8 # %% Reshape data
9 xv = np.reshape(x, (-1, 1))
10 yv = np.reshape(y, (-1, 1))
11
12 # %% Perform calculations
13 def yfun(xe, coefs):
14 return coefs[0] * xe**1 + coefs[1] * xe**0
15
16 a_mat = np.block([[xv**1, xv**0]])
17 pvec = np.linalg.lstsq(a_mat, yv, rcond=None)[0]
18 print(pvec)
19
20 # %% Generate estimates and model
21 yhat = yfun(x, pvec)
22 ymodel = yfun(xmodel, pvec)
23
24 # %% Calculate statistics
25 calc_stats(y, yhat, 1)
26
27 # %% Generate plots
28 make_plot(x, y, yhat, xmodel, ymodel)
Nonlinear Regression
Nonlinear regression is both more powerful and more sensitive than linear regression. For inherently nonlinear fits, it will also produce a better \(S_r\) value than linearization since the nonlinear regression process is minimizing the \(S_r\) of the actual data rather than that of the transformed values. The sensitivity comes into play as the optimization routine may find local minima versus global minima. A good starting guess will work wonders.
Specific Command References
The link below is to the SciPy v1.1.0 reference guide at SciPy
Example Code
Note in the example code that the initial guess gives 0.6 for the slope and 0.1 for the intercept. While these numbers are quite far from the optimized values of 0.0034 for the slope and 0.00055 for the intercept, the optimization routine is still able to find the correct value. That is not always the case - try to find an initial guess close to the actual answer.
1 # %% Import modules
2 import scipy.optimize as opt
3 from fitting_common import *
4
5 # %% Load and manipulate data
6 x, y, xmodel = get_data()
7
8 # %% Perform calculations
9 def yfun(x, *coefs):
10 return coefs[0] * x**1 + coefs[1] * x**0
11
12 popt = opt.curve_fit(yfun, x, y, [0.6, 0.1])[0]
13 print(popt)
14
15 # %% Generate estimates and model
16 yhat = yfun(x, *popt)
17 ymodel = yfun(xmodel, *popt)
18
19 # %% Calculate statistics
20 calc_stats(y, yhat, 1)
21
22 # %% Generate plots
23 make_plot(x, y, yhat, xmodel, ymodel)
Example changes for different models
Polynomial
For the polynomial fitting model, really the only thing that would change would be the order of the fit and thus the value of \(n\) on line 23 of that code.
General Linear
For the general linear fit, the two places things will change will be in the function definition on line 24 and in the creation of the block matrix on line 26; the changes will mirror each other. For the straight line model - formally:
the code is
22 # %% Perform calculations
23 def yfun(xe, coefs):
24 return coefs[0] * xe**1 + coefs[1] * xe**0
25
26 a_mat = np.block([[xv**1, xv**0]])
27 pvec = np.linalg.lstsq(a_mat, yv, rcond=None)[0]
28 print(pvec)
where pvec[0][0]
will be the slope \(m\) and pvec[1][0]
will be the intercept \(b\). If the model were changed to, say,
which, formally, is
then the code would change to:
22 # %% Perform calculations
23 def yfun(xe, coefs):
24 return coefs[0] * np.cos(xe) + coefs[1] * np.sin(xe) + coefs[2] * xe**0
25
26 a_mat = np.block([[np.cos(xv), np.sin(xv), xv**0]])
27 pvec = np.linalg.lstsq(a_mat, yv, rcond=None)[0]
28 print(pvec)
and pvec
will be a 2D array with three rows.
Nonlinear
For the nonlinear fit, the two places things will change will be in the function definition on line 24 and in the initial parameter value guess on line ; there need to be as many values in the initial parameter list as there are parameters. For the straight line model - formally:
the code is
22 # %% Perform calculations
23 def yfun(x, *coefs):
24 return coefs[0] * x**1 + coefs[1] * x**0
25
26 popt = opt.curve_fit(yfun, x, y, [0.6, 0.1])[0]
27 print(popt)
where popt[0]
will be the slope \(m\) and popt[1]
will be the intercept \(b\). If the model were changed to, say,
which, formally, is
then the code would change to:
22 # %% Perform calculations
23 def yfun(x, *coefs):
24 return coefs[0] * np.cos(coefs[1]*x + coefs[2]) + coefs[3] * x**0
25
26 popt = opt.curve_fit(yfun, x, y, [.01, 1, np.pi/4, .0125])[0]
27 print(popt)
where [.01, 1, np.pi/4, .0125]
represents initial guesses for the magnitude a, frequency b, phase c, and average value of the function. The popt
variable would have 4 values.