Python:Interpolation

From PrattWiki
Jump to navigation Jump to search

Introduction

Interpolation is a process by which "gaps" in a data set may be filled using relatively simple equations. Interpolation differs from fitting in that:

  • Interpolations are required to exactly hit all the data points, whereas fits may not hit any of the data points, and
  • Interpolations are based on, often simple, mathematical formulas without regard to the underlying system which produced the data.

Basic Types of 1-D Interpolation

There are several basic types of interpolation; the examples below are based on the following data set:

$$\begin{array}{c|c} \mbox{Time}~t, \mbox{s} & \mbox{Temperature}~T, ^o\mbox{C}\\ \hline 0 & 5 \\ 2 & 3 \\ 8 & 10 \\ 12 & 15 \\ 20 & 14 \end{array}$$

Nearest Neighbor

Nearest neighbor interpolation means that for any given input, the output will be based on the dependent value in the data set obtained at the independent value of the data set closest to the input. For example, in the data set above, $$f(4)$$ would give a temperature of 3 since time 4 is closest to time 2 in the data set. Similarly, $$f(11)$$ would return a temperature if 15 since time 11 is closest to time 12.

There are several advantages of nearest neighbor:

  • Very simple "calculation" - really, there is no calculation other than finding out which independent value is closest
  • The interpolated values are always values in the data set - if you have some system that is only capable of producing particular values, nearest neighbor interpolation will never return an impossible value.

There are also disadvantages:

  • Technically undetermined half-way between measured data points,
  • Potentially large discontinuities between data points, and
  • No potential to estimate any kind of rate information between points.

Linear Interpolation

Linear interpolation involves figuring out the equation of a straight line between data points. The output will be based on the line connecting the points to the left and right of the input. For example, in the data set above, $$f(4)$$ would be found by finding the equation of the line between (2, 3) and (8, 10). The general formula for finding the value of $$f(x)$$ based on some value $$x$$ between the data points to $$x$$'s left $$x_L$$ (where $$y=y_L$$) and the data point to $$x$$'s right $$x_R$$ (where $$y=y_R$$) is:

$$\begin{align*} f(x)&=y_L+\frac{y_R-y_L}{x_R-x_L}(x-x_L) \end{align*}$$

so in this case, the process for finding $$f(4)$$ would be:

$$\begin{align*} f(4)&=3+\frac{10-3}{8-2}(4-2)=5.33 \end{align*}$$

Note that there will be a different equation for each interval; if you were to want to calculate $$f(16)$$, you would need to use the data points at (12, 15) and (20, 14) to get:

$$\begin{align*} f(x)&=y_L+\frac{y_R-y_L}{x_R-x_L}(x-x_L)\\ f(16)&=15+\frac{14-15}{20-12}(16-12)=14.5 \end{align*}$$

Cubic Spline Interpolation

Cubic spline interpolation involves coming up with a third-order equation for each interval between the data points. Because there are now four free coefficients for each equation, cubic spline interpolations can not only satisfy the requirement that the interpolation functions hit each of the data points but also satisfy additional requirements. In the case of cubic splines, two additional requirements are to have the slopes of the interpolating functions match at the interior data points and to have the curvatures of the interpolating functions match at the interior points. It turns out that these additional restrictions are not sufficient to completely determine all the coefficients for all the equations for all the intervals -- there are actually two too few restrictions. Given that, there are some different options for how to deal with the additional two equations:

  • Not-a-knot: this is the default option; basically, the calculations look at what would have happened if the second and penultimate data points didn't exist.
  • Clamped: this option allows you to specify the first derivative at the start and the first derivative at the end. In Python, the default case for clamped end conditions is to set these slopes to zeros.
  • Natural: this option allows you to specify the second derivative at the start and the second derivative at the end. In Python, the default case for natural end conditions is to set the second derivatives to zero.

Important Notes

  • Be sure to import the CubicSpline function with
from scipy.interpolate import CubicSpline
  • Note that the result of the CubicSpline function is a function; that is:
import numpy as np
from scipy.interpolate import CubicSpline
x = np.array([1, 2, 3, 4])
y = np.array([3, 1, 4, 1])
my_fun = CubicSpline(x, y, bc_type='natural')
my_fun([1.5, 2.5, 3.5])
will create a function to calculate interpolated values and then uses it to create a list of three estimates.

Example Code

In Python, interpolation can be performed using the interp1d method of the scipy.interpolate package. This method will create an interpolation function based on the independent data, the dependent data, and the kind of interpolation you want with options inluding nearest, linear, and cubic (which uses not-a-knot conditions). Alternately, if you want to do some form of cubic spline, especially some form that is not not-a-knot, you can use the CubicSpline method of the scipy.interpolate package.

2-D Interpolation

Multidimensional interpolation - filling in the gaps when there is more than one independent variable - is also possible. One major consideration is whether the independent data set represents a grid / matrix / etc or if the measurements are irregularly spaced. At the moment, this particular section of this page only deals with the specific situation of having two independent variables and one dependent variable, and furthermore the situation where the independent values are on a rectangular grid.

For that specific situation, you can use the Scipy function interp2d. This function has a very specific way of receiving parameters and returning values. Please note: As of 4/15/2020, part of the documentation is incorrect as far as how parameters can be formatted.

  • The independent information (x and y here) must be entered as a one dimensional list or array of the column coordinates and row coordinates (specifically the way shown in the first example or as arrays with the same structure).
  • The dependent information (z here) must be entered as a two dimensional list of lists or array (specifically the way shown in the first example or as arrays with the same structure).
  • Entering x or y as two dimensional items or z as a one dimensional item causes miscalculations!

The interp2 will return a function that can then be used to calculate interpolations.

  • The returned function will only accept one dimensional ints, floats, lists, or arrays as inputs.
  • If you give the function two lists or arrays, the function will return the flattened list or array representing the interpolations for every combination of entries in the parameter lists or arrays. This makes generating interpolating surfaces much more complicated, but it is what it is.
  • The trinket below gives an example based on data provided in Table P15.9 on page 400 of Steven Chapra's "Applied Numerical Methods with MATLAB"