Python:Interpolation

From PrattWiki
Revision as of 20:32, 12 November 2019 by DukeEgr93 (talk | contribs) (Linear Interpolation)

Introduction

Interpolation is a process by which "gaps" in a data set may be filled using relatively simple equations. Interpolation differs from fitting in that:

  • Interpolations are required to exactly hit all the data points, whereas fits may not hit any of the data points, and
  • Interpolations are based on mathematical formulas, often simple ones, without regard to the underlying system that produced the data.

Basic Types of Interpolation

There are several basic types of interpolation; the examples below are based on the following data set:

$$\begin{array}{c|c} \mbox{Time}~t, \mbox{s} & \mbox{Temperature}~T, ^o\mbox{C}\\ \hline 0 & 5 \\ 2 & 3 \\ 8 & 10 \\ 12 & 15 \\ 20 & 14 \end{array}$$

Nearest Neighbor

Nearest neighbor interpolation means that for any given input, the output is the dependent value in the data set whose independent value is closest to the input. For example, in the data set above, $$f(4)$$ would give a temperature of 3 since time 4 is closest to time 2 in the data set. Similarly, $$f(11)$$ would return a temperature of 15 since time 11 is closest to time 12.

There are several advantages of nearest neighbor:

  • Very simple "calculation" - really, there is no calculation other than finding out which independent value is closest
  • The interpolated values are always values in the data set - if you have some system that is only capable of producing particular values, nearest neighbor interpolation will never return an impossible value.

There are also disadvantages:

  • Technically undetermined half-way between measured data points,
  • Potentially large discontinuities between data points, and
  • No potential to estimate any kind of rate information between points.
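
The nearest neighbor rule described above can be sketched in a few lines of plain Python. This is only an illustration, not a library routine: the names `t_data`, `T_data`, and `nearest_neighbor` are made up for this example, and the tie-breaking rule (take the left-hand point when the input is exactly half-way between two data points) is an arbitrary assumption, since the result there is technically undetermined.

```python
# Data set from the table above
t_data = [0, 2, 8, 12, 20]     # time, s
T_data = [5, 3, 10, 15, 14]    # temperature, degrees C


def nearest_neighbor(x, xs, ys):
    """Return the y value whose x value is closest to the input x.

    Hypothetical helper for illustration. On an exact tie, min() keeps
    the first candidate, so the left-hand data point wins (an assumption).
    """
    idx = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return ys[idx]


print(nearest_neighbor(4, t_data, T_data))    # 3, since 4 is closest to 2
print(nearest_neighbor(11, t_data, T_data))   # 15, since 11 is closest to 12
```

Because the function only ever returns entries of `T_data`, it cannot produce a value that is not in the data set, which matches the second advantage listed above.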

Linear Interpolation

Linear interpolation involves figuring out the equation of a straight line between data points. The output will be based on the line connecting the points to the left and right of the input. For example, in the data set above, $$f(4)$$ would be found by finding the equation of the line between (2, 3) and (8, 10). The general formula for finding the value of $$f(x)$$ based on some value $$x$$ between the data point to $$x$$'s left $$x_L$$ (where $$y=y_L$$) and the data point to $$x$$'s right $$x_R$$ (where $$y=y_R$$) is:

$$\begin{align*} f(x)&=y_L+\frac{y_R-y_L}{x_R-x_L}(x-x_L) \end{align*}$$

so in this case, the process for finding $$f(4)$$ would be:

$$\begin{align*} f(4)&=3+\frac{10-3}{8-2}(4-2)\approx 5.33 \end{align*}$$

Note that there will be a different equation for each interval; to calculate $$f(16)$$, you would need to use the data points at (12, 15) and (20, 14) to get:

$$\begin{align*} f(x)&=y_L+\frac{y_R-y_L}{x_R-x_L}(x-x_L)\\ f(16)&=15+\frac{14-15}{20-12}(16-12)=14.5 \end{align*}$$
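
The two hand calculations above can be checked with a short Python sketch that applies the same formula interval by interval. The function name `linear_interp` and the choice to raise an error outside the range of the data are assumptions made for this example; the sketch also assumes the independent values are sorted in increasing order.

```python
# Data set from the table above
t_data = [0, 2, 8, 12, 20]     # time, s
T_data = [5, 3, 10, 15, 14]    # temperature, degrees C


def linear_interp(x, xs, ys):
    """Linearly interpolate at x (hypothetical helper; xs must be increasing)."""
    if not xs[0] <= x <= xs[-1]:
        # Assumption: refuse to extrapolate beyond the data
        raise ValueError("x is outside the range of the data")
    # Find the interval [x_L, x_R] that contains x
    for k in range(len(xs) - 1):
        x_L, x_R = xs[k], xs[k + 1]
        if x_L <= x <= x_R:
            y_L, y_R = ys[k], ys[k + 1]
            # f(x) = y_L + (y_R - y_L) / (x_R - x_L) * (x - x_L)
            return y_L + (y_R - y_L) / (x_R - x_L) * (x - x_L)


print(round(linear_interp(4, t_data, T_data), 2))   # 5.33
print(linear_interp(16, t_data, T_data))            # 14.5
```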

Example Code

In Python, interpolation can be performed using interp1d from the scipy.interpolate module. Calling interp1d creates an interpolation function based on the independent data, the dependent data, and the kind of interpolation you want, with options including nearest, linear, and cubic.
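
As a minimal sketch of that workflow, the code below builds three interpolation functions for the data set above and evaluates them. The variable names (`t_data`, `T_data`, `f_nearest`, `f_linear`, `f_cubic`, `t_model`) are chosen for this example only, and the sketch assumes NumPy and SciPy are installed.

```python
import numpy as np
from scipy.interpolate import interp1d

# Data set from the table above
t_data = np.array([0, 2, 8, 12, 20])     # time, s
T_data = np.array([5, 3, 10, 15, 14])    # temperature, degrees C

# One interpolation function per kind
f_nearest = interp1d(t_data, T_data, kind="nearest")
f_linear = interp1d(t_data, T_data, kind="linear")
f_cubic = interp1d(t_data, T_data, kind="cubic")

# Evaluate at single points...
print(f_nearest(4), f_linear(4), f_linear(16))

# ...or on a finer grid that stays inside the range of the data
t_model = np.linspace(0, 20, 101)
T_cubic = f_cubic(t_model)
print(T_cubic.shape)
```

By default, the functions returned by interp1d raise an error for inputs outside the range of the original independent data, which is why `t_model` is limited to the interval from 0 to 20.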