Difference between revisions of "Statistics Symbols"

From PrattWiki
(7 intermediate revisions by the same user not shown)

Latest revision as of 17:21, 18 September 2021

This page is for students in EGR 103. It is a concordance of sorts among the lectures and the two textbooks, showing the different symbols each one uses for the same statistical quantities. It has been updated to reflect the use of Python in EGR 103.

In general, the index $$k$$ denotes the $$k$$th data point (out of $$N$$ data points, indexed 0 through $$N-1$$) and the index $$m$$ denotes the $$m$$th basis function of a general linear fit (out of $$M$$ basis functions, indexed 0 through $$M-1$$). The coefficient of the $$m$$th basis function is $$a_m$$ and the basis function itself is $$\phi_m(x)$$. That is to say, for a general linear fit with $$M$$ basis functions, the estimate of the $$k$$th dependent value is:

$$ \begin{align*} \hat{y}(x_k)&=\sum_{m=0}^{M-1}a_m\phi_m(x_k) \end{align*} $$
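
As a minimal sketch of the sum above, the following Python evaluates $$\hat{y}(x_k)$$ for every data point at once. The basis functions, coefficients, and data values are hypothetical and chosen only for illustration; they do not come from the lectures or either textbook.

```python
import numpy as np

# Hypothetical basis functions: phi_0(x) = 1, phi_1(x) = x, phi_2(x) = x**2
basis = [
    lambda x: np.ones_like(x, dtype=float),
    lambda x: x,
    lambda x: x**2,
]

# Hypothetical coefficients a_0, a_1, a_2 (so M = 3)
a = np.array([1.0, 2.0, 0.5])

# Hypothetical independent data x_k (so N = 4)
x = np.array([0.0, 1.0, 2.0, 3.0])

# Column m of Phi holds phi_m evaluated at every x_k, giving an N-by-M matrix
Phi = np.column_stack([phi(x) for phi in basis])

# Row k of Phi times a is the sum over m of a_m * phi_m(x_k)
yhat = Phi @ a
print(yhat)
```

Building the matrix of basis-function values first means the whole sum reduces to one matrix-vector product, with both $$k$$ and $$m$$ starting at 0 as in the indexing described above.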

Symbols

The entries in the "Palm" column are taken from William J. Palm III's Introduction to Matlab 7 for Engineers, 2/e[1], while those in the "Chapra" column are taken from Steven C. Chapra's Applied Numerical Methods with MATLAB for Engineers and Scientists, 2/e[2]. Entries in the "EGR 103" column, when not taken from Chapra or Palm, have been developed over the course of several years of EGR 103 lectures.

\(
\begin{array}{|c|c|c|c|}\hline
\mbox{Quantity} & \mbox{Palm} & \mbox{Chapra} & \mbox{EGR 103} \\ \hline
\mbox{Independent Data} & x & x & x \\ \hline
\mbox{Dependent Data} & y & y & y \\ \hline
\mbox{Individual Elements} & y_i & y_i & y_k \\ \hline
\mbox{Mean Value} &
\bar{y}=\frac{1}{n}\sum_{i=1}^ny_i &
\bar{y}=\frac{\sum y_i}{n} &
\bar{y}=\frac{\sum y_k}{N} \\ \hline
\mbox{Sum of Squares of Data Residuals} &
S=\sum_{i=1}^m\left(y_i-\bar{y}\right)^2 &
S_t=\sum\left(y_i-\bar{y}\right)^2 &
S_t=\sum_k\left(y_k-\bar{y}\right)^2 \\ \hline
\substack{\mbox{Estimates}\\\mbox{(straight line)}} &
f(x_i) &
a_0+a_1x_i &
\hat{y}_k=p[0]x_k+p[1] \\ \hline
\substack{\mbox{Estimates}\\\mbox{(general linear fit)}} &
f(x_i) &
\hat{y}_i=\sum_{j=0}^ma_jz_{ji} &
\hat{y}_k=\sum_{m=0}^{M-1}a_m\phi_m(x_k) \\ \hline
\substack{\mbox{Sum of Squares of Estimate Residuals}\\\mbox{(straight line)}} &
J=\sum_{i=1}^m\left[f(x_i)-y_i\right]^2 &
S_r=\sum\left(y_i-a_0-a_1x_i\right)^2 &
S_r=\sum_k\left(y_k-\hat{y}_k\right)^2 \\ \hline
\substack{\mbox{Sum of Squares of Estimate Residuals}\\\mbox{(general linear fit)}} &
\mbox{Not Used} &
S_r=\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2 &
S_r=\sum_k\left(y_k-\hat{y}_k\right)^2 \\ \hline
\mbox{Coefficient of Determination} &
r^2=1-\frac{J}{S} &
r^2=\frac{S_t-S_r}{S_t} &
r^2=\frac{S_t-S_r}{S_t}=1-\frac{S_r}{S_t} \\ \hline
\mbox{Sample Standard Deviation} &
\sigma=\sqrt{\frac{\sum_{i=1}^n(y_i-\bar{y})^2}{n-1}} &
s_y=\sqrt{\frac{S_t}{n-1}} &
s_y=\sqrt{\frac{S_t}{N-1}} \\ \hline
\mbox{Coefficient of Variation} &
\mbox{Not Used} &
\mbox{c.v.}=\frac{s_y}{\bar{y}}*100\% &
\mbox{c.v.}=\frac{s_y}{\bar{y}}*100\% \\ \hline
\substack{\mbox{Standard Error of the Estimate}\\\mbox{(straight line)}} &
\mbox{Not Used} &
s_{y/x}=\sqrt{\frac{S_r}{n-2}} &
s_{y/x}=\sqrt{\frac{S_r}{N-2}} \\ \hline
\substack{\mbox{Standard Error of the Estimate}\\\mbox{(general linear fit)}} &
\mbox{Not Used} &
s_{y/x}=\sqrt{\frac{S_r}{n-(m+1)}} &
s_{y/x}=\sqrt{\frac{S_r}{N-M}} \\ \hline
\end{array}
\)
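
The quantities in the "EGR 103" column can be computed directly with NumPy. The sketch below does this for a straight-line fit and assumes `np.polyfit` is the fitting routine, which returns coefficients from highest power down and so matches the $$p[0]x_k+p[1]$$ form in the table. The data values are hypothetical, and the variable names are chosen here to mirror the symbols.

```python
import numpy as np

# Hypothetical data set
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

N = len(y)  # number of data points
M = 2       # number of basis functions for a straight line (x and 1)

# Mean value and sum of squares of data residuals
ybar = np.sum(y) / N
St = np.sum((y - ybar) ** 2)

# Straight-line fit: p[0] is the slope, p[1] is the intercept
p = np.polyfit(x, y, 1)
yhat = p[0] * x + p[1]

# Sum of squares of estimate residuals and coefficient of determination
Sr = np.sum((y - yhat) ** 2)
r2 = (St - Sr) / St

# Sample standard deviation and coefficient of variation (in percent)
sy = np.sqrt(St / (N - 1))
cv = sy / ybar * 100

# Standard error of the estimate; N - M equals N - 2 for a straight line
syx = np.sqrt(Sr / (N - M))

print(ybar, St, Sr, r2, sy, cv, syx)
```

For a general linear fit, the same expressions apply once `yhat` is built from the $$M$$ basis functions and `M` is set to match.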

Code

The following table will compare how to calculate various items in MATLAB and Python. The MATLAB version will use a row matrix:

a = [4 8 8 2 5 4]

while Python will use a list, a 1-D array and a data frame with the same contents:

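import numpy as np
import pandas as pd
# Plain Python list
alist = [4, 8, 8, 2, 5, 4]
# 1-D NumPy array built from the list
aarray = np.array(alist)
# pandas DataFrame with one column (labeled 0) built from the list
aframe = pd.DataFrame(alist)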
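
As one possible illustration of such a comparison, the sketch below computes the mean and the sample standard deviation for each of the three Python containers. The choice of the standard-library `statistics` module for the list is an assumption made here, not something the page specifies. Note that NumPy's `std` divides by $$N$$ by default, so `ddof=1` is needed to get the $$N-1$$ divisor used in the Symbols table, whereas pandas uses $$N-1$$ by default.

```python
import statistics

import numpy as np
import pandas as pd

alist = [4, 8, 8, 2, 5, 4]
aarray = np.array(alist)
aframe = pd.DataFrame(alist)

# Mean value
mean_list = statistics.mean(alist)
mean_array = aarray.mean()
mean_frame = aframe[0].mean()   # column 0 is the only column

# Sample standard deviation (divisor N - 1)
std_list = statistics.stdev(alist)
std_array = aarray.std(ddof=1)  # NumPy default is ddof=0, i.e. divisor N
std_frame = aframe[0].std()     # pandas default is ddof=1

print(mean_list, mean_array, mean_frame)
print(std_list, std_array, std_frame)
```

All three approaches should agree to within floating-point rounding.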

Questions

Post your questions by editing the discussion page of this article. Edit the page, then scroll to the bottom and add a question by putting in the characters *{{Q}}, followed by your question and finally your signature (with four tildes, i.e. ~~~~). Using the {{Q}} will automatically put the page in the category of pages with questions - other editors hoping to help out can then go to that category page to see where the questions are. See the page for Template:Q for details and examples.

External Links

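References

1. Introduction to Matlab 7 for Engineers, 2/e, William Palm III. http://highered.mcgraw-hill.com/sites/0072548185/
2. Applied Numerical Methods with MATLAB for Engineers and Scientists, 2/e, Steven C. Chapra. http://highered.mcgraw-hill.com/sites/007313290x/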