Difference between revisions of "Statistics Symbols"
Latest revision as of 17:21, 18 September 2021
This page is specifically for people in EGR 103 and represents a concordance of sorts among the lectures and the two textbooks with respect to different symbols for statistical quantities. It has been updated to reflect the use of Python in EGR 103.
In general, the index $$k$$ denotes the $$k$$th data point (out of $$N$$ data points, indexed 0 through $$N-1$$) and the index $$m$$ denotes the $$m$$th basis function of a general linear fit (out of $$M$$ basis functions, indexed 0 through $$M-1$$). The letter $$a$$ represents the coefficient of a basis function and $$\phi(x)$$ is the basis function itself. That is to say, for a general linear fit with $$M$$ basis functions, the estimate of the $$k$$th dependent value is:
$$ \begin{align*} \hat{y}(x_k)&=\sum_{m=0}^{M-1}a_m\phi_m(x_k) \end{align*} $$
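As a minimal sketch of the sum above, the following Python snippet evaluates the estimates for every data point at once. The particular basis functions (1, x, and x squared), the coefficient values, and the helper name `estimate` are assumptions chosen only for illustration; they do not come from the lectures or either textbook.

```python
import numpy as np

# Hypothetical basis functions for illustration:
# phi_0(x) = 1, phi_1(x) = x, phi_2(x) = x**2
basis = [lambda x: np.ones_like(x, dtype=float),
         lambda x: x,
         lambda x: x**2]


def estimate(a, basis, x):
    """Return y_hat[k] = sum over m of a[m] * phi_m(x[k])."""
    x = np.asarray(x, dtype=float)
    # Column m of Phi holds phi_m evaluated at every data point x_k
    Phi = np.column_stack([phi(x) for phi in basis])
    return Phi @ np.asarray(a, dtype=float)


x = np.array([0.0, 1.0, 2.0, 3.0])  # N = 4 data points, k = 0..3
a = [1.0, 2.0, 0.5]                 # M = 3 coefficients, m = 0..2 (made up)
y_hat = estimate(a, basis, x)
print(y_hat)
```

Stacking the basis functions as columns turns the sum over $$m$$ into a single matrix-vector product, so the zero-based indices $$k$$ and $$m$$ map directly onto the rows and columns of the array.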
Symbols
The entries in the "Palm" column are taken from William J. Palm III's Introduction to Matlab 7 for Engineers, 2/e[1], while those in the "Chapra" column are taken from Steven C. Chapra's Applied Numerical Methods with MATLAB for Engineers and Scientists, 2/e[2]. Entries in the "EGR 103" column, when not taken from Chapra or Palm, have been developed over the course of several years of EGR 103 lectures.
\( \begin{array}{|c|c|c|c|}\hline
\mbox{Quantity} & \mbox{Palm} & \mbox{Chapra} & \mbox{EGR 103} \\ \hline
\mbox{Independent Data} & x & x & x \\ \hline
\mbox{Dependent Data} & y & y & y \\ \hline
\mbox{Individual Elements} & y_i & y_i & y_k \\ \hline
\mbox{Mean Value} &
  \bar{y}=\frac{1}{n}\sum_{i=1}^ny_i &
  \bar{y}=\frac{\sum y_i}{n} &
  \bar{y}=\frac{\sum y_k}{N} \\ \hline
\mbox{Sum of Squares of Data Residuals} &
  S=\sum_{i=1}^m\left(y_i-\bar{y}\right)^2 &
  S_t=\sum\left(y_i-\bar{y}\right)^2 &
  S_t=\sum_k\left(y_k-\bar{y}\right)^2 \\ \hline
\substack{\mbox{Estimates}\\\mbox{(straight line)}} &
  f(x_i) &
  a_0+a_1x_i &
  \hat{y}_k=p[0]x_k+p[1] \\ \hline
\substack{\mbox{Estimates}\\\mbox{(general linear fit)}} &
  f(x_i) &
  \hat{y}_i=\sum_{j=0}^ma_jz_{ji} &
  \hat{y}_k=\sum_{m=0}^{M-1}a_m\phi_m(x_k) \\ \hline
\substack{\mbox{Sum of Squares of Estimate Residuals}\\\mbox{(straight line)}} &
  J=\sum_{i=1}^m\left[f(x_i)-y_i\right]^2 &
  S_r=\sum\left(y_i-a_0-a_1x_i\right)^2 &
  S_r=\sum_k\left(y_k-\hat{y}_k\right)^2 \\ \hline
\substack{\mbox{Sum of Squares of Estimate Residuals}\\\mbox{(general linear fit)}} &
  \mbox{Not Used} &
  S_r=\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2 &
  S_r=\sum_k\left(y_k-\hat{y}_k\right)^2 \\ \hline
\mbox{Coefficient of Determination} &
  r^2=1-\frac{J}{S} &
  r^2=\frac{S_t-S_r}{S_t} &
  r^2=\frac{S_t-S_r}{S_t}=1-\frac{S_r}{S_t} \\ \hline
\mbox{Sample Standard Deviation} &
  \sigma=\sqrt{\frac{\sum_{i=1}^n(y_i-\bar{y})^2}{n-1}} &
  s_y=\sqrt{\frac{S_t}{n-1}} &
  s_y=\sqrt{\frac{S_t}{N-1}} \\ \hline
\mbox{Coefficient of Variation} &
  \mbox{Not Used} &
  \mbox{c.v.}=\frac{s_y}{\bar{y}}\times 100\% &
  \mbox{c.v.}=\frac{s_y}{\bar{y}}\times 100\% \\ \hline
\substack{\mbox{Standard Error of the Estimate}\\\mbox{(straight line)}} &
  \mbox{Not Used} &
  s_{y/x}=\sqrt{\frac{S_r}{n-2}} &
  s_{y/x}=\sqrt{\frac{S_r}{N-2}} \\ \hline
\substack{\mbox{Standard Error of the Estimate}\\\mbox{(general linear fit)}} &
  \mbox{Not Used} &
  s_{y/x}=\sqrt{\frac{S_r}{n-(m+1)}} &
  s_{y/x}=\sqrt{\frac{S_r}{N-M}} \\ \hline
\end{array} \)
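The quantities in the "EGR 103" column can be computed directly with NumPy. The sketch below does so for a straight-line fit; the data values are made up for illustration, and the use of `np.polyfit` and `np.polyval` is an assumption suggested by the `p[0]`, `p[1]` notation in the table rather than something this page states.

```python
import numpy as np

# Made-up data for illustration only
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

N = len(y)                 # number of data points
p = np.polyfit(x, y, 1)    # p[0] is the slope, p[1] is the intercept
M = len(p)                 # number of coefficients (2 for a straight line)
y_hat = np.polyval(p, x)   # y_hat[k] = p[0]*x[k] + p[1]

y_bar = np.sum(y) / N              # mean value
S_t = np.sum((y - y_bar) ** 2)     # sum of squares of data residuals
S_r = np.sum((y - y_hat) ** 2)     # sum of squares of estimate residuals
r2 = (S_t - S_r) / S_t             # coefficient of determination
s_y = np.sqrt(S_t / (N - 1))       # sample standard deviation
cv = s_y / y_bar * 100             # coefficient of variation, in percent
s_yx = np.sqrt(S_r / (N - M))      # standard error of the estimate

print(f"r^2 = {r2:.4f}, s_y = {s_y:.4f}, c.v. = {cv:.2f}%, s_y/x = {s_yx:.4f}")
```

Writing the standard error with `N - M` covers both rows of the table, since a straight line has `M = 2` coefficients.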
Code
The following table will compare how to calculate various items in MATLAB and Python. The MATLAB version will use a row matrix:
a = [4 8 8 2 5 4]
while Python will use a list, a 1-D array and a data frame with the same contents:
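import numpy as np
import pandas as pd
# The same six values stored as a list, a 1-D array, and a one-column data frame
alist = [4, 8, 8, 2, 5, 4]
aarray = np.array(alist)
aframe = pd.DataFrame(alist)  # the single column gets the default label 0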
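As one possible illustration of such a comparison on the Python side, the sketch below computes the mean and the sample standard deviation for each of the three containers. The choice of these two quantities and the use of the standard-library `statistics` module for the list are assumptions made here, not part of the original page.

```python
import statistics

import numpy as np
import pandas as pd

alist = [4, 8, 8, 2, 5, 4]
aarray = np.array(alist)
aframe = pd.DataFrame(alist)

# Mean value
mean_list = statistics.mean(alist)
mean_array = aarray.mean()
mean_frame = aframe[0].mean()        # column 0 is the default column label

# Sample standard deviation (divides by N-1)
std_list = statistics.stdev(alist)
std_array = aarray.std(ddof=1)       # NumPy defaults to ddof=0 (divides by N)
std_frame = aframe[0].std()          # pandas defaults to ddof=1

print(mean_list, mean_array, mean_frame)
print(std_list, std_array, std_frame)
```

The `ddof` argument is the main thing to watch: NumPy's `std` divides by $$N$$ unless `ddof=1` is given, while pandas and `statistics.stdev` divide by $$N-1$$, which matches the sample standard deviation $$s_y$$ in the table.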
Questions
Post your questions by editing the discussion page of this article. Edit the page, then scroll to the bottom and add a question by putting in the characters *{{Q}}, followed by your question and finally your signature (with four tildes, i.e. ~~~~). Using the {{Q}} will automatically put the page in the category of pages with questions - other editors hoping to help out can then go to that category page to see where the questions are. See the page for Template:Q for details and examples.
References
- [1] Introduction to Matlab 7 for Engineers, 2/e, William J. Palm III
- [2] Applied Numerical Methods with MATLAB for Engineers and Scientists, 2/e, Steven C. Chapra