{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"\n",
"*This notebook contains an excerpt from the [Python Programming and Numerical Methods - A Guide for Engineers and Scientists](https://www.elsevier.com/books/python-programming-and-numerical-methods/kong/978-0-12-819549-9), the content is also available at [Berkeley Python Numerical Methods](https://pythonnumericalmethods.berkeley.edu/notebooks/Index.html).*\n",
"\n",
"*The copyright of the book belongs to Elsevier. We also have this interactive book online for a better learning experience. The code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work on [Elsevier](https://www.elsevier.com/books/python-programming-and-numerical-methods/kong/978-0-12-819549-9) or [Amazon](https://www.amazon.com/Python-Programming-Numerical-Methods-Scientists/dp/0128195495/ref=sr_1_1?dchild=1&keywords=Python+Programming+and+Numerical+Methods+-+A+Guide+for+Engineers+and+Scientists&qid=1604761352&sr=8-1)!*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"< [16.2 Least Squares Regression Derivation (Linear Algebra)](chapter16.02-Least-Squares-Regression-Derivation-Linear-Algebra.ipynb) | [Contents](Index.ipynb) | [16.4 Least Squares Regression in Python](chapter16.04-Least-Squares-Regression-in-Python.ipynb) >"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Least Squares Regression Derivation (Multivariable Calculus)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Recall that the total error for $m$ data points and $n$ basis functions is:\n",
"\n",
"$$\n",
"E = \\sum_{i=1}^m e_i^2 = \\sum_{i=1}^m (\\hat{y}(x_i) - y_i)^2 = \\sum_{i=1}^m \\left(\\sum_{j=1}^n {\\alpha}_j f_j(x_i) - y_i\\right)^2.\n",
"$$\n",
"\n",
"which is an $n$-dimensional paraboloid in ${\\alpha}_k$. From calculus, we know that the minimum of a paraboloid is where all the partial derivatives equal zero. So taking partial derivative of $E$ with respect to the variable ${\\alpha}_k$ (remember that in this case the parameters are our variables), setting the system of equations equal to 0 and solving for the ${\\alpha}_k$'s should give the correct results.\n",
"\n",
"The partial derivative with respect to ${\\alpha}_k$ and setting equal to 0 yields:\n",
"$$\n",
"\\frac{\\partial E}{\\partial {\\alpha}_k} = \\sum_{i=1}^m 2\\left(\\sum_{j=1}^n {\\alpha}_j f_j(x_i) - y_i\\right)f_k(x_i) = 0.\n",
"$$\n",
"\n",
"With some rearrangement, the previous expression can be manipulated to the following:\n",
"$$\n",
"\\sum_{i=1}^m \\sum_{j=1}^n {\\alpha}_j f_j(x_i)f_k(x_i) - \\sum_{i=1}^m y_i f_k(x_i) = 0,\n",
"$$\n",
"\n",
"and further rearrangement taking advantage of the fact that addition commutes results in:\n",
"$$\n",
"\\sum_{j=1}^n {\\alpha}_j \\sum_{i=1}^m f_j(x_i)f_k(x_i) = \\sum_{i=1}^m y_i f_k(x_i).\n",
"$$\n",
"Now let $X$ be a column vector such that the $i$-th element of $X$ is $x_i$ and $Y$ similarly constructed, and let $F_j(X)$ be a column vector such that the $i$-th element of $F_j(X)$ is $f_j(x_i)$. Using this notation, the previous expression can be rewritten in vector notation as:\n",
"$$\n",
"\\left[F_k^T(X)F_1(X), F_k^T(X)F_2(X), \\ldots, F_k^T(X)F_j(X), \\ldots, F_k^T(X)F_n(X)\\right]\n",
"\\left[\\begin{array}{c} {\\alpha}_1 \\\\\n",
"{\\alpha}_2 \\\\\n",
"\\cdots \\\\\n",
"{\\alpha}_j \\\\\n",
"\\cdots \\\\\n",
"{\\alpha}_n\n",
"\\end{array}\\right] = F_k^T(X)Y.\n",
"$$\n",
"If we repeat this equation for every $k$, we get the following system of linear equations in matrix form:\n",
"\n",
"$$\n",
"\\left[\\begin{array}{cc}\n",
"F_1^T(X)F_1(X), F_1^T(X)F_2(X), \\ldots, F_1^T(X)F_j(X), \\ldots, F_1^T(X)F_n(X)&\\\\ \n",
"F_2^T(X)F_1(X), F_2^T(X)F_2(X), \\ldots, F_2^T(X)F_j(X), \\ldots, F_2^T(X)F_n(X)&\\\\\n",
"& \\cdots \\ \\cdots\\\\\n",
"F_n^T(X)F_1(X), F_n^T(X)F_2(X), \\ldots, F_n^T(X)F_j(X), \\ldots, F_n^T(X)F_n(X)\n",
"\\end{array}\\right]\n",
"\\left[\\begin{array}{c} {\\alpha}_1 \\\\\n",
"{\\alpha}_2 \\\\\n",
"\\cdots \\\\\n",
"{\\alpha}_j \\\\\n",
"\\cdots \\\\\n",
"{\\alpha}_n\n",
"\\end{array}\\right] =\n",
"\\left[\\begin{array}{c} F_1^T(X)Y \\\\\n",
"F_2^T(X)Y \\\\\n",
"\\cdots \\\\\n",
"F_n^T(X)Y\n",
"\\end{array}\\right].\n",
"$$\n",
"\n",
"If we let $A = [F_1(X), F_2(X), \\ldots, F_j(X), \\ldots, F_n(X)]$ and ${\\beta}$ be a column vector such that $j$-th element of ${\\beta}$ is ${\\alpha}_j$, then the previous system of equations becomes\n",
"$$\n",
"A^T A {\\beta} = A^T Y,\n",
"$$\n",
"and solving this matrix equation for ${\\beta}$ gives ${\\beta} = (A^T A)^{-1} A^T Y$, which is exactly the same formula as the previous derivation."
]
},
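{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick numerical check of the result above, the following sketch builds the matrix $A$ for two illustrative basis functions, $f_1(x) = 1$ and $f_2(x) = x$ (a straight-line fit, chosen here only as an example), forms the normal equations $A^T A {\\beta} = A^T Y$, and solves them with NumPy. The data values are placeholders; least squares regression in Python is covered in detail in the next section."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# illustrative data points (placeholders, not from the text)\n",
"x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])\n",
"y = np.array([0.1, 0.9, 2.1, 2.9, 4.2])\n",
"\n",
"# basis functions f_1(x) = 1 and f_2(x) = x, so A = [F_1(X), F_2(X)]\n",
"A = np.column_stack([np.ones_like(x), x])\n",
"\n",
"# solve the normal equations A^T A beta = A^T Y\n",
"beta = np.linalg.solve(A.T @ A, A.T @ y)\n",
"print(beta)\n",
"\n",
"# cross-check against NumPy's built-in least squares solver\n",
"beta_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)\n",
"print(beta_lstsq)"
]
},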
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"< [16.2 Least Squares Regression Derivation (Linear Algebra)](chapter16.02-Least-Squares-Regression-Derivation-Linear-Algebra.ipynb) | [Contents](Index.ipynb) | [16.4 Least Squares Regression in Python](chapter16.04-Least-Squares-Regression-in-Python.ipynb) >"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}