Least Squares Regression Derivation (Multivariable Calculus)¶

Recall that the total error for $m$ data points and $n$ basis functions is:

\[ E = \sum_{i=1}^m e_i^2 = \sum_{i=1}^m (\hat{y}(x_i) - y_i)^2 = \sum_{i=1}^m \left(\sum_{j=1}^n {\alpha}_j f_j(x_i) - y_i\right)^2. \]

which is an $n$-dimensional paraboloid in ${\alpha}_k$. From calculus, we know that the minimum of a paraboloid is where all the partial derivatives equal zero. So taking partial derivative of $E$ with respect to the variable ${\alpha}_k$ (remember that in this case the parameters are our variables), setting the system of equations equal to 0 and solving for the ${\alpha}_k$’s should give the correct results.

The partial derivative with respect to ${\alpha}_k$ and setting equal to 0 yields: $$ \frac{\partial E}{\partial {\alpha}_k} = \sum_{i=1}^m 2\left(\sum_{j=1}^n {\alpha}_j f_j(x_i) - y_i\right)f_k(x_i) = 0. $$

With some rearrangement, the previous expression can be manipulated to the following: $$ \sum_{i=1}^m \sum_{j=1}^n {\alpha}_j f_j(x_i)f_k(x_i) - \sum_{i=1}^m y_i f_k(x_i) = 0, $$

and further rearrangement taking advantage of the fact that addition commutes results in: $$ \sum_{j=1}^n {\alpha}_j \sum_{i=1}^m f_j(x_i)f_k(x_i) = \sum_{i=1}^m y_i f_k(x_i). $$ Now let $X$ be a column vector such that the $i$-th element of $X$ is $x_i$ and $Y$ similarly constructed, and let $F_j(X)$ be a column vector such that the $i$-th element of $F_j(X)$ is $f_j(x_i)$. Using this notation, the previous expression can be rewritten in vector notation as: $$ \left[F_k^T(X)F_1(X), F_k^T(X)F_2(X), \ldots, F_k^T(X)F_j(X), \ldots, F_k^T(X)F_n(X)\right] \left[\begin{array}{c} {\alpha}_1 \\ {\alpha}_2 \\ \cdots \\ {\alpha}_j \\ \cdots \\ {\alpha}_n \end{array}\right] = F_k^T(X)Y. $$ If we repeat this equation for every $k$, we get the following system of linear equations in matrix form:

\[\begin{split} \left[\begin{array}{cc} F_1^T(X)F_1(X), F_1^T(X)F_2(X), \ldots, F_1^T(X)F_j(X), \ldots, F_1^T(X)F_n(X)&\\ F_2^T(X)F_1(X), F_2^T(X)F_2(X), \ldots, F_2^T(X)F_j(X), \ldots, F_2^T(X)F_n(X)&\\ & \cdots \ \cdots\\ F_n^T(X)F_1(X), F_n^T(X)F_2(X), \ldots, F_n^T(X)F_j(X), \ldots, F_n^T(X)F_n(X) \end{array}\right] \left[\begin{array}{c} {\alpha}_1 \\ {\alpha}_2 \\ \cdots \\ {\alpha}_j \\ \cdots \\ {\alpha}_n \end{array}\right] = \left[\begin{array}{c} F_1^T(X)Y \\ F_2^T(X)Y \\ \cdots \\ F_n^T(X)Y \end{array}\right]. \end{split}\]

If we let $A = [F_1(X), F_2(X), \ldots, F_j(X), \ldots, F_n(X)]$ and ${\beta}$ be a column vector such that $j$-th element of ${\beta}$ is ${\alpha}_j$, then the previous system of equations becomes $$ A^T A {\beta} = A^T Y, $$ and solving this matrix equation for ${\beta}$ gives ${\beta} = (A^T A)^{-1} A^T Y$, which is exactly the same formula as the previous derivation.

< 16.2 Least Squares Regression Derivation (Linear Algebra) | Contents | 16.4 Least Squares Regression in Python >

Python Numerical Methods

Least Squares Regression Derivation (Multivariable Calculus)¶