Given data \((1,2),(2,2),(3,4),(4,5)\text{,}\) compute the best-fit line and residual norm.
Section 5.4 Least Squares Regression
Many practical data problems lead to inconsistent linear systems \(A\mathbf{x}=\mathbf{b}\text{,}\) where \(\mathbf{b}\) is not in the column space of \(A\text{.}\) The least squares approach chooses an approximate solution that minimizes the squared Euclidean norm of the residual:
\begin{equation*}
\min_{\mathbf{x}\in\mathbb{R}^n} \|A\mathbf{x}-\mathbf{b}\|^2.
\end{equation*}
This is
equivalent to orthogonally projecting \(\mathbf{b}\) onto Col \(A\text{.}\)
Subsection 5.4.1 Example: Simple Linear Regression
Suppose we want to fit a line \(y=\beta_0+\beta_1 x\) to data points \((x_i,y_i)\text{,}\) \(1\leq i\leq m\text{.}\) Define
\begin{equation*}
A = \begin{bmatrix}1 & x_1\\ 1 & x_2\\ \vdots & \vdots\\ 1 & x_m\end{bmatrix}, \quad \mathbf{b}=\begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_m\end{bmatrix}, \quad \mathbf{\beta}=\begin{bmatrix}\beta_0\\ \beta_1\end{bmatrix}.
\end{equation*}
The least squares problem is \(\min_{\beta_0,\beta_1}\|A\mathbf{\beta}-\mathbf{b}\|^2\text{,}\) leading to the normal equations
\begin{equation*}
(A^TA)\hat{\mathbf{\beta}}=A^T\mathbf{b}.
\end{equation*}
Concrete Example: Consider five data points that do not lie on a single line:
\begin{equation*}
(1, 2.1), \quad (2, 2.6), \quad (3, 4.2), \quad (4, 7.9), \quad (5, 8.9).
\end{equation*}
We construct the design matrix and right-hand side vector:
\begin{equation*}
A = \begin{bmatrix}1 & 1\\ 1 & 2\\ 1 & 3\\ 1 & 4\\ 1 & 5\end{bmatrix}, \quad \mathbf{b}=\begin{bmatrix}2.1\\ 2.6\\ 4.2\\ 7.9\\ 8.9\end{bmatrix}.
\end{equation*}
The system \(A\mathbf{\beta}=\mathbf{b}\) is inconsistent (5 equations, 2 unknowns), so we seek the least squares solution. The key geometric insight: \(\hat{\mathbf{\beta}}\) makes \(A\hat{\mathbf{\beta}}\) the orthogonal projection of \(\mathbf{b}\) onto the column space of \(A\text{.}\)
Subsection 5.4.2 Example: Polynomial Least Squares
Fit a quadratic polynomial \(p(x)=c_0+c_1 x + c_2 x^2\) to data. For points \((x_i,y_i)\text{,}\) construct the Vandermonde-like matrix
\begin{equation*}
A= \begin{bmatrix}1 & x_1 & x_1^2\\ 1 & x_2 & x_2^2\\ \vdots & \vdots & \vdots \\ 1 & x_m & x_m^2\end{bmatrix}, \quad \mathbf{c}=\begin{bmatrix}c_0\\ c_1\\ c_2\end{bmatrix}.
\end{equation*}
Then solve \((A^TA)\hat{\mathbf{c}}=A^T\mathbf{b}\text{.}\)
Concrete Example: Consider five data points that cannot be fitted exactly by a quadratic:
\begin{equation*}
(-2, 3), \quad (-1, 5), \quad (0, 1), \quad (1, 4), \quad (2, 10).
\end{equation*}
We construct the design matrix for quadratic fitting:
\begin{equation*}
A = \begin{bmatrix}1 & -2 & 4\\ 1 & -1 & 1\\ 1 & 0 & 0\\ 1 & 1 & 1\\ 1 & 2 & 4\end{bmatrix}, \quad \mathbf{b}=\begin{bmatrix}3\\ 5\\ 1\\ 4\\ 10\end{bmatrix}.
\end{equation*}
The system \(A\mathbf{c}=\mathbf{b}\) is overdetermined (5 equations, 3 unknowns). The least squares solution \(\hat{\mathbf{c}}\) makes \(A\hat{\mathbf{c}}\) the orthogonal projection of \(\mathbf{b}\) onto the 3-dimensional column space of \(A\text{.}\)
Subsection 5.4.3 Exercises
Checkpoint 5.4.1.
Subsection 5.4.4 Least Square Regression in Calculus
Many problems in the physical sciences and engineering involve an approximation of a function \(f\) by another function \(g\text{.}\) If \(f\) is in \(C[a, b]\) (the inner product space of all continuous functions on \([a, b]\)), then \(g\) is usually chosen from a subspace \(W\) of \(C[a, b]\text{.}\) For example, to approximate the function
\begin{equation*}
f(x)=e^x, \quad 0 \leq x \leq 1
\end{equation*}
you could choose one of the forms of \(g\) listed below.
-
\(g(x)=a_0+a_1 x, \quad 0 \leq x \leq 1\) (Linear)
-
\(g(x)=a_0+a_1 x+a_2 x^2, \quad 0 \leq x \leq 1\) (Quadratic)
-
\(g(x)=a_0+a_1 \cos x+a_2 \sin x, \quad 0 \leq x \leq 1\) (Trigonometric)
The goal is to find the coefficients \(a_0, a_1, a_2, \ldots\) that minimize the squared error between \(f\) and \(g\) over the interval \([a,b]\text{.}\) Using the inner product structure of \(C[a,b]\) with
\begin{equation*}
\langle f, g \rangle = \int_a^b f(x)g(x) \, dx
\end{equation*}
we can formulate this as an orthogonal projection problem in the continuous function space.
Example 5.4.2. Finding a Least Squares Approximation(Calculus).
Find the least squares approximation \(g(x)=a_0+a_1 x\) of
\begin{equation*}
f(x)=e^x, \quad 0 \leq x \leq 1
\end{equation*}
Solution.
For this approximation, you need to find the constants \(a_0\) and \(a_1\) that minimize the value of
\begin{align*}
I \amp= \int_0^1[f(x)-g(x)]^2 \, dx\\
\amp= \int_0^1\left(e^x-a_0-a_1 x\right)^2 \, dx
\end{align*}
Evaluating this integral, you have
\begin{align*}
I \amp= \int_0^1\left(e^x-a_0-a_1 x\right)^2 \, dx\\
\amp= \int_0^1\left(e^{2 x}-2 a_0 e^x-2 a_1 x e^x+a_0^2+2 a_0 a_1 x+a_1^2 x^2\right) \, dx\\
\amp= \left[\frac{1}{2} e^{2 x}-2 a_0 e^x-2 a_1 e^x(x-1)+a_0^2 x+a_0 a_1 x^2+a_1^2 \frac{x^3}{3}\right]_0^1\\
\amp= \frac{1}{2}\left(e^2-1\right)-2 a_0(e-1)-2 a_1+a_0^2+a_0 a_1+\frac{1}{3} a_1^2
\end{align*}
Now, considering \(I\) to be a function of the variables \(a_0\) and \(a_1\text{,}\) use calculus to determine the values of \(a_0\) and \(a_1\) that minimize \(I\text{.}\) Specifically, by setting the partial derivatives
\begin{align*}
\frac{\partial I}{\partial a_0} \amp= 2 a_0-2 e+2+a_1\\
\frac{\partial I}{\partial a_1} \amp= a_0+\frac{2}{3} a_1-2
\end{align*}
equal to zero, you obtain the two linear equations in \(a_0\) and \(a_1\) below.
\begin{align*}
2 a_0+a_1 \amp= 2(e-1)\\
3 a_0+2 a_1 \amp= 6
\end{align*}
The solution of this system is
\begin{equation*}
a_0=4 e-10 \approx 0.873 \text{ and } a_1=18-6 e \approx 1.690
\end{equation*}
So, the best linear approximation of \(f(x)=e^x\) on the interval \([0,1]\) is
\begin{equation*}
g(x)=4 e-10+(18-6 e) x \approx 0.873+1.690 x
\end{equation*}
Theorem 5.4.3. Least Squares Approximation.
Let \(f\) be continuous on \([a, b]\text{,}\) and let \(W\) be a finite-dimensional subspace of \(C[a, b]\text{.}\) The least squares approximating function of \(f\) with respect to \(W\) is
\begin{equation*}
g=\left\langle f, \mathbf{w}_1\right\rangle \mathbf{w}_1+\left\langle f, \mathbf{w}_2\right\rangle \mathbf{w}_2+\cdots+\left\langle f, \mathbf{w}_n\right\rangle \mathbf{w}_n
\end{equation*}
where \(B=\left\{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n\right\}\) is an orthonormal basis for \(W\text{.}\)
Activity 5.4.1.
Use SageMath to explore least squares polynomial approximations of \(f(x) = e^x\) on \([0, 1]\) for polynomials \(g(x)=a_0+a_1 x\) to minimize the squared error.
This approach extends the discrete least squares method from data points to continuous functions, providing a bridge between linear algebra and calculus in numerical analysis.
