Least Squares Regression

Section 5.4 Least Squares Regression

Many practical data problems lead to inconsistent linear systems \(A\mathbf{x}=\mathbf{b}\text{,}\) where \(\mathbf{b}\) is not in the column space of \(A\text{.}\) The least squares approach chooses an approximate solution that minimizes the squared Euclidean norm of the residual:

\begin{equation*} \min_{\mathbf{x}\in\mathbb{R}^n} \|A\mathbf{x}-\mathbf{b}\|^2. \end{equation*}

This is equivalent to orthogonally projecting \(\mathbf{b}\) onto Col \(A\text{.}\)

🔗

Subsection 5.4.1 Example: Simple Linear Regression

Suppose we want to fit a line \(y=\beta_0+\beta_1 x\) to data points \((x_i,y_i)\text{,}\) \(1\leq i\leq m\text{.}\) Define

\begin{equation*} A = \begin{bmatrix}1 & x_1\\ 1 & x_2\\ \vdots & \vdots\\ 1 & x_m\end{bmatrix}, \quad \mathbf{b}=\begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_m\end{bmatrix}, \quad \mathbf{\beta}=\begin{bmatrix}\beta_0\\ \beta_1\end{bmatrix}. \end{equation*}

The least squares problem is \(\min_{\beta_0,\beta_1}\|A\mathbf{\beta}-\mathbf{b}\|^2\text{,}\) leading to the normal equations

\begin{equation*} (A^TA)\hat{\mathbf{\beta}}=A^T\mathbf{b}. \end{equation*}

🔗

Concrete Example: Consider five data points that do not lie on a single line:

\begin{equation*} (1, 2.1), \quad (2, 2.6), \quad (3, 4.2), \quad (4, 7.9), \quad (5, 8.9). \end{equation*}

We construct the design matrix and right-hand side vector:

\begin{equation*} A = \begin{bmatrix}1 & 1\\ 1 & 2\\ 1 & 3\\ 1 & 4\\ 1 & 5\end{bmatrix}, \quad \mathbf{b}=\begin{bmatrix}2.1\\ 2.6\\ 4.2\\ 7.9\\ 8.9\end{bmatrix}. \end{equation*}

🔗

The system \(A\mathbf{\beta}=\mathbf{b}\) is inconsistent (5 equations, 2 unknowns), so we seek the least squares solution. The key geometric insight: \(\hat{\mathbf{\beta}}\) makes \(A\hat{\mathbf{\beta}}\) the orthogonal projection of \(\mathbf{b}\) onto the column space of \(A\text{.}\)

🔗

Subsection 5.4.2 Example: Polynomial Least Squares

Fit a quadratic polynomial \(p(x)=c_0+c_1 x + c_2 x^2\) to data. For points \((x_i,y_i)\text{,}\) construct the Vandermonde-like matrix

\begin{equation*} A= \begin{bmatrix}1 & x_1 & x_1^2\\ 1 & x_2 & x_2^2\\ \vdots & \vdots & \vdots \\ 1 & x_m & x_m^2\end{bmatrix}, \quad \mathbf{c}=\begin{bmatrix}c_0\\ c_1\\ c_2\end{bmatrix}. \end{equation*}

Then solve \((A^TA)\hat{\mathbf{c}}=A^T\mathbf{b}\text{.}\)

🔗

Concrete Example: Consider five data points that cannot be fitted exactly by a quadratic:

\begin{equation*} (-2, 3), \quad (-1, 5), \quad (0, 1), \quad (1, 4), \quad (2, 10). \end{equation*}

We construct the design matrix for quadratic fitting:

\begin{equation*} A = \begin{bmatrix}1 & -2 & 4\\ 1 & -1 & 1\\ 1 & 0 & 0\\ 1 & 1 & 1\\ 1 & 2 & 4\end{bmatrix}, \quad \mathbf{b}=\begin{bmatrix}3\\ 5\\ 1\\ 4\\ 10\end{bmatrix}. \end{equation*}

🔗

The system \(A\mathbf{c}=\mathbf{b}\) is overdetermined (5 equations, 3 unknowns). The least squares solution \(\hat{\mathbf{c}}\) makes \(A\hat{\mathbf{c}}\) the orthogonal projection of \(\mathbf{b}\) onto the 3-dimensional column space of \(A\text{.}\)

🔗

Subsection 5.4.3 Exercises

Checkpoint 5.4.1.

Given data \((1,2),(2,2),(3,4),(4,5)\text{,}\) compute the best-fit line and residual norm.

🔗

Subsection 5.4.4 Least Square Regression in Calculus

Many problems in the physical sciences and engineering involve an approximation of a function \(f\) by another function \(g\text{.}\) If \(f\) is in \(C[a, b]\) (the inner product space of all continuous functions on \([a, b]\)), then \(g\) is usually chosen from a subspace \(W\) of \(C[a, b]\text{.}\) For example, to approximate the function

\begin{equation*} f(x)=e^x, \quad 0 \leq x \leq 1 \end{equation*}

you could choose one of the forms of \(g\) listed below.

🔗

\(g(x)=a_0+a_1 x, \quad 0 \leq x \leq 1\) (Linear)
🔗

🔗
\(g(x)=a_0+a_1 x+a_2 x^2, \quad 0 \leq x \leq 1\) (Quadratic)
🔗

🔗
\(g(x)=a_0+a_1 \cos x+a_2 \sin x, \quad 0 \leq x \leq 1\) (Trigonometric)
🔗

🔗

🔗

The goal is to find the coefficients \(a_0, a_1, a_2, \ldots\) that minimize the squared error between \(f\) and \(g\) over the interval \([a,b]\text{.}\) Using the inner product structure of \(C[a,b]\) with

\begin{equation*} \langle f, g \rangle = \int_a^b f(x)g(x) \, dx \end{equation*}

we can formulate this as an orthogonal projection problem in the continuous function space.

🔗

Example 5.4.2. Finding a Least Squares Approximation(Calculus).

Find the least squares approximation \(g(x)=a_0+a_1 x\) of

\begin{equation*} f(x)=e^x, \quad 0 \leq x \leq 1 \end{equation*}

🔗

Solution.

For this approximation, you need to find the constants \(a_0\) and \(a_1\) that minimize the value of

\begin{align*} I \amp= \int_0^1[f(x)-g(x)]^2 \, dx\\ \amp= \int_0^1\left(e^x-a_0-a_1 x\right)^2 \, dx \end{align*}

🔗

Evaluating this integral, you have

\begin{align*} I \amp= \int_0^1\left(e^x-a_0-a_1 x\right)^2 \, dx\\ \amp= \int_0^1\left(e^{2 x}-2 a_0 e^x-2 a_1 x e^x+a_0^2+2 a_0 a_1 x+a_1^2 x^2\right) \, dx\\ \amp= \left[\frac{1}{2} e^{2 x}-2 a_0 e^x-2 a_1 e^x(x-1)+a_0^2 x+a_0 a_1 x^2+a_1^2 \frac{x^3}{3}\right]_0^1\\ \amp= \frac{1}{2}\left(e^2-1\right)-2 a_0(e-1)-2 a_1+a_0^2+a_0 a_1+\frac{1}{3} a_1^2 \end{align*}

🔗

Now, considering \(I\) to be a function of the variables \(a_0\) and \(a_1\text{,}\) use calculus to determine the values of \(a_0\) and \(a_1\) that minimize \(I\text{.}\) Specifically, by setting the partial derivatives

\begin{align*} \frac{\partial I}{\partial a_0} \amp= 2 a_0-2 e+2+a_1\\ \frac{\partial I}{\partial a_1} \amp= a_0+\frac{2}{3} a_1-2 \end{align*}

equal to zero, you obtain the two linear equations in \(a_0\) and \(a_1\) below.

\begin{align*} 2 a_0+a_1 \amp= 2(e-1)\\ 3 a_0+2 a_1 \amp= 6 \end{align*}

🔗

The solution of this system is

\begin{equation*} a_0=4 e-10 \approx 0.873 \text{ and } a_1=18-6 e \approx 1.690 \end{equation*}

So, the best linear approximation of \(f(x)=e^x\) on the interval \([0,1]\) is

\begin{equation*} g(x)=4 e-10+(18-6 e) x \approx 0.873+1.690 x \end{equation*}

🔗

Theorem 5.4.3. Least Squares Approximation.

Let \(f\) be continuous on \([a, b]\text{,}\) and let \(W\) be a finite-dimensional subspace of \(C[a, b]\text{.}\) The least squares approximating function of \(f\) with respect to \(W\) is

\begin{equation*} g=\left\langle f, \mathbf{w}_1\right\rangle \mathbf{w}_1+\left\langle f, \mathbf{w}_2\right\rangle \mathbf{w}_2+\cdots+\left\langle f, \mathbf{w}_n\right\rangle \mathbf{w}_n \end{equation*}

where \(B=\left\{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n\right\}\) is an orthonormal basis for \(W\text{.}\)

🔗

Activity 5.4.1.

Use SageMath to explore least squares polynomial approximations of \(f(x) = e^x\) on \([0, 1]\) for polynomials \(g(x)=a_0+a_1 x\) to minimize the squared error.

🔗

This approach extends the discrete least squares method from data points to continuous functions, providing a bridge between linear algebra and calculus in numerical analysis.

🔗

Prev Top Next