These are reading notes on *Linear algebra and its applications*.
As we know, not all matrices can be factored as $A = PDP^{-1}$ with $D$ diagonal. However, a factorization $A = QDP^{-1}$ is possible for any $m\times n$ matrix $A$! A special factorization of this type, called the singular value decomposition, is one of the most useful matrix factorizations in applied linear algebra.
The singular value decomposition is based on the following property of ordinary diagonalization that can be imitated for rectangular matrices: the absolute values of the eigenvalues of a symmetric matrix $A$ measure the amounts that $A$ stretches or shrinks certain vectors (the eigenvectors). If $A\boldsymbol x = \lambda\boldsymbol x$ and $\|\boldsymbol x\| = 1$, then
$$\|A\boldsymbol x\| = \|\lambda\boldsymbol x\| = |\lambda|\,\|\boldsymbol x\| = |\lambda|$$
If $\lambda_1$ is the eigenvalue with the greatest magnitude, then a corresponding unit eigenvector $\boldsymbol v_1$ identifies a direction in which the stretching effect of $A$ is greatest. This description of $\boldsymbol v_1$ and $|\lambda_1|$ has an analogue for rectangular matrices that will lead to the singular value decomposition.
EXAMPLE 1
If $A=\begin{bmatrix}4& 11& 14\\8& 7& -2\end{bmatrix}$, then the linear transformation $\boldsymbol x \mapsto A\boldsymbol x$ maps the unit sphere $\{\boldsymbol x:\|\boldsymbol x\|=1\}$ in $\R^3$ onto an ellipse in $\R^2$, shown in Figure 1. Find a unit vector $\boldsymbol x$ at which the length $\|A\boldsymbol x\|$ is maximized, and compute this maximum length.
SOLUTION
Observe that
$$\|A\boldsymbol x\|^2 = (A\boldsymbol x)^T(A\boldsymbol x) = \boldsymbol x^TA^TA\boldsymbol x$$
Also, $A^TA$ is a symmetric matrix. So the problem now is to maximize the quadratic form $\boldsymbol x^T(A^TA)\boldsymbol x$ subject to the constraint $\|\boldsymbol x\| = 1$. By Theorem 6 in Section 7.3, the maximum value is the greatest eigenvalue $\lambda_1$ of $A^TA$. Also, the maximum value is attained at a unit eigenvector of $A^TA$ corresponding to $\lambda_1$.
The eigenvalues of $A^TA$ are $\lambda_1 = 360$, $\lambda_2 = 90$, and $\lambda_3 = 0$. Corresponding unit eigenvectors are, respectively,
$$\boldsymbol v_1 = \begin{bmatrix}1/3\\2/3\\2/3\end{bmatrix},\qquad \boldsymbol v_2 = \begin{bmatrix}-2/3\\-1/3\\2/3\end{bmatrix},\qquad \boldsymbol v_3 = \begin{bmatrix}2/3\\-2/3\\1/3\end{bmatrix}$$
The maximum value of $\|A\boldsymbol x\|^2$ subject to $\|\boldsymbol x\| = 1$ is $\lambda_1 = 360$, attained when $\boldsymbol x = \boldsymbol v_1$, so the maximum length is $\|A\boldsymbol v_1\| = \sqrt{360} = 6\sqrt{10}$. The vector $A\boldsymbol v_1 = \begin{bmatrix}18\\6\end{bmatrix}$ is a point on the ellipse in Figure 1 farthest from the origin.
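The computation in Example 1 can be checked numerically. This is a sketch with NumPy; `numpy.linalg.eigh` is used because $A^TA$ is symmetric, and it returns eigenvalues in ascending order:

```python
import numpy as np

# Example 1, checked numerically: the maximum of ||Ax|| over unit vectors x
# is the square root of the largest eigenvalue of A^T A.
A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

# Eigenvalues/eigenvectors of the symmetric matrix A^T A (ascending order).
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
lam1 = eigvals[-1]          # greatest eigenvalue: 360
v1 = eigvecs[:, -1]         # a corresponding unit eigenvector

max_length = np.linalg.norm(A @ v1)   # equals sqrt(360) = 6*sqrt(10)
```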
The Singular Values of an $m\times n$ Matrix
Let $A$ be an $m\times n$ matrix. Then $A^TA$ is symmetric and can be orthogonally diagonalized. Let $\{\boldsymbol v_1,...,\boldsymbol v_n\}$ be an orthonormal basis for $\R^n$ consisting of eigenvectors of $A^TA$, and let $\lambda_1,...,\lambda_n$ be the associated eigenvalues of $A^TA$. Then, for $1\leq i\leq n$,
$$\|A\boldsymbol v_i\|^2 = (A\boldsymbol v_i)^TA\boldsymbol v_i = \boldsymbol v_i^TA^TA\boldsymbol v_i = \boldsymbol v_i^T(\lambda_i\boldsymbol v_i) = \lambda_i\qquad(2)$$
So the eigenvalues of $A^TA$ are all nonnegative. By renumbering, if necessary, we may assume that the eigenvalues are arranged so that
$$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0$$
The singular values of $A$ are the square roots of the eigenvalues of $A^TA$, denoted by $\sigma_1,...,\sigma_n$ (that is, $\sigma_i = \sqrt{\lambda_i}$), and they are arranged in decreasing order. By equation (2), the singular values of $A$ are the lengths of the vectors $A\boldsymbol v_1,...,A\boldsymbol v_n$.
THEOREM 9
Suppose $\{\boldsymbol v_1,...,\boldsymbol v_n\}$ is an orthonormal basis of $\R^n$ consisting of eigenvectors of $A^TA$, arranged so that the corresponding eigenvalues of $A^TA$ satisfy $\lambda_1\geq\cdots\geq\lambda_n$, and suppose $A$ has $r$ nonzero singular values. Then $\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$ is an orthogonal basis for $\mathrm{Col}\,A$, and $\mathrm{rank}\,A = r$.
PROOF
For $i\neq j$, the eigenvectors $\boldsymbol v_i$ and $\boldsymbol v_j$ are orthogonal, so
$$(A\boldsymbol v_i)^T(A\boldsymbol v_j) = \boldsymbol v_i^TA^TA\boldsymbol v_j = \boldsymbol v_i^T(\lambda_j\boldsymbol v_j) = \lambda_j\boldsymbol v_i^T\boldsymbol v_j = 0$$
Thus $\{A\boldsymbol v_1,...,A\boldsymbol v_n\}$ is an orthogonal set. Furthermore, since the lengths of the vectors $A\boldsymbol v_1,...,A\boldsymbol v_n$ are the singular values of $A$, and since there are $r$ nonzero singular values, $A\boldsymbol v_i\neq\boldsymbol 0$ if and only if $1\leq i\leq r$. So $A\boldsymbol v_1,...,A\boldsymbol v_r$ are linearly independent vectors, and they are in $\mathrm{Col}\,A$. Finally, for any $\boldsymbol y = A\boldsymbol x$ in $\mathrm{Col}\,A$, we can write $\boldsymbol x = c_1\boldsymbol v_1 + \cdots + c_n\boldsymbol v_n$, and
$$\boldsymbol y = A\boldsymbol x = c_1A\boldsymbol v_1 + \cdots + c_rA\boldsymbol v_r + c_{r+1}A\boldsymbol v_{r+1} + \cdots + c_nA\boldsymbol v_n = c_1A\boldsymbol v_1 + \cdots + c_rA\boldsymbol v_r + \boldsymbol 0 + \cdots + \boldsymbol 0$$
Thus $\boldsymbol y$ is in $\mathrm{Span}\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$, which shows that $\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$ is an (orthogonal) basis for $\mathrm{Col}\,A$. Hence $\mathrm{rank}\,A = \dim\mathrm{Col}\,A = r$.
The first singular value $\sigma_1$ of an $m\times n$ matrix $A$ is the maximum of $\|A\boldsymbol x\|$ over all unit vectors. This maximum value is attained at a unit eigenvector $\boldsymbol v_1$ of $A^TA$ corresponding to the greatest eigenvalue $\lambda_1$ of $A^TA$. The second singular value is the maximum of $\|A\boldsymbol x\|$ over all unit vectors orthogonal to $\boldsymbol v_1$.
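This variational characterization of the first two singular values can be illustrated numerically. A sketch using the Example 1 matrix; the random-sampling check is only illustrative, not a proof:

```python
import numpy as np

# sigma_1 = max ||Ax|| over unit x, attained at v1, and
# sigma_2 = max ||Ax|| over unit x orthogonal to v1, attained at v2.
A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])
sigma = np.linalg.svd(A, compute_uv=False)      # singular values, decreasing

eigvals, eigvecs = np.linalg.eigh(A.T @ A)      # ascending eigenvalue order
v1, v2 = eigvecs[:, -1], eigvecs[:, -2]         # eigenvectors for 360 and 90

# Sampling check: no random unit vector orthogonal to v1 beats ||A v2||.
rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.standard_normal(3)
    x -= (x @ v1) * v1                          # project out the v1 component
    x /= np.linalg.norm(x)
    assert np.linalg.norm(A @ x) <= np.linalg.norm(A @ v2) + 1e-9
```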
The Singular Value Decomposition (SVD)
The decomposition of $A$ involves an $m\times n$ “diagonal” matrix $\Sigma$ of the form
$$\Sigma = \begin{bmatrix}D & 0\\0 & 0\end{bmatrix}\qquad(3)$$
where $D$ is an $r\times r$ diagonal matrix for some $r$ not exceeding the smaller of $m$ and $n$. (If $r$ equals $m$ or $n$ or both, some or all of the zero matrices do not appear.)
THEOREM 10 (The Singular Value Decomposition)
Let $A$ be an $m\times n$ matrix with rank $r$. Then there exists an $m\times n$ matrix $\Sigma$ as in (3) for which the diagonal entries in $D$ are the first $r$ singular values of $A$, $\sigma_1\geq\sigma_2\geq\cdots\geq\sigma_r > 0$, and there exist an $m\times m$ orthogonal matrix $U$ and an $n\times n$ orthogonal matrix $V$ such that
$$A = U\Sigma V^T$$
The matrices $U$ and $V$ are not uniquely determined by $A$, but the diagonal entries of $\Sigma$ are necessarily the singular values of $A$. The columns of $U$ in such a decomposition are called left singular vectors of $A$, and the columns of $V$ are called right singular vectors of $A$.
PROOF
Let $\lambda_i$ and $\boldsymbol v_i$ be as in Theorem 9, so that $\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$ is an orthogonal basis for $\mathrm{Col}\,A$. Normalize each $A\boldsymbol v_i$ to obtain an orthonormal basis $\{\boldsymbol u_1,...,\boldsymbol u_r\}$, where
$$\boldsymbol u_i = \frac{1}{\|A\boldsymbol v_i\|}A\boldsymbol v_i = \frac{1}{\sigma_i}A\boldsymbol v_i$$
and
$$A\boldsymbol v_i = \sigma_i\boldsymbol u_i\qquad(1\leq i\leq r)\qquad(4)$$
Now extend $\{\boldsymbol u_1,...,\boldsymbol u_r\}$ to an orthonormal basis $\{\boldsymbol u_1,...,\boldsymbol u_m\}$ of $\R^m$, and let
$$U = [\,\boldsymbol u_1\ \boldsymbol u_2\ \cdots\ \boldsymbol u_m\,]\qquad\text{and}\qquad V = [\,\boldsymbol v_1\ \boldsymbol v_2\ \cdots\ \boldsymbol v_n\,]$$
By construction, $U$ and $V$ are orthogonal matrices. Also, from (4),
$$AV = [\,A\boldsymbol v_1\ \cdots\ A\boldsymbol v_r\ \ \boldsymbol 0\ \cdots\ \boldsymbol 0\,] = [\,\sigma_1\boldsymbol u_1\ \cdots\ \sigma_r\boldsymbol u_r\ \ \boldsymbol 0\ \cdots\ \boldsymbol 0\,]$$
Let $D$ be the diagonal matrix with diagonal entries $\sigma_1,...,\sigma_r$, and let $\Sigma$ be as in (3) above. Then
$$U\Sigma = [\,\sigma_1\boldsymbol u_1\ \cdots\ \sigma_r\boldsymbol u_r\ \ \boldsymbol 0\ \cdots\ \boldsymbol 0\,] = AV$$
Since $V$ is an orthogonal matrix, $U\Sigma V^T = AVV^T = A$.
The next two examples focus attention on the internal structure of a singular value decomposition. An efficient and numerically stable algorithm for this decomposition would use a different approach. See the Numerical Note at the end of the section.
EXAMPLE 3
Use the results of Example 1 to construct a singular value decomposition of $A=\begin{bmatrix}4& 11& 14\\8& 7& -2\end{bmatrix}$.
SOLUTION
A construction can be divided into three steps.
Step 1. Find an orthogonal diagonalization of $A^TA$.
Step 2. Set up $V$ and $\Sigma$. Arrange the eigenvalues of $A^TA$ in decreasing order. In Example 1, the eigenvalues are already listed in decreasing order: $360$, $90$, and $0$. The corresponding unit eigenvectors, $\boldsymbol v_1$, $\boldsymbol v_2$, and $\boldsymbol v_3$, are the right singular vectors of $A$; they are the columns of $V = [\,\boldsymbol v_1\ \boldsymbol v_2\ \boldsymbol v_3\,]$.
The square roots of the eigenvalues are the singular values:
$$\sigma_1 = \sqrt{360} = 6\sqrt{10},\qquad \sigma_2 = \sqrt{90} = 3\sqrt{10},\qquad \sigma_3 = 0$$
The nonzero singular values are the diagonal entries of $D$:
$$D = \begin{bmatrix}6\sqrt{10} & 0\\0 & 3\sqrt{10}\end{bmatrix},\qquad \Sigma = \begin{bmatrix}6\sqrt{10} & 0 & 0\\0 & 3\sqrt{10} & 0\end{bmatrix}$$
Step 3. Construct $U$. When $A$ has rank $r$, the first $r$ columns of $U$ are the normalized vectors obtained from $A\boldsymbol v_1,...,A\boldsymbol v_r$. In this example, $A$ has two nonzero singular values, so $\mathrm{rank}\,A = 2$. Recall that $\|A\boldsymbol v_1\| = \sigma_1$ and $\|A\boldsymbol v_2\| = \sigma_2$. Thus
$$\boldsymbol u_1 = \frac{1}{\sigma_1}A\boldsymbol v_1 = \frac{1}{6\sqrt{10}}\begin{bmatrix}18\\6\end{bmatrix} = \begin{bmatrix}3/\sqrt{10}\\1/\sqrt{10}\end{bmatrix},\qquad \boldsymbol u_2 = \frac{1}{\sigma_2}A\boldsymbol v_2 = \frac{1}{3\sqrt{10}}\begin{bmatrix}3\\-9\end{bmatrix} = \begin{bmatrix}1/\sqrt{10}\\-3/\sqrt{10}\end{bmatrix}$$
Note that $\{\boldsymbol u_1,\boldsymbol u_2\}$ is already a basis for $\R^2$. Thus no additional vectors are needed for $U$, and $U = [\,\boldsymbol u_1\ \boldsymbol u_2\,]$. The singular value decomposition of $A$ is
$$A = U\Sigma V^T = \begin{bmatrix}3/\sqrt{10} & 1/\sqrt{10}\\1/\sqrt{10} & -3/\sqrt{10}\end{bmatrix}\begin{bmatrix}6\sqrt{10} & 0 & 0\\0 & 3\sqrt{10} & 0\end{bmatrix}\begin{bmatrix}1/3 & 2/3 & 2/3\\-2/3 & -1/3 & 2/3\\2/3 & -2/3 & 1/3\end{bmatrix}$$
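As a sanity check, the factors of Example 3 (built from Example 1's eigenvectors) can be multiplied back together; `numpy.allclose` confirms $U\Sigma V^T = A$:

```python
import numpy as np

# Reassemble the SVD constructed in Example 3 and confirm U Sigma V^T = A.
A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])
s10 = np.sqrt(10)
U = np.array([[3/s10,  1/s10],
              [1/s10, -3/s10]])
Sigma = np.array([[6*s10, 0.0,   0.0],
                  [0.0,   3*s10, 0.0]])
V = np.array([[1/3, -2/3,  2/3],      # columns are v1, v2, v3
              [2/3, -1/3, -2/3],
              [2/3,  2/3,  1/3]])
reassembled = U @ Sigma @ V.T
```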
EXAMPLE 4
Find a singular value decomposition of
SOLUTION
The eigenvalues of $A^TA$ are 18 and 0, with corresponding unit eigenvectors
To construct $U$, first compute $A\boldsymbol v_1$ and $A\boldsymbol v_2$:
The only column found for $U$ so far is
$$\boldsymbol u_1 = \frac{1}{\sigma_1}A\boldsymbol v_1 = \begin{bmatrix}1/3\\-2/3\\2/3\end{bmatrix}$$
The other columns of $U$ are found by extending the set $\{\boldsymbol u_1\}$ to an orthonormal basis for $\R^3$. In this case, we need two orthogonal unit vectors $\boldsymbol u_2$ and $\boldsymbol u_3$ that are orthogonal to $\boldsymbol u_1$. Each vector must satisfy $\boldsymbol u_1^T\boldsymbol x = 0$, which is equivalent to the equation $x_1 - 2x_2 + 2x_3 = 0$. A basis for the solution set of this equation is
$$\boldsymbol w_1 = \begin{bmatrix}2\\1\\0\end{bmatrix},\qquad \boldsymbol w_2 = \begin{bmatrix}-2\\0\\1\end{bmatrix}$$
Apply the Gram–Schmidt process (with normalizations) to $\{\boldsymbol w_1,\boldsymbol w_2\}$, and obtain
$$\boldsymbol u_2 = \begin{bmatrix}2/\sqrt 5\\1/\sqrt 5\\0\end{bmatrix},\qquad \boldsymbol u_3 = \begin{bmatrix}-2/\sqrt{45}\\4/\sqrt{45}\\5/\sqrt{45}\end{bmatrix}$$
Another way to find $\boldsymbol u_2$ and $\boldsymbol u_3$ is to observe that $\{\boldsymbol u_1\}$ is an orthonormal basis for $\mathrm{Col}\,A$. The remaining vectors $\boldsymbol u_2$ and $\boldsymbol u_3$ must form an orthonormal basis for $(\mathrm{Col}\,A)^\perp = \mathrm{Nul}\,A^T$.
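A numerical check of Example 4. The matrix `A` below is an assumption on my part, since it is not reproduced in these notes: it is the standard textbook matrix for this example, and it is consistent with the data stated above (eigenvalues 18 and 0 for $A^TA$, and a left singular vector satisfying $x_1 - 2x_2 + 2x_3 = 0$):

```python
import numpy as np

# Assumed Example 4 matrix (rank 1), consistent with the data in the text.
A = np.array([[ 1.0, -1.0],
              [-2.0,  2.0],
              [ 2.0, -2.0]])

sigma = np.linalg.svd(A, compute_uv=False)   # one nonzero value: sqrt(18)

# u2, u3 from the Gram-Schmidt route; both must lie in Nul(A^T) = (Col A)^perp.
u2 = np.array([2.0, 1.0, 0.0]) / np.sqrt(5)
u3 = np.array([-2.0, 4.0, 5.0]) / np.sqrt(45)
```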
The next few exercises show some interesting facts.
EXERCISE
How are the singular values of $A$ and $A^T$ related?
SOLUTION
$A^T = (U\Sigma V^T)^T = V\Sigma^TU^T$. This is an SVD of $A^T$ because $V$ and $U$ are orthogonal matrices and $\Sigma^T$ is an $n\times m$ “diagonal” matrix. Since $\Sigma$ and $\Sigma^T$ have the same nonzero diagonal entries, $A$ and $A^T$ have the same nonzero singular values.
[Note: If $A$ is $2\times n$, then $AA^T$ is only $2\times 2$ and its eigenvalues may be easier to compute (by hand) than the eigenvalues of $A^TA$.]
EXERCISE 17
Show that if $A$ is square, then $|\det A|$ is the product of the singular values of $A$.
SOLUTION
$$|\det A| = |\det(U\Sigma V^T)| = |\det U|\cdot|\det\Sigma|\cdot|\det V^T| = 1\cdot\det\Sigma\cdot 1 = \sigma_1\sigma_2\cdots\sigma_n$$
since the determinant of an orthogonal matrix is $\pm 1$ and $\det\Sigma$ is the product of the (nonnegative) diagonal entries of $\Sigma$.
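A quick numerical illustration of Exercise 17 on a random square matrix (a sketch; the identity holds for any square matrix):

```python
import numpy as np

# Exercise 17: |det A| equals the product of the singular values.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

sigma = np.linalg.svd(A, compute_uv=False)
lhs = abs(np.linalg.det(A))
rhs = np.prod(sigma)
```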
EXERCISE 19
$A$ is an $m\times n$ matrix with a singular value decomposition $A = U\Sigma V^T$, where $U$ is an $m\times m$ orthogonal matrix, $\Sigma$ is an $m\times n$ “diagonal” matrix with $r$ positive entries and no negative entries, and $V$ is an $n\times n$ orthogonal matrix. Show that the columns of $V$ are eigenvectors of $A^TA$, the columns of $U$ are eigenvectors of $AA^T$, and the diagonal entries of $\Sigma$ are the singular values of $A$.
SOLUTION
[Hint: Use the SVD to compute $A^TA$ and $AA^T$.] From $A = U\Sigma V^T$,
$$A^TA = V\Sigma^TU^TU\Sigma V^T = V(\Sigma^T\Sigma)V^T$$
so $A^TA\,\boldsymbol v_j = \sigma_j^2\boldsymbol v_j$: the columns of $V$ are eigenvectors of $A^TA$, and the diagonal entries $\sigma_j$ of $\Sigma$ are the square roots of the eigenvalues of $A^TA$, that is, the singular values of $A$. Similarly, $AA^T = U(\Sigma\Sigma^T)U^T$, so the columns of $U$ are eigenvectors of $AA^T$.
EXERCISE 20
Show that if $P$ is an orthogonal $m\times m$ matrix, then $PA$ has the same singular values as $A$. [Hint: If $A = U\Sigma V^T$, then $PA = (PU)\Sigma V^T$ is an SVD of $PA$, since $PU$ is orthogonal.]
EXERCISE 22
Show that if $A$ is an $n\times n$ positive definite matrix, then an orthogonal diagonalization $A = PDP^T$ is a singular value decomposition of $A$.
EXERCISE 23
Let $A = U\Sigma V^T$ be an SVD of an $m\times n$ matrix $A$ of rank $r$, where the $\boldsymbol u_i$ and $\boldsymbol v_i$ are as in Theorem 10. Show that
$$A = \sigma_1\boldsymbol u_1\boldsymbol v_1^T + \sigma_2\boldsymbol u_2\boldsymbol v_2^T + \cdots + \sigma_r\boldsymbol u_r\boldsymbol v_r^T$$
SOLUTION
Both sides agree on the orthonormal basis $\{\boldsymbol v_1,...,\boldsymbol v_n\}$: applying either side to $\boldsymbol v_j$ gives $\sigma_j\boldsymbol u_j$ for $j\leq r$ and $\boldsymbol 0$ for $j > r$, so the two matrices are equal. This expansion generalizes the spectral decomposition in Section 7.1.
EXERCISE 25
Let $T:\R^n\to\R^m$ be a linear transformation. Describe how to find a basis $\mathcal B$ for $\R^n$ and a basis $\mathcal C$ for $\R^m$ such that the matrix for $T$ relative to $\mathcal B$ and $\mathcal C$ is an $m\times n$ “diagonal” matrix.
SOLUTION
Consider the SVD for the standard matrix of $T$, say, $A = U\Sigma V^T$. Let $\mathcal B = \{\boldsymbol v_1,\dots,\boldsymbol v_n\}$ and $\mathcal C = \{\boldsymbol u_1,\dots,\boldsymbol u_m\}$ be bases constructed from the columns of $V$ and $U$, respectively. Observe that, since the columns of $V$ are orthonormal, $V^T\boldsymbol v_j = \boldsymbol e_j$, where $\boldsymbol e_j$ is the $j$th column of the $n\times n$ identity matrix. To find the matrix of $T$ relative to $\mathcal B$ and $\mathcal C$, compute
$$T(\boldsymbol v_j) = A\boldsymbol v_j = U\Sigma V^T\boldsymbol v_j = U\Sigma\boldsymbol e_j = \sigma_j\boldsymbol u_j$$
So $[T(\boldsymbol v_j)]_{\mathcal C} = \sigma_j\boldsymbol e_j$. The discussion at the beginning of Section 5.4 shows that the “diagonal” matrix $\Sigma$ is the matrix of $T$ relative to $\mathcal B$ and $\mathcal C$.
EXERCISE
Prove that any $n\times n$ matrix $A$ admits a polar decomposition of the form $A = PQ$, where $P$ is an $n\times n$ positive semidefinite matrix with the same rank as $A$ and where $Q$ is an $n\times n$ orthogonal matrix.
SOLUTION
[Hint: Use a singular value decomposition, $A = U\Sigma V^T$, and observe that $A = (U\Sigma U^T)(UV^T)$, where $P = U\Sigma U^T$ is a symmetric positive semidefinite matrix with the same rank as $A$ (its eigenvalues are the singular values $\sigma_i\geq 0$) and $Q = UV^T$ is orthogonal, being a product of orthogonal matrices.]
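The hint translates directly into code. A sketch of the polar decomposition built from NumPy's SVD (the names `P` and `Q` follow the exercise statement):

```python
import numpy as np

# Polar decomposition via the SVD: A = (U Sigma U^T)(U V^T) = P Q,
# with P positive semidefinite and Q orthogonal.
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))

U, s, Vt = np.linalg.svd(A)
P = U @ np.diag(s) @ U.T   # symmetric PSD: its eigenvalues are the sigma_i >= 0
Q = U @ Vt                 # product of orthogonal matrices is orthogonal
```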
Applications of the Singular Value Decomposition
The SVD is often used to estimate the rank of a matrix, as noted above. Several other numerical applications are described briefly below, and an application to image processing is presented in Section 7.5.
EXAMPLE 5 (The Condition Number)
Most numerical calculations involving an equation $A\boldsymbol x = \boldsymbol b$ are as reliable as possible when the SVD of $A$ is used. The two orthogonal matrices $U$ and $V$ do not affect lengths of vectors or angles between vectors. Any possible instabilities in numerical calculations are identified in $\Sigma$. If the singular values of $A$ are extremely large or small, roundoff errors are almost inevitable, but an error analysis is aided by knowing the entries in $\Sigma$ and $V$.
If $A$ is an invertible $n\times n$ matrix, then the ratio $\sigma_1/\sigma_n$ of the largest and smallest singular values gives the condition number of $A$. Exercises 41–43 in Section 2.3 showed how the condition number affects the sensitivity of a solution of $A\boldsymbol x = \boldsymbol b$ to changes (or errors) in the entries of $A$. (Actually, a “condition number” of $A$ can be computed in several ways, but the definition given here is widely used for studying $A\boldsymbol x = \boldsymbol b$.)
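A sketch of the definition in code, checked against NumPy's built-in condition number in the 2-norm:

```python
import numpy as np

# Condition number as the ratio sigma_1 / sigma_n of extreme singular values.
A = np.array([[4.0, 11.0],
              [8.0, 7.0]])

sigma = np.linalg.svd(A, compute_uv=False)   # decreasing order
cond = sigma[0] / sigma[-1]
```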
EXAMPLE 6 (Bases for Fundamental Subspaces)
Given an SVD for an $m\times n$ matrix $A$, let $\boldsymbol u_1,...,\boldsymbol u_m$ be the left singular vectors, $\boldsymbol v_1,...,\boldsymbol v_n$ the right singular vectors, and $\sigma_1,...,\sigma_n$ the singular values, and let $r$ be the rank of $A$. By Theorem 9,
$$\{\boldsymbol u_1,...,\boldsymbol u_r\}\qquad(5)$$
is an orthonormal basis for $\mathrm{Col}\,A$.
Recall that $(\mathrm{Col}\,A)^\perp = \mathrm{Nul}\,A^T$. Hence
$$\{\boldsymbol u_{r+1},...,\boldsymbol u_m\}\qquad(6)$$
is an orthonormal basis for $\mathrm{Nul}\,A^T$.
Since $\|A\boldsymbol v_i\| = \sigma_i$ for $1\leq i\leq n$, and $\sigma_i$ is 0 if and only if $i > r$, the vectors $\boldsymbol v_{r+1},...,\boldsymbol v_n$ span a subspace of $\mathrm{Nul}\,A$ of dimension $n-r$. By the Rank Theorem, $\dim\mathrm{Nul}\,A = n - \mathrm{rank}\,A = n - r$. It follows that
$$\{\boldsymbol v_{r+1},...,\boldsymbol v_n\}\qquad(7)$$
is an orthonormal basis for $\mathrm{Nul}\,A$.
Also, $(\mathrm{Nul}\,A)^\perp = \mathrm{Col}\,A^T = \mathrm{Row}\,A$. Hence, from (7),
$$\{\boldsymbol v_1,...,\boldsymbol v_r\}\qquad(8)$$
is an orthonormal basis for $\mathrm{Row}\,A$.
Figure 4 summarizes (5)–(8), but shows the orthogonal basis $\{\sigma_1\boldsymbol u_1,...,\sigma_r\boldsymbol u_r\}$ for $\mathrm{Col}\,A$ instead of the normalized basis, to remind you that $A\boldsymbol v_i = \sigma_i\boldsymbol u_i$ for $1\leq i\leq r$.
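The bases (5)–(8) can be read off directly from a computed SVD. A sketch using the rank-2 matrix of Example 1 (the tolerance `1e-10` for the numerical rank is an arbitrary choice):

```python
import numpy as np

# For a rank-r matrix: the first r columns of U span Col A, the rest span
# Nul A^T; the first r columns of V span Row A, the rest span Nul A.
A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])          # rank 2 (Example 1)

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))                # numerical rank

col_A  = U[:, :r]                         # basis (5) for Col A
null_A = Vt[r:].T                         # basis (7) for Nul A (as columns)
row_A  = Vt[:r].T                         # basis (8) for Row A (as columns)
```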
The four fundamental subspaces and the concept of singular values provide the final statements of the Invertible Matrix Theorem.
EXAMPLE 7 (Reduced SVD and the Pseudoinverse of $A$)
When $\Sigma$ contains rows or columns of zeros, a more compact decomposition of $A$ is possible. Using the notation established above, let $r = \mathrm{rank}\,A$, and partition $U$ and $V$ into submatrices whose first blocks contain $r$ columns:
$$U = [\,U_r\ \ U_{m-r}\,],\qquad V = [\,V_r\ \ V_{n-r}\,]$$
Then $U_r$ is $m\times r$ and $V_r$ is $n\times r$. (To simplify notation, we consider $U_{m-r}$ or $V_{n-r}$ even though one of them may have no columns.) Then partitioned matrix multiplication shows that
$$A = U_rDV_r^T$$
This factorization of $A$ is called a reduced singular value decomposition of $A$. Since the diagonal entries in $D$ are nonzero, $D$ is invertible. The following matrix is called the pseudoinverse (also, the Moore–Penrose inverse) of $A$:
$$A^+ = V_rD^{-1}U_r^T$$
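A sketch of the reduced SVD and the pseudoinverse in code, checked against `numpy.linalg.pinv` and the identities of Supplementary Exercise 12(c):

```python
import numpy as np

# Reduced SVD A = U_r D V_r^T and the pseudoinverse A+ = V_r D^{-1} U_r^T.
A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))                # numerical rank

Ur, D, Vr = U[:, :r], np.diag(s[:r]), Vt[:r].T
A_reduced = Ur @ D @ Vr.T                 # reduced SVD reproduces A
A_plus = Vr @ np.linalg.inv(D) @ Ur.T     # Moore-Penrose pseudoinverse
```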
The next Supplementary exercises explore some of the properties of the reduced singular value decomposition and the pseudoinverse.
Supplementary EXERCISE 12
Verify the properties of $A^+$:
a. For each $\boldsymbol y$ in $\R^m$, $AA^+\boldsymbol y$ is the orthogonal projection of $\boldsymbol y$ onto $\mathrm{Col}\,A$.
b. For each $\boldsymbol x$ in $\R^n$, $A^+A\boldsymbol x$ is the orthogonal projection of $\boldsymbol x$ onto $\mathrm{Row}\,A$.
c. $AA^+A = A$ and $A^+AA^+ = A^+$.
Supplementary EXERCISE 13
Suppose the equation $A\boldsymbol x = \boldsymbol b$ is consistent, and let $\boldsymbol x^+ = A^+\boldsymbol b$. By Exercise 23 in Section 6.3, there is exactly one vector $\boldsymbol p$ in $\mathrm{Row}\,A$ such that $A\boldsymbol p = \boldsymbol b$. The following steps prove that $\boldsymbol x^+ = \boldsymbol p$ and that $\boldsymbol x^+$ is the minimum-length solution of $A\boldsymbol x = \boldsymbol b$.
a. Show that $\boldsymbol x^+$ is in $\mathrm{Row}\,A$.
b. Show that $\boldsymbol x^+$ is a solution of $A\boldsymbol x = \boldsymbol b$.
c. Show that if $\boldsymbol u$ is any solution of $A\boldsymbol x = \boldsymbol b$, then $\|\boldsymbol x^+\|\leq\|\boldsymbol u\|$, with equality only if $\boldsymbol u = \boldsymbol x^+$.
SOLUTION
a. $\boldsymbol x^+ = A^+\boldsymbol b = V_rD^{-1}U_r^T\boldsymbol b$. Since the columns of $V_r$ form an orthonormal basis for $\mathrm{Row}\,A$, $\boldsymbol x^+$ is a linear combination of the columns of $V_r$. Thus $\boldsymbol x^+$ is in $\mathrm{Row}\,A$.
b. Since $A\boldsymbol x = \boldsymbol b$ is consistent, $\boldsymbol b = A\boldsymbol x$ for some $\boldsymbol x$. Then, by Exercise 12(c), $A\boldsymbol x^+ = AA^+\boldsymbol b = AA^+A\boldsymbol x = A\boldsymbol x = \boldsymbol b$.
c. By Exercise 12(b), $\boldsymbol x^+ = A^+\boldsymbol b = A^+A\boldsymbol u$ is the orthogonal projection of $\boldsymbol u$ onto $\mathrm{Row}\,A$. So $\boldsymbol u - \boldsymbol x^+$ is orthogonal to $\boldsymbol x^+$, and the Pythagorean theorem gives $\|\boldsymbol u\|^2 = \|\boldsymbol x^+\|^2 + \|\boldsymbol u - \boldsymbol x^+\|^2 \geq \|\boldsymbol x^+\|^2$, with equality only if $\boldsymbol u = \boldsymbol x^+$.
Supplementary EXERCISE 14
Given any $\boldsymbol b$ in $\R^m$, adapt Exercise 13 to show that $A^+\boldsymbol b$ is the least-squares solution of $A\boldsymbol x = \boldsymbol b$ of minimum length.
SOLUTION
[Hint: Consider the equation $A\boldsymbol x = \hat{\boldsymbol b}$, where $\hat{\boldsymbol b}$ is the orthogonal projection of $\boldsymbol b$ onto $\mathrm{Col}\,A$.]
EXAMPLE 8 (Least-Squares Solution)
Given the equation $A\boldsymbol x = \boldsymbol b$, use the pseudoinverse of $A$ to define
$$\hat{\boldsymbol x} = A^+\boldsymbol b = V_rD^{-1}U_r^T\boldsymbol b$$
Then,
$$A\hat{\boldsymbol x} = AA^+\boldsymbol b = (U_rDV_r^T)(V_rD^{-1}U_r^T)\boldsymbol b = U_rU_r^T\boldsymbol b$$
$U_rU_r^T\boldsymbol b$ is the orthogonal projection $\hat{\boldsymbol b}$ of $\boldsymbol b$ onto $\mathrm{Col}\,A$. Thus $\hat{\boldsymbol x}$ is a least-squares solution of $A\boldsymbol x = \boldsymbol b$. In fact, this $\hat{\boldsymbol x}$ has the smallest length among all least-squares solutions of $A\boldsymbol x = \boldsymbol b$. See Supplementary Exercise 14.
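A numerical illustration with a deliberately rank-deficient matrix, so that $A\boldsymbol x = \boldsymbol b$ has many least-squares solutions; `numpy.linalg.lstsq` also returns the minimum-norm least-squares solution, so the two should agree (the matrix and right-hand side are my own small example):

```python
import numpy as np

# x_hat = A+ b is the minimum-length least-squares solution of Ax = b.
A = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [0.0, 0.0]])     # rank 1: infinitely many least-squares solutions
b = np.array([1.0, 2.0, 3.0])

x_hat = np.linalg.pinv(A) @ b                     # pseudoinverse solution
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]    # minimum-norm LS solution

b_hat = A @ x_hat   # orthogonal projection of b onto Col A
```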