These are reading notes on *Linear algebra and its applications*.
As we know, not all matrices can be factored as $A = PDP^{-1}$ with $D$ diagonal. However, a factorization $A = QDP^{-1}$ is possible for any $m\times n$ matrix $A$! A special factorization of this type, called the singular value decomposition, is one of the most useful matrix factorizations in applied linear algebra.
The singular value decomposition is based on the following property of ordinary diagonalization that can be imitated for rectangular matrices: the absolute values of the eigenvalues of a symmetric matrix $A$ measure the amounts that $A$ stretches or shrinks certain vectors (the eigenvectors). If $A\boldsymbol x = \lambda\boldsymbol x$ and $\|\boldsymbol x\| = 1$, then
$$\|A\boldsymbol x\| = \|\lambda\boldsymbol x\| = |\lambda|\,\|\boldsymbol x\| = |\lambda|$$
If $\lambda_1$ is the eigenvalue with the greatest magnitude, then a corresponding unit eigenvector $\boldsymbol v_1$ identifies a direction in which the stretching effect of $A$ is greatest. This description of $\boldsymbol v_1$ and $|\lambda_1|$ has an analogue for rectangular matrices that will lead to the singular value decomposition.
EXAMPLE 1
If $A=\begin{bmatrix}4& 11& 14\\8& 7& -2\end{bmatrix}$, then the linear transformation $\boldsymbol x \mapsto A\boldsymbol x$ maps the unit sphere $\{\boldsymbol x:\|\boldsymbol x\|=1\}$ in $\R^3$ onto an ellipse in $\R^2$, shown in Figure 1. Find a unit vector $\boldsymbol x$ at which the length $\|A\boldsymbol x\|$ is maximized, and compute this maximum length.
SOLUTION
Observe that
$$\|A\boldsymbol x\|^2 = (A\boldsymbol x)^T(A\boldsymbol x) = \boldsymbol x^TA^TA\boldsymbol x$$
Also, $A^TA$ is a symmetric matrix. So the problem now is to maximize the quadratic form $\boldsymbol x^T(A^TA)\boldsymbol x$ subject to the constraint $\|\boldsymbol x\| = 1$. By Theorem 6 in Section 7.3, the maximum value is the greatest eigenvalue $\lambda_1$ of $A^TA$. Also, the maximum value is attained at a unit eigenvector of $A^TA$ corresponding to $\lambda_1$.
The eigenvalues of $A^TA$ are $\lambda_1 = 360$, $\lambda_2 = 90$, and $\lambda_3 = 0$. Corresponding unit eigenvectors are, respectively,
$$\boldsymbol v_1 = \begin{bmatrix}1/3\\2/3\\2/3\end{bmatrix},\qquad \boldsymbol v_2 = \begin{bmatrix}-2/3\\-1/3\\2/3\end{bmatrix},\qquad \boldsymbol v_3 = \begin{bmatrix}2/3\\-2/3\\1/3\end{bmatrix}$$
The maximum value of $\|A\boldsymbol x\|^2$ subject to $\|\boldsymbol x\| = 1$ is $\lambda_1 = 360$, attained when $\boldsymbol x = \boldsymbol v_1$, so the maximum length is $\|A\boldsymbol v_1\| = \sqrt{360} = 6\sqrt{10}$. The vector $A\boldsymbol v_1 = \begin{bmatrix}18\\6\end{bmatrix}$ is a point on the ellipse in Figure 1 farthest from the origin.
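The computation in Example 1 can be checked numerically. This is a sketch with NumPy; `numpy.linalg.eigh` is used because $A^TA$ is symmetric, and it returns eigenvalues in ascending order:

```python
import numpy as np

# Example 1, checked numerically: the maximum of ||Ax|| over unit vectors x
# is the square root of the largest eigenvalue of A^T A.
A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

# Eigenvalues/eigenvectors of the symmetric matrix A^T A (ascending order).
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
lam1 = eigvals[-1]          # greatest eigenvalue: 360
v1 = eigvecs[:, -1]         # a corresponding unit eigenvector

max_length = np.linalg.norm(A @ v1)   # equals sqrt(360) = 6*sqrt(10)
```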
The Singular Values of an $m\times n$ Matrix
Let $A$ be an $m\times n$ matrix. Then $A^TA$ is symmetric and can be orthogonally diagonalized. Let $\{\boldsymbol v_1,...,\boldsymbol v_n\}$ be an orthonormal basis for $\R^n$ consisting of eigenvectors of $A^TA$, and let $\lambda_1,...,\lambda_n$ be the associated eigenvalues of $A^TA$. Then, for $1\leq i\leq n$,
$$\|A\boldsymbol v_i\|^2 = (A\boldsymbol v_i)^TA\boldsymbol v_i = \boldsymbol v_i^TA^TA\boldsymbol v_i = \boldsymbol v_i^T(\lambda_i\boldsymbol v_i) = \lambda_i\qquad(2)$$
So the eigenvalues of $A^TA$ are all nonnegative. By renumbering, if necessary, we may assume that the eigenvalues are arranged so that
$$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0$$
The singular values of $A$ are the square roots of the eigenvalues of $A^TA$, denoted by $\sigma_1,...,\sigma_n$ (that is, $\sigma_i = \sqrt{\lambda_i}$), and they are arranged in decreasing order. By equation (2), the singular values of $A$ are the lengths of the vectors $A\boldsymbol v_1,...,A\boldsymbol v_n$.
THEOREM 9
Suppose $\{\boldsymbol v_1,...,\boldsymbol v_n\}$ is an orthonormal basis of $\R^n$ consisting of eigenvectors of $A^TA$, arranged so that the corresponding eigenvalues of $A^TA$ satisfy $\lambda_1\geq\cdots\geq\lambda_n$, and suppose $A$ has $r$ nonzero singular values. Then $\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$ is an orthogonal basis for $\mathrm{Col}\,A$, and $\mathrm{rank}\,A = r$.
PROOF
For $i\neq j$, the eigenvectors $\boldsymbol v_i$ and $\boldsymbol v_j$ are orthogonal, so
$$(A\boldsymbol v_i)^T(A\boldsymbol v_j) = \boldsymbol v_i^TA^TA\boldsymbol v_j = \boldsymbol v_i^T(\lambda_j\boldsymbol v_j) = \lambda_j\boldsymbol v_i^T\boldsymbol v_j = 0$$
Thus $\{A\boldsymbol v_1,...,A\boldsymbol v_n\}$ is an orthogonal set. Furthermore, since the lengths of the vectors $A\boldsymbol v_1,...,A\boldsymbol v_n$ are the singular values of $A$, and since there are $r$ nonzero singular values, $A\boldsymbol v_i\neq\boldsymbol 0$ if and only if $1\leq i\leq r$. So $A\boldsymbol v_1,...,A\boldsymbol v_r$ are linearly independent vectors, and they are in $\mathrm{Col}\,A$. Finally, for any $\boldsymbol y = A\boldsymbol x$ in $\mathrm{Col}\,A$, we can write $\boldsymbol x = c_1\boldsymbol v_1 + \cdots + c_n\boldsymbol v_n$, and
$$\boldsymbol y = A\boldsymbol x = c_1A\boldsymbol v_1 + \cdots + c_rA\boldsymbol v_r + c_{r+1}A\boldsymbol v_{r+1} + \cdots + c_nA\boldsymbol v_n = c_1A\boldsymbol v_1 + \cdots + c_rA\boldsymbol v_r + \boldsymbol 0 + \cdots + \boldsymbol 0$$
Thus $\boldsymbol y$ is in $\mathrm{Span}\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$, which shows that $\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$ is an (orthogonal) basis for $\mathrm{Col}\,A$. Hence $\mathrm{rank}\,A = \dim\mathrm{Col}\,A = r$.
The first singular value $\sigma_1$ of an $m\times n$ matrix $A$ is the maximum of $\|A\boldsymbol x\|$ over all unit vectors. This maximum value is attained at a unit eigenvector $\boldsymbol v_1$ of $A^TA$ corresponding to the greatest eigenvalue $\lambda_1$ of $A^TA$. The second singular value is the maximum of $\|A\boldsymbol x\|$ over all unit vectors orthogonal to $\boldsymbol v_1$.
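This variational characterization of the first two singular values can be illustrated numerically. A sketch using the Example 1 matrix; the random-sampling check is only illustrative, not a proof:

```python
import numpy as np

# sigma_1 = max ||Ax|| over unit x, attained at v1, and
# sigma_2 = max ||Ax|| over unit x orthogonal to v1, attained at v2.
A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])
sigma = np.linalg.svd(A, compute_uv=False)      # singular values, decreasing

eigvals, eigvecs = np.linalg.eigh(A.T @ A)      # ascending eigenvalue order
v1, v2 = eigvecs[:, -1], eigvecs[:, -2]         # eigenvectors for 360 and 90

# Sampling check: no random unit vector orthogonal to v1 beats ||A v2||.
rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.standard_normal(3)
    x -= (x @ v1) * v1                          # project out the v1 component
    x /= np.linalg.norm(x)
    assert np.linalg.norm(A @ x) <= np.linalg.norm(A @ v2) + 1e-9
```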
The Singular Value Decomposition (SVD)
The decomposition of $A$ involves an $m\times n$ “diagonal” matrix $\Sigma$ of the form
$$\Sigma = \begin{bmatrix}D & 0\\0 & 0\end{bmatrix}\qquad(3)$$
where $D$ is an $r\times r$ diagonal matrix for some $r$ not exceeding the smaller of $m$ and $n$. (If $r$ equals $m$ or $n$ or both, some or all of the zero matrices do not appear.)
THEOREM 10 (The Singular Value Decomposition)
Let $A$ be an $m\times n$ matrix with rank $r$. Then there exists an $m\times n$ matrix $\Sigma$ as in (3) for which the diagonal entries in $D$ are the first $r$ singular values of $A$, $\sigma_1\geq\sigma_2\geq\cdots\geq\sigma_r > 0$, and there exist an $m\times m$ orthogonal matrix $U$ and an $n\times n$ orthogonal matrix $V$ such that
$$A = U\Sigma V^T$$
The matrices $U$ and $V$ are not uniquely determined by $A$, but the diagonal entries of $\Sigma$ are necessarily the singular values of $A$. The columns of $U$ in such a decomposition are called left singular vectors of $A$, and the columns of $V$ are called right singular vectors of $A$.
PROOF
Let $\lambda_i$ and $\boldsymbol v_i$ be as in Theorem 9, so that $\{A\boldsymbol v_1,...,A\boldsymbol v_r\}$ is an orthogonal basis for $\mathrm{Col}\,A$. Normalize each $A\boldsymbol v_i$ to obtain an orthonormal basis $\{\boldsymbol u_1,...,\boldsymbol u_r\}$, where
$$\boldsymbol u_i = \frac{1}{\|A\boldsymbol v_i\|}A\boldsymbol v_i = \frac{1}{\sigma_i}A\boldsymbol v_i$$
and
$$A\boldsymbol v_i = \sigma_i\boldsymbol u_i\qquad(1\leq i\leq r)\qquad(4)$$
Now extend $\{\boldsymbol u_1,...,\boldsymbol u_r\}$ to an orthonormal basis $\{\boldsymbol u_1,...,\boldsymbol u_m\}$ of $\R^m$, and let
$$U = [\,\boldsymbol u_1\ \boldsymbol u_2\ \cdots\ \boldsymbol u_m\,]\qquad\text{and}\qquad V = [\,\boldsymbol v_1\ \boldsymbol v_2\ \cdots\ \boldsymbol v_n\,]$$
By construction, $U$ and $V$ are orthogonal matrices. Also, from (4),
$$AV = [\,A\boldsymbol v_1\ \cdots\ A\boldsymbol v_r\ \ \boldsymbol 0\ \cdots\ \boldsymbol 0\,] = [\,\sigma_1\boldsymbol u_1\ \cdots\ \sigma_r\boldsymbol u_r\ \ \boldsymbol 0\ \cdots\ \boldsymbol 0\,]$$
Let $D$ be the diagonal matrix with diagonal entries $\sigma_1,...,\sigma_r$, and let $\Sigma$ be as in (3) above. Then
$$U\Sigma = [\,\sigma_1\boldsymbol u_1\ \cdots\ \sigma_r\boldsymbol u_r\ \ \boldsymbol 0\ \cdots\ \boldsymbol 0\,] = AV$$
Since $V$ is an orthogonal matrix, $U\Sigma V^T = AVV^T = A$.
The next two examples focus attention on the internal structure of a singular value decomposition. An efficient and numerically stable algorithm for this decomposition would use a different approach. See the Numerical Note at the end of the section.
EXAMPLE 3
Use the results of Example 1 to construct a singular value decomposition of $A=\begin{bmatrix}4& 11& 14\\8& 7& -2\end{bmatrix}$.
SOLUTION
A construction can be divided into three steps.
Step 1. Find an orthogonal diagonalization of $A^TA$.
Step 2. Set up $V$ and $\Sigma$. Arrange the eigenvalues of $A^TA$ in decreasing order. In Example 1, the eigenvalues are already listed in decreasing order: $360$, $90$, and $0$. The corresponding unit eigenvectors, $\boldsymbol v_1$, $\boldsymbol v_2$, and $\boldsymbol v_3$, are the right singular vectors of $A$; they are the columns of $V = [\,\boldsymbol v_1\ \boldsymbol v_2\ \boldsymbol v_3\,]$.
The square roots of the eigenvalues are the singular values:
$$\sigma_1 = \sqrt{360} = 6\sqrt{10},\qquad \sigma_2 = \sqrt{90} = 3\sqrt{10},\qquad \sigma_3 = 0$$
The nonzero singular values are the diagonal entries of $D$:
$$D = \begin{bmatrix}6\sqrt{10} & 0\\0 & 3\sqrt{10}\end{bmatrix},\qquad \Sigma = \begin{bmatrix}6\sqrt{10} & 0 & 0\\0 & 3\sqrt{10} & 0\end{bmatrix}$$
Step 3. Construct $U$. When $A$ has rank $r$, the first $r$ columns of $U$ are the normalized vectors obtained from $A\boldsymbol v_1,...,A\boldsymbol v_r$. In this example, $A$ has two nonzero singular values, so $\mathrm{rank}\,A = 2$. Recall that $\|A\boldsymbol v_1\| = \sigma_1$ and $\|A\boldsymbol v_2\| = \sigma_2$. Thus
$$\boldsymbol u_1 = \frac{1}{\sigma_1}A\boldsymbol v_1 = \frac{1}{6\sqrt{10}}\begin{bmatrix}18\\6\end{bmatrix} = \begin{bmatrix}3/\sqrt{10}\\1/\sqrt{10}\end{bmatrix},\qquad \boldsymbol u_2 = \frac{1}{\sigma_2}A\boldsymbol v_2 = \frac{1}{3\sqrt{10}}\begin{bmatrix}3\\-9\end{bmatrix} = \begin{bmatrix}1/\sqrt{10}\\-3/\sqrt{10}\end{bmatrix}$$
Note that $\{\boldsymbol u_1,\boldsymbol u_2\}$ is already a basis for $\R^2$. Thus no additional vectors are needed for $U$, and $U = [\,\boldsymbol u_1\ \boldsymbol u_2\,]$. The singular value decomposition of $A$ is
$$A = U\Sigma V^T = \begin{bmatrix}3/\sqrt{10} & 1/\sqrt{10}\\1/\sqrt{10} & -3/\sqrt{10}\end{bmatrix}\begin{bmatrix}6\sqrt{10} & 0 & 0\\0 & 3\sqrt{10} & 0\end{bmatrix}\begin{bmatrix}1/3 & 2/3 & 2/3\\-2/3 & -1/3 & 2/3\\2/3 & -2/3 & 1/3\end{bmatrix}$$
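As a sanity check, the factors of Example 3 (built from Example 1's eigenvectors) can be multiplied back together; `numpy.allclose` confirms $U\Sigma V^T = A$:

```python
import numpy as np

# Reassemble the SVD constructed in Example 3 and confirm U Sigma V^T = A.
A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])
s10 = np.sqrt(10)
U = np.array([[3/s10,  1/s10],
              [1/s10, -3/s10]])
Sigma = np.array([[6*s10, 0.0,   0.0],
                  [0.0,   3*s10, 0.0]])
V = np.array([[1/3, -2/3,  2/3],      # columns are v1, v2, v3
              [2/3, -1/3, -2/3],
              [2/3,  2/3,  1/3]])
reassembled = U @ Sigma @ V.T
```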
EXAMPLE 4
Find a singular value decomposition of
SOLUTION
The eigenvalues of $A^TA$ are 18 and 0, with corresponding unit eigenvectors
To construct $U$, first compute $A\boldsymbol v_1$ and $A\boldsymbol v_2$:
The only column found for $U$ so far is
$$\boldsymbol u_1 = \frac{1}{\sigma_1}A\boldsymbol v_1 = \begin{bmatrix}1/3\\-2/3\\2/3\end{bmatrix}$$
The other columns of $U$ are found by extending the set $\{\boldsymbol u_1\}$ to an orthonormal basis for $\R^3$. In this case, we need two orthogonal unit vectors $\boldsymbol u_2$ and $\boldsymbol u_3$ that are orthogonal to $\boldsymbol u_1$. Each vector must satisfy $\boldsymbol u_1^T\boldsymbol x = 0$, which is equivalent to the equation $x_1 - 2x_2 + 2x_3 = 0$. A basis for the solution set of this equation is
$$\boldsymbol w_1 = \begin{bmatrix}2\\1\\0\end{bmatrix},\qquad \boldsymbol w_2 = \begin{bmatrix}-2\\0\\1\end{bmatrix}$$
Apply the Gram–Schmidt process (with normalizations) to $\{\boldsymbol w_1,\boldsymbol w_2\}$, and obtain
$$\boldsymbol u_2 = \begin{bmatrix}2/\sqrt 5\\1/\sqrt 5\\0\end{bmatrix},\qquad \boldsymbol u_3 = \begin{bmatrix}-2/\sqrt{45}\\4/\sqrt{45}\\5/\sqrt{45}\end{bmatrix}$$
Another way to find $\boldsymbol u_2$ and $\boldsymbol u_3$ is to observe that $\{\boldsymbol u_1\}$ is an orthonormal basis for $\mathrm{Col}\,A$. The remaining vectors $\boldsymbol u_2$ and $\boldsymbol u_3$ must form an orthonormal basis for $(\mathrm{Col}\,A)^\perp = \mathrm{Nul}\,A^T$.
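A numerical check of Example 4. The matrix `A` below is an assumption on my part, since it is not reproduced in these notes: it is the standard textbook matrix for this example, and it is consistent with the data stated above (eigenvalues 18 and 0 for $A^TA$, and a left singular vector satisfying $x_1 - 2x_2 + 2x_3 = 0$):

```python
import numpy as np

# Assumed Example 4 matrix (rank 1), consistent with the data in the text.
A = np.array([[ 1.0, -1.0],
              [-2.0,  2.0],
              [ 2.0, -2.0]])

sigma = np.linalg.svd(A, compute_uv=False)   # one nonzero value: sqrt(18)

# u2, u3 from the Gram-Schmidt route; both must lie in Nul(A^T) = (Col A)^perp.
u2 = np.array([2.0, 1.0, 0.0]) / np.sqrt(5)
u3 = np.array([-2.0, 4.0, 5.0]) / np.sqrt(45)
```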
The next few exercises show some interesting facts.
EXERCISE
How are the singular values of $A$ and $A^T$ related?
SOLUTION
$A^T = (U\Sigma V^T)^T = V\Sigma^TU^T$. This is an SVD of $A^T$ because $V$ and $U$ are orthogonal matrices and $\Sigma^T$ is an $n\times m$ “diagonal” matrix. Since $\Sigma$ and $\Sigma^T$ have the same nonzero diagonal entries, $A$ and $A^T$ have the same nonzero singular values.
[Note: If $A$ is $2\times n$, then $AA^T$ is only $2\times 2$ and its eigenvalues may be easier to compute (by hand) than the eigenvalues of $A^TA$.]
EXERCISE 17
Show that if $A$ is square, then $|\det A|$ is the product of the singular values of $A$.
SOLUTION
$$|\det A| = |\det(U\Sigma V^T)| = |\det U|\cdot|\det\Sigma|\cdot|\det V^T| = 1\cdot\det\Sigma\cdot 1 = \sigma_1\sigma_2\cdots\sigma_n$$
since the determinant of an orthogonal matrix is $\pm 1$ and $\det\Sigma$ is the product of the (nonnegative) diagonal entries of $\Sigma$.
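A quick numerical illustration of Exercise 17 on a random square matrix (a sketch; the identity holds for any square matrix):

```python
import numpy as np

# Exercise 17: |det A| equals the product of the singular values.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

sigma = np.linalg.svd(A, compute_uv=False)
lhs = abs(np.linalg.det(A))
rhs = np.prod(sigma)
```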
EXERCISE 19
$A$ is an $m\times n$ matrix with a singular value decomposition $A = U\Sigma V^T$, where $U$ is an $m\times m$ orthogonal matrix, $\Sigma$ is an $m\times n$ “diagonal” matrix with $r$ positive entries and no negative entries, and $V$ is an $n\times n$ orthogonal matrix. Show that the columns of $V$ are eigenvectors of $A^TA$, the columns of $U$ are eigenvectors of $AA^T$, and the diagonal entries of $\Sigma$ are the singular values of $A$.
SOLUTION
[Hint: Use the SVD to compute $A^TA$ and $AA^T$.] From $A = U\Sigma V^T$,
$$A^TA = V\Sigma^TU^TU\Sigma V^T = V(\Sigma^T\Sigma)V^T$$
so $A^TA\,\boldsymbol v_j = \sigma_j^2\boldsymbol v_j$: the columns of $V$ are eigenvectors of $A^TA$, and the diagonal entries $\sigma_j$ of $\Sigma$ are the square roots of the eigenvalues of $A^TA$, that is, the singular values of $A$. Similarly, $AA^T = U(\Sigma\Sigma^T)U^T$, so the columns of $U$ are eigenvectors of $AA^T$.
EXERCISE 20
Show that if $P$ is an orthogonal $m\times m$ matrix, then $PA$ has the same singular values as $A$. [Hint: If $A = U\Sigma V^T$, then $PA = (PU)\Sigma V^T$ is an SVD of $PA$, since $PU$ is orthogonal.]
EXERCISE 22
Show that if $A$ is an $n\times n$ positive definite matrix, then an orthogonal diagonalization $A = PDP^T$ is a singular value decomposition of $A$.
EXERCISE 23
Let $A = U\Sigma V^T$ be an SVD of an $m\times n$ matrix $A$ of rank $r$, where the $\boldsymbol u_i$ and $\boldsymbol v_i$ are as in Theorem 10. Show that
$$A = \sigma_1\boldsymbol u_1\boldsymbol v_1^T + \sigma_2\boldsymbol u_2\boldsymbol v_2^T + \cdots + \sigma_r\boldsymbol u_r\boldsymbol v_r^T$$
SOLUTION
Both sides agree on the orthonormal basis $\{\boldsymbol v_1,...,\boldsymbol v_n\}$: applying either side to $\boldsymbol v_j$ gives $\sigma_j\boldsymbol u_j$ for $j\leq r$ and $\boldsymbol 0$ for $j > r$, so the two matrices are equal. This expansion generalizes the spectral decomposition in Section 7.1.
EXERCISE 25
Let $T:\R^n\to\R^m$ be a linear transformation. Describe how to find a basis $\mathcal B$ for $\R^n$ and a basis $\mathcal C$ for $\R^m$ such that the matrix for $T$ relative to $\mathcal B$ and $\mathcal C$ is an $m\times n$ “diagonal” matrix.
SOLUTION
Consider the SVD for the standard matrix of $T$, say, $A = U\Sigma V^T$. Let $\mathcal B = \{\boldsymbol v_1,\dots,\boldsymbol v_n\}$ and $\mathcal C = \{\boldsymbol u_1,\dots,\boldsymbol u_m\}$ be bases constructed from the columns of $V$ and $U$, respectively. Observe that, since the columns of $V$ are orthonormal, $V^T\boldsymbol v_j = \boldsymbol e_j$, where $\boldsymbol e_j$ is the $j$th column of the $n\times n$ identity matrix. To find the matrix of $T$ relative to $\mathcal B$ and $\mathcal C$, compute
$$T(\boldsymbol v_j) = A\boldsymbol v_j = U\Sigma V^T\boldsymbol v_j = U\Sigma\boldsymbol e_j = \sigma_j\boldsymbol u_j$$
So $[T(\boldsymbol v_j)]_{\mathcal C} = \sigma_j\boldsymbol e_j$. The discussion at the beginning of Section 5.4 shows that the “diagonal” matrix $\Sigma$ is the matrix of $T$ relative to $\mathcal B$ and $\mathcal C$.
EXERCISE
Prove that any $n\times n$ matrix $A$ admits a polar decomposition of the form $A = PQ$, where $P$ is an $n\times n$ positive semidefinite matrix with the same rank as $A$ and where $Q$ is an $n\times n$ orthogonal matrix.
SOLUTION
[Hint: Use a singular value decomposition, $A = U\Sigma V^T$, and observe that $A = (U\Sigma U^T)(UV^T)$, where $P = U\Sigma U^T$ is a symmetric positive semidefinite matrix with the same rank as $A$ (its eigenvalues are the singular values $\sigma_i\geq 0$) and $Q = UV^T$ is orthogonal, being a product of orthogonal matrices.]
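The hint translates directly into code. A sketch of the polar decomposition built from NumPy's SVD (the names `P` and `Q` follow the exercise statement):

```python
import numpy as np

# Polar decomposition via the SVD: A = (U Sigma U^T)(U V^T) = P Q,
# with P positive semidefinite and Q orthogonal.
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))

U, s, Vt = np.linalg.svd(A)
P = U @ np.diag(s) @ U.T   # symmetric PSD: its eigenvalues are the sigma_i >= 0
Q = U @ Vt                 # product of orthogonal matrices is orthogonal
```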
Applications of the Singular Value Decomposition
The SVD is often used to estimate the rank of a matrix, as noted above. Several other numerical applications are described briefly below, and an application to image processing is presented in Section 7.5.
EXAMPLE 5 (The Condition Number)
Most numerical calculations involving an equation $A\boldsymbol x = \boldsymbol b$ are as reliable as possible when the SVD of $A$ is used. The two orthogonal matrices $U$ and $V$ do not affect lengths of vectors or angles between vectors. Any possible instabilities in numerical calculations are identified in $\Sigma$. If the singular values of $A$ are extremely large or small, roundoff errors are almost inevitable, but an error analysis is aided by knowing the entries in $\Sigma$ and $V$.
If $A$ is an invertible $n\times n$ matrix, then the ratio $\sigma_1/\sigma_n$ of the largest and smallest singular values gives the condition number of $A$. Exercises 41–43 in Section 2.3 showed how the condition number affects the sensitivity of a solution of $A\boldsymbol x = \boldsymbol b$ to changes (or errors) in the entries of $A$. (Actually, a “condition number” of $A$ can be computed in several ways, but the definition given here is widely used for studying $A\boldsymbol x = \boldsymbol b$.)
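A sketch of the definition in code, checked against NumPy's built-in condition number in the 2-norm:

```python
import numpy as np

# Condition number as the ratio sigma_1 / sigma_n of extreme singular values.
A = np.array([[4.0, 11.0],
              [8.0, 7.0]])

sigma = np.linalg.svd(A, compute_uv=False)   # decreasing order
cond = sigma[0] / sigma[-1]
```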
EXAMPLE 6 (Bases for Fundamental Subspaces)
Given an SVD for an $m\times n$ matrix $A$, let $\boldsymbol u_1,...,\boldsymbol u_m$ be the left singular vectors, $\boldsymbol v_1,...,\boldsymbol v_n$ the right singular vectors, and $\sigma_1,...,\sigma_n$ the singular values, and let $r$ be the rank of $A$. By Theorem 9,
$$\{\boldsymbol u_1,...,\boldsymbol u_r\}\qquad(5)$$
is an orthonormal basis for $\mathrm{Col}\,A$.
Recall that $(\mathrm{Col}\,A)^\perp = \mathrm{Nul}\,A^T$. Hence
$$\{\boldsymbol u_{r+1},...,\boldsymbol u_m\}\qquad(6)$$
is an orthonormal basis for $\mathrm{Nul}\,A^T$.
Since $\|A\boldsymbol v_i\| = \sigma_i$ for $1\leq i\leq n$, and $\sigma_i$ is 0 if and only if $i > r$, the vectors $\boldsymbol v_{r+1},...,\boldsymbol v_n$ span a subspace of $\mathrm{Nul}\,A$ of dimension $n-r$. By the Rank Theorem, $\dim\mathrm{Nul}\,A = n - \mathrm{rank}\,A = n - r$. It follows that
$$\{\boldsymbol v_{r+1},...,\boldsymbol v_n\}\qquad(7)$$
is an orthonormal basis for $\mathrm{Nul}\,A$.
Also, $(\mathrm{Nul}\,A)^\perp = \mathrm{Col}\,A^T = \mathrm{Row}\,A$. Hence, from (7),
$$\{\boldsymbol v_1,...,\boldsymbol v_r\}\qquad(8)$$
is an orthonormal basis for $\mathrm{Row}\,A$.
Figure 4 summarizes (5)–(8), but shows the orthogonal basis $\{\sigma_1\boldsymbol u_1,...,\sigma_r\boldsymbol u_r\}$ for $\mathrm{Col}\,A$ instead of the normalized basis, to remind you that $A\boldsymbol v_i = \sigma_i\boldsymbol u_i$ for $1\leq i\leq r$.
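The bases (5)–(8) can be read off directly from a computed SVD. A sketch using the rank-2 matrix of Example 1 (the tolerance `1e-10` for the numerical rank is an arbitrary choice):

```python
import numpy as np

# For a rank-r matrix: the first r columns of U span Col A, the rest span
# Nul A^T; the first r columns of V span Row A, the rest span Nul A.
A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])          # rank 2 (Example 1)

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))                # numerical rank

col_A  = U[:, :r]                         # basis (5) for Col A
null_A = Vt[r:].T                         # basis (7) for Nul A (as columns)
row_A  = Vt[:r].T                         # basis (8) for Row A (as columns)
```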
The four fundamental subspaces and the concept of singular values provide the final statements of the Invertible Matrix Theorem.
EXAMPLE 7 (Reduced SVD and the Pseudoinverse of $A$)
When $\Sigma$ contains rows or columns of zeros, a more compact decomposition of $A$ is possible. Using the notation established above, let $r = \mathrm{rank}\,A$, and partition $U$ and $V$ into submatrices whose first blocks contain $r$ columns:
$$U = [\,U_r\ \ U_{m-r}\,],\qquad V = [\,V_r\ \ V_{n-r}\,]$$
Then $U_r$ is $m\times r$ and $V_r$ is $n\times r$. (To simplify notation, we consider $U_{m-r}$ or $V_{n-r}$ even though one of them may have no columns.) Then partitioned matrix multiplication shows that
$$A = U_rDV_r^T$$
This factorization of $A$ is called a reduced singular value decomposition of $A$. Since the diagonal entries in $D$ are nonzero, $D$ is invertible. The following matrix is called the pseudoinverse (also, the Moore–Penrose inverse) of $A$:
$$A^+ = V_rD^{-1}U_r^T$$
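A sketch of the reduced SVD and the pseudoinverse in code, checked against `numpy.linalg.pinv` and the identities of Supplementary Exercise 12(c):

```python
import numpy as np

# Reduced SVD A = U_r D V_r^T and the pseudoinverse A+ = V_r D^{-1} U_r^T.
A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))                # numerical rank

Ur, D, Vr = U[:, :r], np.diag(s[:r]), Vt[:r].T
A_reduced = Ur @ D @ Vr.T                 # reduced SVD reproduces A
A_plus = Vr @ np.linalg.inv(D) @ Ur.T     # Moore-Penrose pseudoinverse
```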
The next Supplementary exercises explore some of the properties of the reduced singular value decomposition and the pseudoinverse.
Supplementary EXERCISE 12
Verify the properties of $A^+$:
a. For each $\boldsymbol y$ in $\R^m$, $AA^+\boldsymbol y$ is the orthogonal projection of $\boldsymbol y$ onto $\mathrm{Col}\,A$.
b. For each $\boldsymbol x$ in $\R^n$, $A^+A\boldsymbol x$ is the orthogonal projection of $\boldsymbol x$ onto $\mathrm{Row}\,A$.
c. $AA^+A = A$ and $A^+AA^+ = A^+$.
Supplementary EXERCISE 13
Suppose the equation $A\boldsymbol x = \boldsymbol b$ is consistent, and let $\boldsymbol x^+ = A^+\boldsymbol b$. By Exercise 23 in Section 6.3, there is exactly one vector $\boldsymbol p$ in $\mathrm{Row}\,A$ such that $A\boldsymbol p = \boldsymbol b$. The following steps prove that $\boldsymbol x^+ = \boldsymbol p$ and that $\boldsymbol x^+$ is the minimum-length solution of $A\boldsymbol x = \boldsymbol b$.
a. Show that $\boldsymbol x^+$ is in $\mathrm{Row}\,A$.
b. Show that $\boldsymbol x^+$ is a solution of $A\boldsymbol x = \boldsymbol b$.
c. Show that if $\boldsymbol u$ is any solution of $A\boldsymbol x = \boldsymbol b$, then $\|\boldsymbol x^+\|\leq\|\boldsymbol u\|$, with equality only if $\boldsymbol u = \boldsymbol x^+$.
SOLUTION
a. $\boldsymbol x^+ = A^+\boldsymbol b = V_rD^{-1}U_r^T\boldsymbol b$. Since the columns of $V_r$ form an orthonormal basis for $\mathrm{Row}\,A$, $\boldsymbol x^+$ is a linear combination of the columns of $V_r$. Thus $\boldsymbol x^+$ is in $\mathrm{Row}\,A$.
b. Since $A\boldsymbol x = \boldsymbol b$ is consistent, $\boldsymbol b = A\boldsymbol x$ for some $\boldsymbol x$. Then, by Exercise 12(c), $A\boldsymbol x^+ = AA^+\boldsymbol b = AA^+A\boldsymbol x = A\boldsymbol x = \boldsymbol b$.
c. By Exercise 12(b), $\boldsymbol x^+ = A^+\boldsymbol b = A^+A\boldsymbol u$ is the orthogonal projection of $\boldsymbol u$ onto $\mathrm{Row}\,A$. So $\boldsymbol u - \boldsymbol x^+$ is orthogonal to $\boldsymbol x^+$, and the Pythagorean theorem gives $\|\boldsymbol u\|^2 = \|\boldsymbol x^+\|^2 + \|\boldsymbol u - \boldsymbol x^+\|^2 \geq \|\boldsymbol x^+\|^2$, with equality only if $\boldsymbol u = \boldsymbol x^+$.
Supplementary EXERCISE 14
Given any $\boldsymbol b$ in $\R^m$, adapt Exercise 13 to show that $A^+\boldsymbol b$ is the least-squares solution of $A\boldsymbol x = \boldsymbol b$ of minimum length.
SOLUTION
[Hint: Consider the equation $A\boldsymbol x = \hat{\boldsymbol b}$, where $\hat{\boldsymbol b}$ is the orthogonal projection of $\boldsymbol b$ onto $\mathrm{Col}\,A$.]
EXAMPLE 8 (Least-Squares Solution)
Given the equation $A\boldsymbol x = \boldsymbol b$, use the pseudoinverse of $A$ to define
$$\hat{\boldsymbol x} = A^+\boldsymbol b = V_rD^{-1}U_r^T\boldsymbol b$$
Then,
$$A\hat{\boldsymbol x} = AA^+\boldsymbol b = (U_rDV_r^T)(V_rD^{-1}U_r^T)\boldsymbol b = U_rU_r^T\boldsymbol b$$
$U_rU_r^T\boldsymbol b$ is the orthogonal projection $\hat{\boldsymbol b}$ of $\boldsymbol b$ onto $\mathrm{Col}\,A$. Thus $\hat{\boldsymbol x}$ is a least-squares solution of $A\boldsymbol x = \boldsymbol b$. In fact, this $\hat{\boldsymbol x}$ has the smallest length among all least-squares solutions of $A\boldsymbol x = \boldsymbol b$. See Supplementary Exercise 14.
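A numerical illustration with a deliberately rank-deficient matrix, so that $A\boldsymbol x = \boldsymbol b$ has many least-squares solutions; `numpy.linalg.lstsq` also returns the minimum-norm least-squares solution, so the two should agree (the matrix and right-hand side are my own small example):

```python
import numpy as np

# x_hat = A+ b is the minimum-length least-squares solution of Ax = b.
A = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [0.0, 0.0]])     # rank 1: infinitely many least-squares solutions
b = np.array([1.0, 2.0, 3.0])

x_hat = np.linalg.pinv(A) @ b                     # pseudoinverse solution
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]    # minimum-norm LS solution

b_hat = A @ x_hat   # orthogonal projection of b onto Col A
```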