
Linear Algebra

Chapter 1: Introduction to vectors

Linear combinations

\[ c \boldsymbol{v}+d \boldsymbol{w}=c\left[\begin{array}{l} 1 \\ 1 \\ 1 \end{array}\right]+d\left[\begin{array}{l} 2 \\ 3 \\ 2 \end{array}\right]=\left[\begin{array}{l} c+2 d \\ c+3 d \\ c+2 d \end{array}\right] \]

Important concepts:

  1. The combinations \(c\boldsymbol{v}\) fill a line through \((0,0,0)\).
  2. The combinations \(c \boldsymbol{v}+d \boldsymbol{w}\) fill a plane through \((0,0,0)\).
  3. The combinations \(c \boldsymbol{v}+d \boldsymbol{w} + e \boldsymbol{u}\) fill three-dimensional space, where \(\boldsymbol{u}\) is a 3-d vector that is not in the plane of \(\boldsymbol{v}\) and \(\boldsymbol{w}\).

Angles and dot products

The dot product reveals the exact angle \(\theta\) between unit vectors \(\boldsymbol{u}\) and \(\boldsymbol{U}\)

\[ \boldsymbol{u} \cdot \boldsymbol{U} = \cos\theta \]

Right angles: \(\boldsymbol{u} \cdot \boldsymbol{U} = 0\) when \(\boldsymbol{u}\) is perpendicular to \(\boldsymbol{U}\)

Cosine formula: \(\frac{\boldsymbol{v} \cdot \boldsymbol{w}}{\|\boldsymbol{v}\| \|\boldsymbol{w}\|} = \cos\theta\)

Schwarz inequality: \(|\boldsymbol{v} \cdot \boldsymbol{w}|\leq \|\boldsymbol{v}\| \|\boldsymbol{w}\|\)

Triangle inequality: \(\|\boldsymbol{v} + \boldsymbol{w}\|\leq \|\boldsymbol{v}\| + \|\boldsymbol{w}\|\)
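As a quick check of these formulas, here is a minimal NumPy sketch, reusing the example vectors \(\boldsymbol{v}\) and \(\boldsymbol{w}\) from the linear-combination section:

```python
import numpy as np

v = np.array([1.0, 1.0, 1.0])
w = np.array([2.0, 3.0, 2.0])

# Cosine formula: cos(theta) = v.w / (||v|| ||w||)
cos_theta = (v @ w) / (np.linalg.norm(v) * np.linalg.norm(w))
theta = np.arccos(cos_theta)                       # angle in radians

# Schwarz inequality: |v.w| <= ||v|| ||w||
assert abs(v @ w) <= np.linalg.norm(v) * np.linalg.norm(w)

# Triangle inequality: ||v + w|| <= ||v|| + ||w||
assert np.linalg.norm(v + w) <= np.linalg.norm(v) + np.linalg.norm(w)

print(np.degrees(theta))
```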

Matrices & linear equations

Matrix times vector: \(A\boldsymbol{x}\) is a combination of the columns of \(A\)

The solution to \(A\boldsymbol{x}=\boldsymbol{b}\) is \(\boldsymbol{x}=A^{-1}\boldsymbol{b}\) when \(A\) is invertible

Chapter 2: Solving linear equations

Vectors and linear equations

Two equations, two unknowns

\[ \begin{aligned} x-2y &=1\\ 3x+2y &=11 \end{aligned} \]

Row picture: Two lines meeting at a single point

Column picture: Combine the column vectors to produce the vector \(\boldsymbol{b}\)

\[ x\left[\begin{array}{l} 1 \\ 3 \end{array}\right]+ y\left[\begin{array}{r} -2 \\ 2 \end{array}\right] =\left[\begin{array}{c} 1 \\ 11 \end{array}\right] =\boldsymbol{b} \]

Matrix equation: \(A\boldsymbol{x}=\boldsymbol{b}\)

\[ \left[\begin{array}{rr} 1&-2 \\ 3&2 \end{array}\right] \left[\begin{array}{c} x \\ y \end{array}\right]= \left[\begin{array}{r} 1 \\ 11 \end{array}\right] \]

Elimination

Process of elimination:

[Figure: 2 by 2 elimination steps]

Visualization of elimination:

[Figure: visualization of elimination]

Pivot: the first nonzero entry in the row that does the elimination

Non-singular matrix: full set of pivots after elimination, exactly one solution

Singular matrix: no full set of pivots after elimination, so no solution or infinitely many solutions (depending on whether the equations contradict each other)

Elimination from \(A\) to the upper triangular matrix \(U\)

Column 1: Use the first equation to create zeros below the first pivot

Column 2: Use the new equation 2 to create zeros below the second pivot

Column 3 to n: Keep going to find all n pivots and the upper triangular \(U\)
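A minimal NumPy sketch of these elimination steps (assuming nonzero pivots, so no row exchanges are needed); the matrix and right side are the ones from the augmented-matrix example below:

```python
import numpy as np

def forward_eliminate(A, b):
    """Reduce A x = b to upper triangular U x = c, assuming nonzero pivots."""
    U = A.astype(float).copy()
    c = b.astype(float).copy()
    n = len(b)
    for j in range(n - 1):              # column j: clear entries below pivot U[j, j]
        for i in range(j + 1, n):
            m = U[i, j] / U[j, j]       # multiplier l_ij
            U[i, j:] -= m * U[j, j:]
            c[i] -= m * c[j]
    return U, c

A = np.array([[2, 4, -2], [4, 9, -3], [-2, -3, 7]])
b = np.array([2, 8, 10])
U, c = forward_eliminate(A, b)
print(U)    # upper triangular, pivots 2, 1, 4 on the diagonal
print(c)
```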

Augmented matrix

\[ \left[\begin{array}{cc} A & \mathbf{b} \end{array}\right] = \left[\begin{array}{rrrr} 2 & 4 & -2 & \mathbf{2} \\ 4 & 9 & -3 & \mathbf{8} \\ -2 & -3 & 7 & \mathbf{1 0} \end{array}\right] \]
\[ \left[\begin{array}{rrr} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right]\left[\begin{array}{rrrr} 2 & 4 & -2 & \mathbf{2} \\ 4 & 9 & -3 & \mathbf{8} \\ -2 & -3 & 7 & \mathbf{1 0} \end{array}\right]=\left[\begin{array}{rrrr} 2 & 4 & -2 & \mathbf{2} \\ 0 & 1 & 1 & \mathbf{4} \\ -2 & -3 & 7 & \mathbf{1 0} \end{array}\right] \]

Rules for matrix operations

The computational complexity of standard matrix multiplication is \(\mathcal{O}(n^{3})\)

A row times a column is an inner product

A column times a row is an outer product
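A small NumPy illustration of these facts; the naive triple loop below is only to show where the \(\mathcal{O}(n^{3})\) count comes from (in practice use NumPy's `@`):

```python
import numpy as np

def matmul_naive(A, B):
    """Standard O(n^3) multiplication: n^2 entries, each an inner product of length n."""
    n, m = A.shape
    m2, p = B.shape
    assert m == m2
    C = np.zeros((n, p))
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i, j] += A[i, k] * B[k, j]
    return C

row = np.array([1.0, 2.0, 3.0])
col = np.array([4.0, 5.0, 6.0])
print(row @ col)           # inner product: a single number (32.0)
print(np.outer(col, row))  # outer product: a rank-one 3 by 3 matrix

A = np.random.rand(4, 4)
B = np.random.rand(4, 4)
assert np.allclose(matmul_naive(A, B), A @ B)
```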

Inverse matrices

Two-sided inverse:

\[ A^{-1}A = AA^{-1}=I \]

Notes:

  1. The inverse exists if and only if elimination produces n pivots
  2. If \(A\) is invertible, the only solution to \(A\boldsymbol{x}=\boldsymbol{b}\) is \(\boldsymbol{x}=A^{-1}\boldsymbol{b}\)
  3. If there exists a nonzero vector \(\boldsymbol{x}\) such that \(A\boldsymbol{x}=\boldsymbol{0}\), then \(A\) cannot have an inverse. No matrix can bring \(\boldsymbol{0}\) back to \(\boldsymbol{x}\)
  4. The matrix is invertible exactly when its determinant is not zero

Inverse of product \(AB\)

\[ (AB)^{-1}=B^{-1}A^{-1} \]

Gauss-Jordan elimination

Solve for the inverse using an augmented matrix:

\[ \left[\begin{array}{cc} A & I \end{array}\right] \rightarrow \left[\begin{array}{cc} I & A^{-1} \end{array}\right] \]
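A minimal Gauss-Jordan sketch (with partial pivoting added for numerical safety); the 3 by 3 test matrix is the one from the elimination example above:

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Row-reduce the augmented matrix [A | I] to [I | A^{-1}]. Assumes A is invertible."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])   # [A | I]
    for j in range(n):
        p = j + np.argmax(np.abs(M[j:, j]))       # choose the largest available pivot
        M[[j, p]] = M[[p, j]]                     # row exchange
        M[j] /= M[j, j]                           # scale the pivot row so the pivot is 1
        for i in range(n):
            if i != j:
                M[i] -= M[i, j] * M[j]            # create zeros above and below the pivot
    return M[:, n:]                               # the right half is now A^{-1}

A = np.array([[2.0, 4.0, -2.0], [4.0, 9.0, -3.0], [-2.0, -3.0, 7.0]])
assert np.allclose(gauss_jordan_inverse(A), np.linalg.inv(A))
```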

Notes:

  1. If \(A\) is symmetric, \(A^{-1}\) will also be symmetric
  2. The product of the pivots of \(A\) equals the determinant of \(A\). Thus, an invertible matrix cannot have a zero determinant.

Recognize invertible matrix

Diagonally dominant matrices are invertible: in every row, the absolute value of the diagonal entry is larger than the sum of the absolute values of the other entries in that row (here \(3 > 1 + 1 = 2\)).

\[ \begin{bmatrix} 3 & 1 & 1\\ 1 & 3 & 1\\ 1 & 1 & 3 \end{bmatrix} \]

Elimination = Factorization: \(A=LU\)

The factors \(\boldsymbol{L}\) and \(\boldsymbol{U}\) are lower and upper triangular matrices; the pivots appear on the diagonal of \(U\), and \(L\) has ones on its diagonal

[Figure: the A = LU factorization]

We aim to solve:

\[ \boldsymbol{L}\boldsymbol{U}\boldsymbol{x}=\boldsymbol{b} \]

And this can be decomposed to:

\[ \boldsymbol{L}\boldsymbol{c}=\boldsymbol{b} \]

and then

\[ \boldsymbol{U}\boldsymbol{x}=\boldsymbol{c} \]
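A short sketch of these two triangular solves using SciPy. Note that SciPy's `lu` applies partial pivoting, so it actually returns the \(PA=LU\) form discussed later; the example system is the one from the augmented-matrix section:

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

A = np.array([[2.0, 4.0, -2.0], [4.0, 9.0, -3.0], [-2.0, -3.0, 7.0]])
b = np.array([2.0, 8.0, 10.0])

P, L, U = lu(A)                              # A = P L U (P handles the row exchanges)
c = solve_triangular(L, P.T @ b, lower=True) # forward substitution:  L c = P^T b
x = solve_triangular(U, c, lower=False)      # back substitution:     U x = c

assert np.allclose(A @ x, b)
print(x)
```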

Transpose and permutations

  1. Transpose of \(A+B\) is \(A^{T} + B^{T}\)
  2. \((AB)^{T}=B^{T}A^{T}\)
  3. \((A^{-1})^{T}=(A^{T})^{-1}\)
  4. For permutation matrix \(P\), \(P^{-1}=P^{T}\)

Symmetric matrices

\[ S^{T}= S \]

Notes:

  1. The inverse of a symmetric matrix is also symmetric
  2. We can produce a symmetric matrix from \(AA^{T}\) or \(A^{T}A\), where \(A\) can be any matrix.

\(PA=LU\) Factorization

Use a permutation matrix \(P\) to handle the row exchanges required during elimination

Transpose of a Derivative

Let us say \(A=d/dt\), and define inner product as follows:

\[ \boldsymbol{x}^T\boldsymbol{y}=\int_{-\infty}^{\infty}\boldsymbol{x}(t)\boldsymbol{y}(t)dt \]

We know \((A\boldsymbol{x})^T\boldsymbol{y}=\boldsymbol{x}^T(A^T\boldsymbol{y})\)

\[ (A x, y)=\int_{-\infty}^{\infty} \frac{d x}{d t} y(t) d t=\int_{-\infty}^{\infty} x(t)\left(-\frac{d y}{d t}\right) d t=\left(x, A^{\mathrm{T}} y\right) \]

Thus, the transpose (adjoint) of the derivative is minus the derivative (\(A^T=-A\), antisymmetric).

Chapter3: Vector spaces and subspaces

Vector spaces

The space \(R^n\) consists of all column vectors \(v\) with \(n\) components

Subspaces

A subspace of a vector space is a set of vectors (including \(\mathbf{0}\)) that satisfies: if \(\boldsymbol{v}\) and \(\boldsymbol{w}\) are vectors in the subspace and \(c\) is any scalar, then:

  1. \(\boldsymbol{v}+\boldsymbol{w}\) is in the subspace
  2. \(c\boldsymbol{v}\) is in the subspace

Notes:

  1. Every subspace contains the zero vector (take \(c=0\) in rule 2)
  2. Lines through the origin are also subspaces
  3. A subspace containing \(\boldsymbol{v}\) and \(\boldsymbol{w}\) must contain all linear combinations \(c\boldsymbol{v}+d\boldsymbol{w}\)

Column space

The column space consists of all linear combinations of the columns

To solve \(A\boldsymbol{x}=\boldsymbol{b}\) is to express \(\boldsymbol{b}\) as a combination of the columns. That is to say, \(A\boldsymbol{x}=\boldsymbol{b}\) is solvable if and only if \(\boldsymbol{b}\) is in the column space of \(A\)

Nullspace of A: Solving Ax=0

The nullspace \(N(A)\) consists of all solutions to \(A\boldsymbol{x}=0\).

Note:

  1. If \(A\) is \(m \times n\), then \(\boldsymbol{x}\) has \(n\) components, so the solution vectors \(\boldsymbol{x}\) live in \(\mathbb{R}^n\).
  2. Together, all the solution vectors \(\boldsymbol{x}\) form (and span) this subspace.

How to find a basis for the nullspace?

  1. Reduce \(A\) to its row echelon form \(U\) (and, if desired, further to the reduced row echelon form \(R\))
  2. Find the special solutions to \(A\boldsymbol{x}=0\)

Here is an example

\[ C = \begin{bmatrix}1 & 2 & 2 & 4\\ 3 & 8 & 6 & 16\end{bmatrix} \quad \text{becomes} \quad U = \begin{bmatrix}1 & 2 & 2 & 4\\ 0 & 2 & 0 & 4\end{bmatrix} \]

Here columns 3 and 4 have no pivots, so they are free columns. We set the corresponding free variables to \(x_3=1, x_4=0\) and then to \(x_3=0, x_4=1\), and back-substitute to get the special solutions

\[ s_1 = \begin{bmatrix}-2 \\ 0 \\ 1 \\ 0 \end{bmatrix} \quad \text{and} \quad s_2 = \begin{bmatrix}0 \\ -2 \\ 0 \\ 1 \end{bmatrix} \]

Alternatively, we can reduce \(C\) further to its reduced row echelon form \(R\) (1. produce zeros above the pivots, 2. produce ones in the pivot positions)

\[ U = \begin{bmatrix}1 & 2 & 2 & 4\\ 0 & 2 & 0 & 4\end{bmatrix} \quad \text{becomes} \quad R = \begin{bmatrix}1 & 0 & 2 & 0\\ 0 & 1 & 0 & 2\end{bmatrix} \]

The only reason for this extra step is that the special solutions are much easier to read off from \(R\). In fact, \(N(A) = N(U) = N(R)\).
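The same computation can be checked with SymPy, which returns the reduced row echelon form and the special solutions directly (a minimal sketch):

```python
from sympy import Matrix

C = Matrix([[1, 2, 2, 4],
            [3, 8, 6, 16]])

R, pivot_cols = C.rref()     # reduced row echelon form and the pivot column indices
print(R)                     # Matrix([[1, 0, 2, 0], [0, 1, 0, 2]])
print(pivot_cols)            # (0, 1): columns 3 and 4 are free

for s in C.nullspace():      # one special solution per free column
    print(s.T)               # [-2, 0, 1, 0] and [0, -2, 0, 1]
```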

One important fact: a zero nullspace (only the zero vector) means the columns of \(A\) are independent.

Rank of a matrix

The rank of \(A\) is the number of pivots. This number is \(r\).

NOTE:

  1. The rank \(r\) is the dimension of the column space and of the row space.
  2. Moreover, \(n-r\) is the dimension of the nullspace.

Elimination: The big picture

  1. The column space of A - pivot columns of A as a basis
  2. The row space of A - nonzero rows of R as a basis
  3. The nullspace of A - special solutions to Rx = 0 (and Ax=0)

Complete solution to \(Ax=b\)

Use an augmented matrix with elimination to solve it:

\[ \begin{bmatrix} 1 & 3 & 0 & 2\\ 0 & 0 & 1 & 4\\ 1 & 3 & 1 & 6 \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ x_3\\ x_4 \end{bmatrix} = \begin{bmatrix} 1\\ 6\\ 7\\ \end{bmatrix} \]

One particular solution Ax=b

\[ \begin{bmatrix} A & b \end{bmatrix}= \begin{bmatrix} 1 & 3 & 0 & 2 & b_1\\ 0 & 0 & 1 & 4 & b_2\\ 1 & 3 & 1 & 6 & b_3 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 3 & 0 & 2 & b_1\\ 0 & 0 & 1 & 4 & b_2\\ 0 & 0 & 0 & 0 & b_3 - b_2 - b_1 \end{bmatrix}= \begin{bmatrix} R & d \end{bmatrix} \]

For a solution to exist, zero rows in \(R\) must also be zero in \(d\). Setting the free variables to zero, we get one particular solution \(x_p\):

\[ Rx_p = \begin{bmatrix} 1 & 3 & 0 & 2 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1\\ 0\\ 6\\ 0 \end{bmatrix}= \begin{bmatrix} 1\\ 6\\ 0 \end{bmatrix} \]

Thus, we can get two parts of the solution:

  1. \(\boldsymbol{x}_{particular}\), which solves \(A\boldsymbol{x}_p=\boldsymbol{b}\)
  2. \(\boldsymbol{x}_{nullspace}\), built from the \(n-r\) special solutions of \(A\boldsymbol{x}_n=\boldsymbol{0}\)

The complete solution is (one \(x_p\), many \(x_n\); here the free columns are the 2nd and 4th):

\[ \boldsymbol{x} = \boldsymbol{x}_p + \boldsymbol{x}_n = \left[\begin{array}{c} 1 \\ 0 \\ 6 \\ 0 \end{array}\right] + x_2\left[\begin{array}{r} -3 \\ 1 \\ 0 \\ 0 \end{array}\right] + x_4\left[\begin{array}{r} -2 \\ 0 \\ -4 \\ 1 \end{array}\right] \]

Every matrix \(A\) with full column rank (r=n) has these properties:

  1. All columns of \(A\) are pivot columns
  2. No free variables or special solutions
  3. The nullspace \(N(A)\) contains only the zero vector
  4. If \(A\boldsymbol{x}=\boldsymbol{b}\) has a solution, then it has only one solution

Every matrix \(A\) with full row rank (r=m) has these properties:

  1. All rows have pivots, and \(R\) has no zero rows
  2. \(A\boldsymbol{x}=\boldsymbol{b}\) has a solution for every right side b
  3. The column space is the whole space \(R^m\)
  4. There are \(n-r=n-m\) special solutions in the nullspace of \(A\)

A brief summary:

[Figure: summary of solution types by rank]

Independence, basis and dimension

Basis

Every vector in the space is a unique combination of the basis vectors. The true dimension is measured by counting independent columns.

Linear independence

The sequence of vectors \(\boldsymbol{v}_1, \dots, \boldsymbol{v}_n\) is linearly independent if and only if the only combination

\[ x_1\boldsymbol{v}_1 + x_2\boldsymbol{v}_2 + x_3\boldsymbol{v}_3 + \dots + x_n\boldsymbol{v}_n = \boldsymbol{0} \]

is the one with all coefficients \(x_i\) equal to zero.

Vectors that span a subspace

A set of vectors spans a space if their linear combinations fill the space.

Note:

  1. The columns of a matrix span its column space. They might be dependent.

  2. The row space of a matrix is the subspace of \(R^n\) spanned by the rows.

  3. The row space of \(A\) is \(C(A^T)\). It is the column space of \(A^T\).

A basis for vector space

The basis vectors are linearly independent and they span the space.

Note:

  1. The vectors \(v_1, v_2, \dots, v_n\) are a basis for \(\mathbb{R}^n\) exactly when they are the columns of an n by n invertible matrix.

Four subspaces

For any \(m\times n\) matrix \(A\), there are four fundamental subspaces:

  1. The row space is \(\boldsymbol{C}(A^T)\), a subspace of \(R^n\)
  2. The column space is \(\boldsymbol{C}(A)\), a subspace of \(R^m\)
  3. The nullspace is \(\boldsymbol{N}(A)\), a subspace of \(R^n\) (Reminder: solution of \(A\boldsymbol{x}=0\), \(dim=n-r\))
  4. The left nullspace is \(\boldsymbol{N}(A^T)\), a subspace of \(R^m\) (Reminder: solution of \(A^T\boldsymbol{x}=0\), \(dim=m-r\))

[Figure: the four fundamental subspaces]

Chapter4: Orthogonality

Orthogonality of 4 subspaces

Orthogonal vectors:

\[ \boldsymbol{v}^T\boldsymbol{w}=0 \quad \text{and} \quad \|\boldsymbol{v}\|^2+\|\boldsymbol{w}\|^2=\|\boldsymbol{v}+\boldsymbol{w}\|^2 \]

Orthogonal subspaces: Two subspaces \(V\) and \(W\) of a vector space are orthogonal if every vector \(v\) in \(V\) is perpendicular to every vector \(w\) in \(W\)

E.g. The floor of your room is a subspace \(V\), and the line where two walls meet is a subspace \(W\). Those subspaces are orthogonal.

E.g. Two walls look perpendicular, but as subspaces they are not orthogonal: their meeting line lies in both \(V\) and \(W\), and it is not perpendicular to itself.

Every vector \(\boldsymbol{x}\) in the nullspace is perpendicular to every row of \(A\) because \(A\boldsymbol{x}=\boldsymbol{0}\). The nullspace \(N(A)\) and the row space \(C(A^T)\) are orthogonal subspaces of \(R^n\).

Proof: The vectors in the row space are combinations \(A^\top \boldsymbol{y}\) of the rows. Take the dot product of \(A^\top \boldsymbol{y}\) with any \(\boldsymbol{x}\) in the nullspace:

\[ \boldsymbol{x}^\top (A^\top \boldsymbol{y}) = (A\boldsymbol{x})^\top \boldsymbol{y} = \boldsymbol{0}^\top \boldsymbol{y} = 0 \]

Every vector \(\boldsymbol{y}\) in the nullspace of \(A^\top\) is perpendicular to every column of \(A\). The left nullspace \(N(A^\top)\) and the column space \(C(A)\) are orthogonal in \(R^m\).

[Figure: orthogonality of the four subspaces]

Orthogonal complements

Orthogonal complement of a subspace \(V\) contains every vector that is perpendicular to \(V\). This orthogonal subspace is denoted by \(V^{\perp}\) (V perp).

Note:

  1. \(N(A)\) is the orthogonal complement of the row space \(C(A^T)\) (in \(R^n\)).
  2. \(N(A^T)\) is the orthogonal complement of the column space \(C(A)\) (in \(R^m\)).

As \(A\boldsymbol{x} = A\boldsymbol{x}_r + A\boldsymbol{x}_n=A\boldsymbol{x}_r\), we can find that the row space component goes to the column space:

[Figure: the row space component of x maps into the column space]

Projections

Projection onto a line

We want to find the point \(\boldsymbol{p}\) on the line through \(\boldsymbol{a}\) that is closest to \(\boldsymbol{b}\). The key to projection is orthogonality: the line from \(\boldsymbol{b}\) to \(\boldsymbol{p}\) is perpendicular to the vector \(\boldsymbol{a}\). The projection \(\boldsymbol{p}\) will be some multiple of \(\boldsymbol{a}\), i.e. \(\boldsymbol{p}=\hat{x} \boldsymbol{a}\). To find the projection, we take 3 steps:

  1. Find \(\hat{x}\)
  2. Find vector \(\boldsymbol{p}\)
  3. Find matrix \(P\)

Here projecting \(\boldsymbol{b}\) onto \(\boldsymbol{a}\) will have an error \(\boldsymbol{e}= \boldsymbol{b} - \hat{x} \boldsymbol{a}\), so we have

\[ \boldsymbol{a} \cdot \boldsymbol{e}= \boldsymbol{a} \cdot (\boldsymbol{b} - \hat{x} \boldsymbol{a}) = 0 \]

Thus, the projection of \(\boldsymbol{b}\) onto the line through \(\boldsymbol{a}\) is the vector \(\boldsymbol{p}=\hat{x}\boldsymbol{a}=\frac{\boldsymbol{a}^T\boldsymbol{b}}{\boldsymbol{a}^T\boldsymbol{a}}\boldsymbol{a}=P\boldsymbol{b}\). The projection matrix \(P=\frac{\boldsymbol{a}\boldsymbol{a}^T}{\boldsymbol{a}^T\boldsymbol{a}}\).
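The three steps, written out as a minimal NumPy sketch (the vectors \(\boldsymbol{a}\) and \(\boldsymbol{b}\) below are arbitrary examples):

```python
import numpy as np

a = np.array([1.0, 2.0, 2.0])
b = np.array([1.0, 1.0, 1.0])

x_hat = (a @ b) / (a @ a)           # step 1: x_hat = a.b / a.a
p = x_hat * a                       # step 2: projection p = x_hat * a
P = np.outer(a, a) / (a @ a)        # step 3: projection matrix P = a a^T / a^T a

e = b - p                           # error vector
assert np.isclose(a @ e, 0.0)       # e is perpendicular to a
assert np.allclose(P @ b, p)
assert np.allclose(P @ P, P)        # projecting twice changes nothing
```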

[Figure: projection of b onto the line through a]

Projection onto a subspace

We want to find the combination \(\boldsymbol{p} = \hat{x}_1\boldsymbol{a}_1+\dots \hat{x}_n\boldsymbol{a}_n\) closest to a given vector \(\boldsymbol{b}\). That is to say, we are projecting each \(\boldsymbol{b}\) in \(\mathbb{R}^m\) onto the subspace spanned by \(\boldsymbol{a}\)'s, i.e., column space of matrix \(A\).

  1. Find \(\hat{\boldsymbol{x}}\)

    \[ A^T(\boldsymbol{b} - A\hat{\boldsymbol{x}}) = \boldsymbol{0} \quad\text{or}\quad A^T A\hat{\boldsymbol{x}}=A^T\boldsymbol{b} \]
  2. Find \(\boldsymbol{p}\)

    \[ \boldsymbol{p} = A\hat{\boldsymbol{x}} = A(A^T A)^{-1} A^T \boldsymbol{b} \]
  3. Find \(P\)

    \[ P = A(A^T A)^{-1} A^T \]

\(A^TA\) is invertible if and only if \(A\) has linearly independent columns

When \(A\) has independent columns, \(A^TA\) is square, symmetric, and invertible.
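A minimal NumPy sketch of the three steps (the matrix \(A\) and vector \(\boldsymbol{b}\) here are arbitrary examples with independent columns):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])                       # independent columns
b = np.array([6.0, 0.0, 0.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)        # step 1: A^T A x_hat = A^T b
p = A @ x_hat                                    # step 2: projection p = A x_hat
P = A @ np.linalg.solve(A.T @ A, A.T)            # step 3: P = A (A^T A)^{-1} A^T

assert np.allclose(A.T @ (b - p), 0)             # error is perpendicular to every column
assert np.allclose(P @ P, P)
```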

Least squares approximations

Oftentimes \(A\boldsymbol{x}=\boldsymbol{b}\) has no solution. The usual reason is too many equations (m > n). That is to say, \(\boldsymbol{b}\) lies outside the column space of \(A\), while \(A\boldsymbol{x}\) can only reach vectors inside that column space. Thus, we project \(\boldsymbol{b}\) onto the column space to minimize the error.

[Figure: least squares as projection onto the column space]

Orthonormal bases and Gram-Schmidt

The vectors \(\boldsymbol{q}_1, \dots \boldsymbol{q}_n\) are orthonormal if

\[ \boldsymbol{q}_i^{\mathrm{T}} \boldsymbol{q}_j=\left\{\begin{array}{lll} 0 & \text { when } i \neq j & (\text { orthogonal vectors }) \\ 1 & \text { when } i=j & \left(\text { unit vectors: }\left\|\boldsymbol{q}_i\right\|=1\right) \end{array}\right. \]

Orthogonal matrix \(Q\): When \(Q\) is square, \(Q^TQ=I\) means that \(Q^T=Q^{-1}\).

The orthogonal matrix \(Q\) leaves the length unchanged, i.e., \(\|Q\boldsymbol{x}\|=\|\boldsymbol{x}\|\)

Gram-Schmidt process

Start with three independent vectors \(\boldsymbol{a},\boldsymbol{b},\boldsymbol{c}\) and create three orthogonal vectors \(A,B,C\).

  1. Choose \(A=\boldsymbol{a}\).
  2. Start with \(\boldsymbol{b}\) and subtract its projection along \(A\):
\[ B = \boldsymbol{b}- \frac{A^T\boldsymbol{b}}{A^TA}A \]
  3. Then do the same for \(\boldsymbol{c}\), subtracting its projections along \(A\) and \(B\):
\[ C = \boldsymbol{c}- \frac{A^T\boldsymbol{c}}{A^TA}A-\frac{B^T\boldsymbol{c}}{B^TB}B \]

Summary: Subtract from every new vector its projection in the directions already set.
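A minimal sketch of this process in NumPy. It normalizes each vector as it goes, so it returns the orthonormal \(\boldsymbol{q}\)'s rather than the intermediate \(A, B, C\); the three test vectors are arbitrary:

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: subtract from each new vector its projections
    onto the directions already set, then normalize."""
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for q in basis:
            w -= (q @ v) * q          # remove the component along q
        basis.append(w / np.linalg.norm(w))
    return np.column_stack(basis)

a = np.array([1.0, 1.0, 0.0])
b = np.array([1.0, 0.0, 1.0])
c = np.array([0.0, 1.0, 1.0])
Q = gram_schmidt([a, b, c])
assert np.allclose(Q.T @ Q, np.eye(3))   # the columns are orthonormal
```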

[Figure: Gram-Schmidt process]

QR factorization

The Gram-Schmidt process creates an orthogonal matrix \(Q\), and we can create an upper triangular matrix \(R\) so that \(A=QR\). Let us start with a matrix \(A\) whose columns are \(\boldsymbol{a}, \boldsymbol{b}, \boldsymbol{c}\).

We first create \(\boldsymbol{q}_1=\boldsymbol{a}/\|\boldsymbol{a}\|\). Then we construct the remaining vectors using the Gram-Schmidt process:

[Figure: QR factorization]

And we can also get \(R=Q^TA\)

Using the QR factorization, we can simplify the least squares equation \(A^TA\hat{x}=A^T b\):

\[ R^TR\hat{x} = R^TQ^Tb \rightarrow R\hat{x}=Q^Tb \]
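A quick check with NumPy/SciPy, reusing the small example matrix from the projection sketch above (an illustrative example, not one from the text):

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

Q, R = np.linalg.qr(A)                    # A = Q R with orthonormal columns in Q
x_hat = solve_triangular(R, Q.T @ b)      # solve R x_hat = Q^T b by back substitution

assert np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0])
print(x_hat)
```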

Chapter 5: Determinants

Purpose of determinants

Determinants can be used to solve linear equations. Consider the following case:

\[ \left\{\begin{array}{l} a_{11} x_1+a_{12} x_2=b_1 \\ a_{21} x_1+a_{22} x_2=b_2 \end{array}\right. \\ \]

Using Gaussian elimination, we could get:

\[ \left\{\begin{array}{l} x_1=\frac{b_1 a_{22}-a_{12} b_2}{a_{11} a_{22}-a_{12} a_{21}} \\ x_2=\frac{b_2 a_{11}-a_{21} b_1}{a_{11} a_{22}-a_{12} a_{21}} \end{array}\right. \]

We can rewrite the solution with determinant:

\[ \left\{\begin{array}{l} x_1=\frac{b_1 a_{22}-a_{12} b_2}{a_{11} a_{22}-a_{12} a_{21}} \\ x_2=\frac{b_2 a_{11}-a_{21} b_1}{a_{11} a_{22}-a_{12} a_{21}} \end{array}\right. \Longleftrightarrow \left\{\begin{array}{l} x_1=\frac{\left|\begin{array}{ll} b_1 & a_{12} \\ b_2 & a_{22} \end{array}\right|}{\left|\begin{array}{ll} a_{11} & a_{12} \\ a_{21} & a_{22} \end{array}\right|} \\ x_2=\frac{\left|\begin{array}{ll} a_{11} & b_1 \\ a_{21} & b_2 \end{array}\right|}{\left|\begin{array}{ll} a_{11} & a_{12} \\ a_{21} & a_{22} \end{array}\right|} \end{array}\right. \]

Consider the following third-order determinant:

\[ \left|\begin{array}{lll} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{array}\right|=a_{11} a_{22} a_{33}+a_{12} a_{23} a_{31}+a_{13} a_{21} a_{32}-a_{11} a_{23} a_{32}-a_{12} a_{21} a_{33}-a_{13} a_{22} a_{31} \]

Notice the subscripts: in each product the first (row) indices are 1, 2, 3 in order, while the second (column) indices run through a permutation of 1, 2, 3. The sign is determined by the inversion number of that permutation. Consider the following case:

[Figure: counting inversions]

The inversion number is the sum of all \(t_i\); in this case the inversion number is \(t=6\). The general rule for calculating an n-th order determinant is as follows:

\[ \begin{gathered} D=\left|\begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1 n} \\ a_{21} & a_{22} & \cdots & a_{2 n} \\ \vdots & \vdots & & \vdots \\ a_{n 1} & a_{n 2} & \cdots & a_{n n} \end{array}\right| \\ \\ D=\sum(-1)^t a_{1 p_1} a_{2 p_2} \cdots a_{n p_n} \end{gathered} \]

where \(t\) is the inversion number of \(a_{1 p_1}, a_{2 p_2}, \cdots, a_{n p_n}\).

Again, remember that determinants can be used to solve linear equations. Consider the following cases:

\[ \begin{gathered} x_1=\frac{\left|\begin{array}{ll} b_1 & a_{12} \\ b_2 & a_{22} \end{array}\right|}{\left|\begin{array}{ll} a_{11} & a_{12} \\ a_{21} & a_{22} \end{array}\right|} x_2=\frac{\left|\begin{array}{ll} a_{11} & b_1 \\ a_{21} & b_2 \end{array}\right|}{\left|\begin{array}{ll} a_{11} & a_{12} \\ a_{21} & a_{22} \end{array}\right|} \\ \\ x_1=\frac{\left|\begin{array}{lll} b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33} \end{array}\right|}{\left|\begin{array}{lll} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{array}\right|} x_2=\frac{\left|\begin{array}{lll} a_{11} & b_1 & a_{13} \\ a_{21} & b_2 & a_{23} \\ a_{31} & b_3 & a_{33} \end{array}\right|}{\left|\begin{array}{lll} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{array}\right|} x_3=\frac{\left|\begin{array}{lll} a_{11} & a_{12} & b_1 \\ a_{21} & a_{22} & b_2 \\ a_{31} & a_{32} & b_3 \end{array}\right|}{\left|\begin{array}{lll} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{array}\right|} \end{gathered} \]

This is called Cramer's rule. Consider the case \(A\boldsymbol{x}=\boldsymbol{b}\):

\[ |A|=\left|\begin{array}{ccc} a_{11} & \cdots & a_{1 n} \\ \vdots & & \vdots \\ a_{n 1} & \cdots & a_{n n} \end{array}\right| \neq 0 \]

Then, we could solve the equation:

\[ x_1=\frac{\left|A_1\right|}{|A|} \quad x_2=\frac{\left|A_2\right|}{|A|} \quad \cdots \quad x_n=\frac{\left|A_n\right|}{|A|} \]

where

\[ A_j=\left(\begin{array}{ccccccc} a_{11} & \cdots & a_{1, j-1} & b_1 & a_{1, j+1} & \cdots & a_{1 n} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ a_{n 1} & \cdots & a_{n, j-1} & b_n & a_{n, j+1} & \cdots & a_{n n} \end{array}\right) \]
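A minimal sketch of Cramer's rule in NumPy, checked on the 2 by 2 system from Chapter 2 (fine for tiny systems, far too slow for large ones):

```python
import numpy as np

def cramer(A, b):
    """Solve A x = b by Cramer's rule: x_j = det(A_j) / det(A),
    where A_j is A with column j replaced by b."""
    detA = np.linalg.det(A)
    x = np.empty(len(b))
    for j in range(len(b)):
        Aj = A.copy()
        Aj[:, j] = b
        x[j] = np.linalg.det(Aj) / detA
    return x

A = np.array([[1.0, -2.0], [3.0, 2.0]])
b = np.array([1.0, 11.0])
print(cramer(A, b))                               # [3. 1.]
assert np.allclose(cramer(A, b), np.linalg.solve(A, b))
```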

Properties of determinants

  1. A singular matrix has zero determinant
  2. The product of the pivots is the determinant
  3. The determinant changes sign when two rows (or two columns) are exchanged
  4. The determinant is a linear function of each row separately (if one row is multiplied by \(t\), the determinant is multiplied by \(t\); if one row is split into a sum, the determinant splits into the same sum)
  5. If two rows of \(A\) are equal, then \(\det A=0\)
  6. Subtracting a multiple of one row from another row leaves \(\det A\) unchanged
  7. A matrix with a row of zeros has \(\det A=0\)
  8. The determinant of \(AB\) is \(\det A\) times \(\det B\)

Cramer's rule & Volume

This is for solving \(A\boldsymbol{x}=\boldsymbol{b}\)

Step 1:

\[ \left[\begin{array}{ccc} &&\\ &A&\\ &&\end{array}\right] \left[\begin{array}{lll} \boldsymbol{x}_{\mathbf{1}} & 0 & 0 \\ \boldsymbol{x}_{\mathbf{2}} & 1 & 0 \\ \boldsymbol{x}_{\mathbf{3}} & 0 & 1 \end{array}\right] =\left[\begin{array}{lll} \boldsymbol{b}_{\mathbf{1}} & a_{12} & a_{13} \\ \boldsymbol{b}_{\mathbf{2}} & a_{22} & a_{23} \\ \boldsymbol{b}_{\mathbf{3}} & a_{32} & a_{33} \end{array}\right]=B_1 \]

Using the product rule: \((\det A)(x_1)=\det B_1\)

Step 2:

\[ \left[\begin{array}{lll} &&\\ \boldsymbol{a}_1 & \boldsymbol{a}_2 & \boldsymbol{a}_3\\ && \end{array}\right]\left[\begin{array}{lll} 1 & x_1 & 0 \\ 0 & x_2 & 0 \\ 0 & x_3 & 1 \end{array}\right]=\left[\begin{array}{lll} \boldsymbol{a}_1 & \boldsymbol{b} & \boldsymbol{a}_3 \\ & & \end{array}\right]=B_2 \]

Using the product rule: \((\det A)(x_2)=\det B_2\)

A brief summary: use the product rule to solve \(A\boldsymbol{x}=\boldsymbol{b}\). The matrix \(B_j\) has the \(j^{th}\) column of \(A\) replaced by the vector \(\boldsymbol{b}\), and \(x_j=\det B_j/\det A\).

Area of a triangle

The area of a triangle with corners \((x_1,y_1)\), \((x_2,y_2)\), \((x_3,y_3)\) is half of a 3 by 3 determinant:

\[ \frac{1}{2}\left|\begin{array}{lll} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{array}\right| \]

Cross product

\[ u \times v=\left|\begin{array}{ccc} i & j & k \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{array}\right|=\left(u_2 v_3-u_3 v_2\right) \boldsymbol{i}+\left(u_3 v_1-u_1 v_3\right) \boldsymbol{j}+\left(u_1 v_2-u_2 v_1\right) \boldsymbol{k} \]

Properties:

  1. \(\boldsymbol{v} \times \boldsymbol{u}\) reverses rows 2 and 3 in the determinant so it equals \(-(\boldsymbol{u}\times \boldsymbol{v})\)
  2. The cross product \(\boldsymbol{u} \times \boldsymbol{v}\) is perpendicular to \(\boldsymbol{u}\) and also \(\boldsymbol{v}\)
  3. The cross product of any vector with itself is \(\boldsymbol{u} \times \boldsymbol{u}=0\)
  4. The length of \(\boldsymbol{u}\times \boldsymbol{v}\) equals the area of the parallelogram with sides \(\boldsymbol{u}\) and \(\boldsymbol{v}\). i.e. \(\|u\| \|v\| |\sin\theta|\)

Triple product = Determinant = Volume

Triple product:

\[ (\boldsymbol{u} \times \boldsymbol{v}) \cdot \boldsymbol{w}=\left|\begin{array}{lll} w_1 & w_2 & w_3 \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{array}\right|=\left|\begin{array}{ccc} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{array}\right| \]
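A small NumPy check of the cross-product properties and the triple-product identity (the vectors are arbitrary examples):

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
w = np.array([1.0, 0.0, 1.0])

uv = np.cross(u, v)
assert np.isclose(u @ uv, 0) and np.isclose(v @ uv, 0)   # u x v is perpendicular to u and v

triple = uv @ w                                          # (u x v) . w
det = np.linalg.det(np.array([u, v, w]))                 # determinant with rows u, v, w
assert np.isclose(triple, det)                           # triple product = determinant = volume
```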

Chapter 6: Eigenvalues and Eigenvectors

Introduction to Eigenvalues

The basic equation for eigenvalues and eigenvectors is:

\[ A\boldsymbol{x}=\lambda \boldsymbol{x} \]

The eigenvectors make up the nullspace of \(A-\lambda I\):

\[ det(A-\lambda I)=0 \]

Geometric interpretation

For

\[ A = \begin{bmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{bmatrix} \]

we know its eigenvalues are \(\lambda= 1\) and \(\lambda= 1/2\) and the eigenvectors are \((0.6, 0.4)^T\) and \((0.5, -0.5)^T\) respectively. We can find that

\[ A^{100} \approx \begin{bmatrix} 0.6 & 0.6 \\ 0.4 & 0.4 \end{bmatrix} \]
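This is easy to verify numerically (a minimal NumPy check):

```python
import numpy as np

A = np.array([[0.8, 0.3],
              [0.2, 0.7]])

lam, X = np.linalg.eig(A)
print(lam)                               # eigenvalues 1.0 and 0.5
print(X)                                 # columns are (normalized) eigenvectors

print(np.linalg.matrix_power(A, 100))    # approximately [[0.6, 0.6], [0.4, 0.4]]
```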

Determinant and trace

  1. The product of n eigenvalues equals the determinant
  2. The sum of n eigenvalues equals the sum of the n diagonal entries (trace)

Imaginary eigenvalues

Consider a rotation matrix:

\[ Q = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \]

Its eigenvalues are \(\lambda=\pm i\). (Think about it: no real vector \(\mathbf{x}\) stays in its original direction, since this is a rotation matrix.)

Diagonalizing a matrix

Suppose the n by n matrix \(A\) has n linearly independent eigenvectors \(\mathbf{x}_1, \dots, \mathbf{x}_n\). Put them into the columns of an eigenvector matrix \(X\):

\[ X^{-1} A X=\Lambda=\left[\begin{array}{lll} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{array}\right] \]

Then,

\[ A^2 = X\Lambda X^{-1} X\Lambda X^{-1} = X\Lambda^{2} X^{-1} \]

That is to say \(A^2\) has the same eigenvectors in \(X\) and squared eigenvalues in \(\Lambda^{2}\).

Note:

  1. Suppose eigenvalues \(\lambda_1, \dots, \lambda_n\) are all different, then it is automatic that the eigenvectors \(\mathbf{x}_1, \dots, \mathbf{x}_n\) are independent. Any matrix that has no repeated eigenvalues can be diagonalized

Similar matrices: Same Eigenvalues

All matrices \(A = BCB^{-1}\), for every invertible \(B\), are similar to \(C\): they all share the eigenvalues of \(C\)

Matrix Powers \(A^k\)

Consider a typical difference equation \(\mathbf{u}_{k+1}=A\mathbf{u}_{k}\); the solution is:

\[ \mathbf{u}_k = A^k \mathbf{u}_0 = X\Lambda^k X^{-1} \mathbf{u}_0 \]

If we represent \(\mathbf{u}_0\) with eigenvectors (\(\mathbf{u}_0=c_1 \mathbf{x}_1 + \dots + c_n \mathbf{x}_n\)), the solution can be written as:

\[ \mathbf{u}_k = A^k \mathbf{u}_0 = c_1(\lambda_1)^k\mathbf{x}_1 + \dots + c_n(\lambda_n)^k\mathbf{x}_n \]

System of differential equations

The ordinary differential equations \(\frac{du}{dt}=u\) and \(\frac{du}{dt}=\lambda u\) are solved by exponentials:

\[ \frac{du}{dt}=u \Rightarrow u(t) =C e^t ,\quad \frac{du}{dt}=\lambda u \Rightarrow u(t) =C e^{\lambda t} \]

The value of \(C\) is determined by the initial value: \(u(0)=C\).

Let us consider a system of n equations:

\[ \frac{du}{dt} = A\mathbf{u} \]

starting from the vector \(\mathbf{u}(0) = [u_1(0),\dots, u_n(0)]^T\) at \(t=0\)

The solution would be in the form of \(e^{\lambda t}\mathbf{x}\), where \(\lambda\) and \(\mathbf{x}\) are eigenvalues and eigenvectors of \(A\). Substitute \(\mathbf{u}(t) = e^{\lambda t}\mathbf{x}\) into the original equation:

\[ \frac{d\mathbf{u}}{dt} = \lambda e^{\lambda t}\mathbf{x} = A\mathbf{u} = A e^{\lambda t}\mathbf{x} \]

We obtain \(A\mathbf{x}=\lambda\mathbf{x}\), which confirms that \(e^{\lambda t}\mathbf{x}\) solves the equation exactly when \(\lambda\) and \(\mathbf{x}\) are an eigenvalue-eigenvector pair.

Consider an example:

\[ \frac{d\mathbf{u}}{dt}= A\mathbf{u} = \begin{bmatrix}0 & 1\\ 1 & 0\end{bmatrix}\mathbf{u} \quad \text{starting from} \quad \mathbf{u}(0) = \begin{bmatrix}4\\ 2\end{bmatrix} \]

The matrix \(A\) has eigenvalues \(\lambda=\pm 1\) and eigenvectors \((1, 1)\) and \((1, -1)\) respectively. The solution would be:

\[ \mathbf{u}_1(t) = e^{\lambda_1 t} \mathbf{x}_1 = e^{t}\begin{bmatrix}1\\ 1\end{bmatrix} \quad \text{and} \quad \mathbf{u}_2(t) = e^{\lambda_2 t} \mathbf{x}_2 = e^{-t}\begin{bmatrix}1\\ -1\end{bmatrix} \]

The complete solution is:

\[ \mathbf{u}(t) = Ce^{t}\begin{bmatrix}1\\ 1\end{bmatrix} + De^{-t}\begin{bmatrix}1\\ -1\end{bmatrix} \]

Using the initial value \(\mathbf{u}(0)\) yields \(C=3\) and \(D=1\).
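A minimal NumPy sketch of the same computation. NumPy normalizes the eigenvectors, so the expansion coefficients differ from \(C=3, D=1\) by that scaling, but \(\mathbf{u}(t)\) is the same:

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
u0 = np.array([4.0, 2.0])

lam, X = np.linalg.eig(A)          # eigenvalues +1 and -1, eigenvectors in the columns of X
c = np.linalg.solve(X, u0)         # expand u(0) in the eigenvectors: u(0) = X c

def u(t):
    return X @ (c * np.exp(lam * t))    # u(t) = sum_i c_i e^{lambda_i t} x_i

assert np.allclose(u(0.0), u0)
print(u(1.0))
```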

Second order equations

Consider \(my''+by' + ky = 0\); to solve it, substitute \(y=e^{\lambda t}\):

\[ m \frac{d^2y}{dt^2} + b \frac{dy}{dt} + ky = 0\quad \text{becomes} \quad (m\lambda^2 + b\lambda + k)e^{\lambda t} = 0 \]

We can find two roots of \(m\lambda^2 + b\lambda + k=0\) and solve the equation.

Let us suppose \(m=1\); then we can rewrite the equation as a first-order linear system:

\[ \frac{d}{dt}\begin{bmatrix}y \\ y'\end{bmatrix} = \begin{bmatrix}0 & 1 \\ -k & -b\end{bmatrix} \begin{bmatrix}y \\ y'\end{bmatrix} = A\mathbf{u} \]

Then, we could solve this equation:

\[ \mathbf{u}(t) = c_1 e^{\lambda_1 t}\begin{bmatrix}1\\ \lambda_1\end{bmatrix} + c_2 e^{\lambda_2 t}\begin{bmatrix}1\\ \lambda_2\end{bmatrix} \]

Stability of 2 by 2 matrices

When the solutions of \(du/dt=A\mathbf{u}\) approach \(\mathbf{0}\) as \(t\rightarrow \infty\), we say the problem is stable. Since \(\mathbf{u}(t)\) is built from \(e^{\lambda t}\mathbf{x}\), stability requires every \(\lambda\) to have a negative real part.

The question is: Which matrices have negative eigenvalues (real parts)?

\(A\) is stable and \(\mathbf{u}\rightarrow \mathbf{0}\) when all \(\lambda\) have negative real parts. Then, \(A=\begin{bmatrix}a &b\\ c&d\end{bmatrix}\) must pass two tests:

  1. \(\lambda_1 + \lambda_2 < 0 \Rightarrow \text{trace} = a + d <0\)
  2. \(\lambda_1\lambda_2 > 0 \Rightarrow \text{determinant} = ad -bc >0\)

The exponential of a Matrix

We want the solution \(\mathbf{u}(t)\) in a new form \(e^{At}\mathbf{u}(0)\). Now what does \(e^{At}\) mean?

We know \(e^x = 1 + x + \frac{1}{2}x^2 + \frac{1}{6}x^3+\dots\), when we change \(x\) to \(At\), this series defines the matrix exponential \(e^{At}\):

\[ e^{At} = I + {At} + \frac{1}{2}({At})^2 + \frac{1}{6}({At})^3+\dots \]

And its derivative \(Ae^{At}\):

\[ Ae^{At} = A + A^2t + \frac{1}{2}A^3t^2 + \dots \]

And its eigenvalue \(e^{\lambda t}\):

\[ (I + At + \frac{1}{2}({At})^2 + \dots) x = (1 + \lambda t + \frac{1}{2}({\lambda t})^2 + \dots) x = e^{\lambda t} x \]

Find \(u(t)=e^{At}\mathbf{u}(0)\) by diagonalization

Assume A has n independent eigenvectors, so it is diagonalizable. Substitute \(A=X\Lambda X^{-1}\) into the series for \(e^{At}\):

\[ \begin{aligned} e^{At} &= I + X\Lambda X^{-1}t + \frac{1}{2} (X\Lambda X^{-1}t)(X\Lambda X^{-1}t)+\dots\\ &= X[I + \Lambda t + \frac{1}{2}(\Lambda t)^2 + \dots ] X^{-1}\\ &= X e^{\Lambda t} X^{-1} \end{aligned} \]

Then, we get

\[ e^{A t} \boldsymbol{u}(0)=X e^{\Lambda t} X^{-1} \boldsymbol{u}(0)=\left[\begin{array}{lll} & &\\ x_1 & \cdots & \boldsymbol{x}_n \\ & & \end{array}\right]\left[\begin{array}{lll} e^{\lambda_1 t} & & \\ & \ddots & \\ & & e^{\lambda_n t} \end{array}\right]\left[\begin{array}{c} c_1 \\ \vdots \\ c_n \end{array}\right] \]

We can obtain the solution using three steps (Similar to previous one):

  1. \(\mathbf{u}(0)= c_1\mathbf{x}_1 + \dots + c_n\mathbf{x}_n = X\mathbf{c}\)
  2. Multiply each \(\mathbf{x}_i\) by its growth factor \(e^{\lambda_i t}\)
  3. We obtain \(\mathbf{u}(t)= c_1 e^{\lambda_1 t}\mathbf{x}_1 + \dots + c_n e^{\lambda_n t}\mathbf{x}_n\)
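The three steps above can be checked against SciPy's matrix exponential (a minimal sketch, reusing the 2 by 2 system from the previous example):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
u0 = np.array([4.0, 2.0])
t = 1.0

lam, X = np.linalg.eig(A)
u_diag = X @ np.diag(np.exp(lam * t)) @ np.linalg.solve(X, u0)   # X e^{Lambda t} X^{-1} u(0)
u_expm = expm(A * t) @ u0                                        # e^{At} u(0) from the series

assert np.allclose(u_diag, u_expm)
```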

Symmetric matrices

For symmetric matrices, we know

\[ S^T = (X^{-1})^T \Lambda X^T = (X)^T \Lambda X^{-1} = S \]

That means \(X^T X= I\), i.e. each eigenvector in \(X\) is orthogonal to the other eigenvectors when \(S=S^T\). Here are the key facts:

  1. A symmetric matrix has only real eigenvalues
  2. The eigenvectors can be chosen orthonormal, then the eigenvector matrix \(X\) becomes an orthogonal matrix \(Q\)
\[ S=Q \Lambda Q^{-1}=Q \Lambda Q^{\mathrm{T}} \quad \text { with } \quad Q^{-1}=Q^{\mathrm{T}} \]

Complex Eigenvalues of Real Matrices

For real matrices, complex \(\lambda\)'s and \(x\)'s come in "conjugate pairs":

\[ \begin{aligned} & \lambda=a+i b \\ & \bar{\lambda}=a-i b \end{aligned} \quad \text { If } \quad A x=\lambda x \quad \text { then } \quad A \bar{x}=\bar{\lambda} \bar{x} . \]

For instance, \(A = \begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix}\) has \(\lambda_1 = \cos\theta + i\sin\theta\) and \(\lambda_2 = \cos\theta - i\sin\theta\). The eigenvalues are conjugate to each other and eigenvectors are \((1, i)^T\) and \((1, -i)^T\).

NOTE: For this rotation matrix (or every orthogonal matrix \(Q\)), we have \(|\lambda|=1\)

Eigenvalues versus pivots

  1. product of pivots = determinant = product of eigenvalues
  2. For symmetric matrices, #positive eigenvalues = #positive pivots

Positive definite matrices

A symmetric matrix with all positive eigenvalues is called a positive definite matrix.

\[ S = \left[\begin{array}{cc} a&b\\ b&c \end{array}\right] \]

For this 2 by 2 case there is an easy test: \(a>0\) and \(ac-b^2>0\). Note that positive eigenvalues mean positive pivots and vice versa.

Energy-based definition

From \(S\mathbf{x}=\lambda\mathbf{x}\), for each eigenvector \(\mathbf{x}\) we get

\[ \mathbf{x}^T S \mathbf{x} = \lambda \mathbf{x}^T\mathbf{x} = \lambda\|\mathbf{x}\|^2 > 0 \quad \text{since every} \quad \lambda > 0 \]

Expanding any nonzero \(\mathbf{x}\) in the orthonormal eigenvector basis shows that \(\mathbf{x}^T S\mathbf{x}\) is positive for all nonzero vectors \(\mathbf{x}\) (\(\mathbf{x}^T S\mathbf{x}\) is usually called the energy in many applications).

This definition leads to a useful fact: if \(S\) and \(T\) are symmetric positive definite, so is \(S+T\).

If the columns of \(A\) are independent, then \(S=A^T A\) is positive definite:

\[ \mathbf{x}^T S\mathbf{x} = \mathbf{x}^T A^T A\mathbf{x} = (A\mathbf{x})^T(A\mathbf{x}) = \|A\mathbf{x}\|^2 \]

Now we reach a conclusion: when a symmetric matrix \(S\) has one of these five properties, it has them all:

  1. All n pivots of S are positive
  2. All n upper left determinants are positive
  3. All n eigenvalues of S are positive
  4. \(\mathbf{x}^T S\mathbf{x}\) is positive except at \(\mathbf{x}=\mathbf{0}\). This is the energy-based definition
  5. \(S\) equals \(A^TA\) for a matrix \(A\) with independent columns
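A few of these tests, run numerically on the matrix \(S\) from the ellipse example further below (a minimal NumPy sketch):

```python
import numpy as np

S = np.array([[5.0, 4.0],
              [4.0, 5.0]])

print(np.linalg.eigvalsh(S))        # test 3: eigenvalues [1. 9.], all positive

print(S[0, 0], np.linalg.det(S))    # test 2: upper-left determinants 5 > 0 and 9 > 0

L = np.linalg.cholesky(S)           # test 5: succeeds (S = L L^T); raises LinAlgError otherwise

for _ in range(5):                  # test 4: sample the energy x^T S x at random nonzero x
    x = np.random.randn(2)
    assert x @ S @ x > 0
```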

Positive semi-definite matrices

When the determinant is zero and the smallest eigenvalue is zero, the energy in the corresponding eigenvector is zero:

\[ \mathbf{x}^T S \mathbf{x} = \mathbf{x}^T (\lambda \mathbf{x}) = 0 \quad \text{when} \quad \lambda = 0 \]

These matrices are called positive semi-definite; they can be factored into \(A^TA\) with dependent columns in \(A\).

Positive semi-definite matrices have all \(\lambda \geq 0\) and \(\mathbf{x}^T S \mathbf{x} \geq 0\) for all \(\mathbf{x}\).

The ellipse \(ax^2 + 2bxy + cy^2=1\)

Consider the tilted ellipse \(5x^2 + 8xy + 5y^2=1\); we can rewrite it in the form \(\mathbf{x}^TS\mathbf{x}=1\):

\[ \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 5 & 4\\ 4 & 5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = 1 \quad\text{where} \quad S=\begin{bmatrix} 5 & 4\\ 4 & 5 \end{bmatrix} \]

We can factor \(S=Q\Lambda Q^T\):

\[ \left[\begin{array}{ll} 5 & 4 \\ 4 & 5 \end{array}\right]=\frac{1}{\sqrt{2}}\left[\begin{array}{rr} 1 & 1 \\ 1 & -1 \end{array}\right]\left[\begin{array}{ll} \mathbf{9} & 0 \\ 0 & 1 \end{array}\right] \frac{1}{\sqrt{2}}\left[\begin{array}{rr} 1 & 1 \\ 1 & -1 \end{array}\right] \]

We can also rewrite: \(\mathbf{x}^TS\mathbf{x} =(\mathbf{x}^T Q )\Lambda (Q^T\mathbf{x})=X^T\Lambda X\).

The figures of \(\mathbf{x}^TS\mathbf{x}=1\) and \(X^T\Lambda X=1\) are as follows:

[Figure: the tilted ellipse x^T S x = 1 and the lined-up ellipse X^T Λ X = 1]

The axes of the tilted ellipse point along the eigenvectors of \(S\) (for the lined-up ellipse, along the eigenvectors of \(\Lambda\)). This explains why \(S=Q\Lambda Q^T\) is called the principal axis theorem: it displays the axes, with the axis directions coming from the eigenvectors and the axis lengths from the eigenvalues.

Important application: test for a minimum

\(F(x, y)\) has a minimum if \(\partial F / \partial x = \partial F / \partial y=0\) and \(S=\begin{bmatrix}\partial^2 F / \partial x^2 & \partial^2 F / \partial x \partial y \\\partial^2 F / \partial y \partial x & \partial^2 F / \partial y^2 \end{bmatrix}\) (the Hessian) is positive definite.

Chapter 7: Singular value decomposition (SVD)

Eigenvectors for SVD

The singular value decomposition theorem for \(A\) is the eigenvalue theorem for \(A^TA\) and \(AA^T\). For a matrix \(A\in \mathbb{R}^{m\times n}\), there are two sets of singular vectors (the eigenvectors of \(A^TA\) and \(AA^T\)) and one set of positive singular values (\(A^TA\) has the same positive eigenvalues as \(AA^T\); both are symmetric positive semi-definite)

Motivation: Find orthogonal bases for an image \(A\)

Difficulty: The eigenvectors of most images are not orthogonal, and we want two sets of bases

Solution: Use the eigenvectors \(\boldsymbol{u}\in \mathbb{R}^m\) of \(AA^T\) and the eigenvectors \(\boldsymbol{v}\in \mathbb{R}^n\) of \(A^TA\). \(\boldsymbol{u}\) is called a left singular vector, \(\boldsymbol{v}\) is called a right singular vector, and the \(\sigma\) are the singular values (the square roots of the eigenvalues of \(AA^T\) and \(A^TA\)).

\[ A A^{\mathrm{T}} \boldsymbol{u}_i=\sigma_i^2 \boldsymbol{u}_i \quad A^{\mathrm{T}} A \boldsymbol{v}_i=\sigma_i^2 \boldsymbol{v}_i \quad A \boldsymbol{v}_i=\sigma_i \boldsymbol{u}_i \]

Bases and matrices in the SVD

We have two sets of singular vectors \(\boldsymbol{u} \in R^m\) and \(\boldsymbol{v}\in R^n\). They give bases for four subspaces:

  1. \(\boldsymbol{u}_1,\dots,\boldsymbol{u}_r\) is an orthonormal basis for the column space
  2. \(\boldsymbol{u}_{r+1},\dots,\boldsymbol{u}_{m}\) is an orthonormal basis for left nullspace \(N(A^T)\)
  3. \(\boldsymbol{v}_1,\dots,\boldsymbol{v}_r\) is an orthonormal basis for the row space
  4. \(\boldsymbol{v}_{r+1},\dots,\boldsymbol{v}_{n}\) is an orthonormal basis for the nullspace \(N(A)\)

We can re-write the formula of SVD using matrices \(AV=U\Sigma\):

\[ A\left[\boldsymbol{v}_1 \cdots \boldsymbol{v}_r \cdots \boldsymbol{v}_n\right]=\left[\boldsymbol{u}_1 \cdots \boldsymbol{u}_r \cdots \boldsymbol{u}_m\right]\begin{bmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & \end{bmatrix} \]

Given \(V^{-1} = V^T\), \(AV=U\Sigma\) becomes \(A=U\Sigma V^T\):

\[ A=U\Sigma V^T = \mathbf{u}_1\sigma_1\mathbf{v}_1^T + \dots + \mathbf{u}_r\sigma_r\mathbf{v}_r^T \]
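A minimal NumPy sketch of this factorization and the rank-one expansion (the 2 by 2 matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

U, s, Vt = np.linalg.svd(A)          # A = U diag(s) V^T, singular values in decreasing order
print(s)

# Rebuild A from its rank-one pieces sigma_i u_i v_i^T
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))
assert np.allclose(A, A_rebuilt)

# Each sigma_i^2 is an eigenvalue of A^T A
assert np.allclose(np.sort(s**2), np.sort(np.linalg.eigvalsh(A.T @ A)))
```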

Proof of the SVD

\[ A^T A = (U\Sigma V^T)^T (U\Sigma V^T) = V\Sigma^T U^T U\Sigma V^T = V\Sigma^T\Sigma V^T \]

On the right, \(V\) is the eigenvector matrix for \(A^TA\) and \(\Sigma^T\Sigma\) must be the eigenvalue matrix of \(A^TA\), so each \(\sigma_i^2\) is an eigenvalue of \(A^TA\).

Singular vectors of \(A\) and eigenvectors of \(S=A^TA\)

\[ \begin{array}{rll} \text { Symmetric matrix } & S=Q \Lambda Q^{\mathrm{T}} & =\lambda_1 \boldsymbol{q}_1 \boldsymbol{q}_1^{\mathrm{T}}+\lambda_2 \boldsymbol{q}_2 \boldsymbol{q}_2^{\mathrm{T}}+\cdots+\lambda_r \boldsymbol{q}_r \boldsymbol{q}_r^{\mathrm{T}} \\ \text { Any matrix } & A=U \Sigma V^{\mathrm{T}} & =\sigma_1 \boldsymbol{u}_1 \boldsymbol{v}_1^{\mathrm{T}}+\sigma_2 \boldsymbol{u}_2 \boldsymbol{v}_2^{\mathrm{T}}+\cdots+\sigma_r \boldsymbol{u}_r \boldsymbol{v}_r^{\mathrm{T}} \end{array} \]

Starting with the largest eigenvalue \(\lambda_1\), we are actually solving a maximization problem (recall \(S\mathbf{x} = \lambda\mathbf{x}\)):

\[ \lambda_1 = \text{maximum ratio}\quad \frac{\mathbf{x}^T S \mathbf{x}}{\mathbf{x}^T \mathbf{x}} \]

The winning vector is \(\mathbf{x} = \mathbf{q}_1\) with \(S\mathbf{q}_1 = \lambda_1\mathbf{q}_1\). The largest singular value \(\sigma_1\) of \(A\) solves:

\[ \sigma_1 = \text{maximum ratio}\quad \frac{\|A\mathbf{x}\|}{\|\mathbf{x}\|}, \qquad \sigma_1^2 = \text{maximum ratio}\quad \frac{\mathbf{x}^T A^T A \mathbf{x}}{\mathbf{x}^T \mathbf{x}} \]

The winning vector is \(\mathbf{x} = \mathbf{v}_1\) with \(A\mathbf{v}_1 = \sigma_1\mathbf{u}_1\).

Principal Component Analysis (PCA by the SVD)

Let us start with a matrix \(A_0\in \mathbb{R}^{m\times n}\). We first subtract the average of each row to reach \(A\). Let us assume \(m=2\) and visualize those points:

[Figure: re-centered data points for PCA]

We can see that these re-centered points cluster along a line. But how do we find that line?

We first need the sample covariance matrix, defined by:

\[ S = \frac{AA^T}{n-1} \]

where

  1. \(A\) holds the distances \(a_{ij}-\mu_i\) from each measurement to its row average
  2. \((AA^T)_{11}\) and \((AA^T)_{22}\) give the sums of squared distances (leading to the sample variances \(s_1^2\), \(s_2^2\))
  3. \((AA^T)_{12}\) gives the sample covariance \(s_{12}= (\text{row 1 of } A)\cdot(\text{row 2 of } A)\)

Note:

  1. High covariance means a strong relationship between the two rows
  2. Why divide by \(n-1\) instead of \(n\)? One degree of freedom has already been used by the mean, leaving \(n-1\) independent measurements.

Consider this matrix \(A\):

\[ A = \begin{bmatrix} 3 & -4 & 7 & 1 & -4 & -3\\ 7 & -6 & 8 & -1 & -1 & -7 \end{bmatrix}\quad \text{has sample covariance}\quad S = \frac{AA^T}{5} = \begin{bmatrix} 20 & 25\\ 25 & 40 \end{bmatrix} \]

The eigenvalues of \(S\) are near 57 and 3. The first rank-one piece \(\sqrt{57}\,\mathbf{u}_1\mathbf{v}_1^T\) is much larger than the second piece \(\sqrt{3}\,\mathbf{u}_2\mathbf{v}_2^T\). The leading eigenvector \(\mathbf{u}_1\approx (0.6, 0.8)\) gives the direction of the line in the scatter plot above.
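The same numbers can be reproduced with NumPy (a minimal sketch of the covariance matrix and its leading eigenvector):

```python
import numpy as np

A = np.array([[3.0, -4.0, 7.0, 1.0, -4.0, -3.0],
              [7.0, -6.0, 8.0, -1.0, -1.0, -7.0]])
A = A - A.mean(axis=1, keepdims=True)   # re-center each row (these rows already have mean zero)

n = A.shape[1]
S = A @ A.T / (n - 1)                   # sample covariance matrix
print(S)                                # [[20. 25.] [25. 40.]]

lam, U = np.linalg.eigh(S)              # eigenvalues in increasing order: about 3.07 and 56.93
print(lam)
print(U[:, -1])                         # leading direction, roughly +/- (0.6, 0.8)
```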

The essentials of Principal Component Analysis (PCA)

  1. The total variance \(T\) is the sum of all eigenvalues and of sample variances \(s^2\)

    \[ T = \sigma_1^2 + \dots + \sigma_m^2 = s_1^2 + \dots + s_m^2 = \mathrm{trace}(S) \]
  2. We can keep the largest \(R\) singular values and their corresponding directions. The n data points will then lie near an \(R\)-dimensional subspace with basis \(\mathbf{u}_{1}\) to \(\mathbf{u}_{R}\). These \(\mathbf{u}\)'s are the principal components in \(m\)-dimensional space.

Perpendicular least squares

The best line, in the direction \(\mathbf{u}_1\) found above, also solves the problem of perpendicular least squares:

The sum of squared distances from the point to the line is a minimum.

Proof: Separate each column \(\mathbf{a}_j\) into its components along the \(\mathbf{u}_1\) line and \(\mathbf{u}_2\) (Project onto \(\mathbf{u}_1\) and \(\mathbf{u}_2\)):

\[ \sum_{j=1}^n \|\mathbf{a}_j\|^2 = \sum_{j=1}^n |\mathbf{a}_j^T\mathbf{u}_1|^2 + \sum_{j=1}^n |\mathbf{a}_j^T\mathbf{u}_2|^2 \]

The sum on the left is fixed by the data points \(\mathbf{a}_j\) (the columns of \(A\)). The first sum on the right is \(\mathbf{u}_1^T A A^T\mathbf{u}_1\). So when we maximize that sum by choosing the eigenvector \(\mathbf{u}_1\), we minimize the second sum, the squared distance from the data points to the best line (this split works because the \(\mathbf{u}\)'s are an orthonormal basis). That second sum is exactly what perpendicular least squares minimizes.

The Geometry of the SVD

SVD separates a matrix into \((\mathrm{orthogonal})\times (\mathrm{diagonal}) \times (\mathrm{orthogonal})\) - \((\mathrm{rotation})\times (\mathrm{stretching}) \times (\mathrm{rotation})\)

[Figure: geometry of the SVD - rotation, stretching, rotation]

The norm of a matrix

The largest singular value \(\sigma_1\) is the norm of the matrix \(A\):

The norm \(\|A\|\) is the largest ratio \(\frac{\|A\mathbf{x}\|}{\|\mathbf{x}\|}\):

\[ \|A\| = \max_{\mathbf{x} \neq 0} \frac{\|A\mathbf{x}\|}{\|\mathbf{x}\|} = \sigma_1 \]

Polar decomposition \(A=QS\)

Every real square matrix can be factored into \(A=QS\), where \(Q\) is orthogonal and \(S\) is symmetric positive semi-definite. If \(A\) is invertible, \(S\) is positive definite.

\[ A = U\Sigma V^T = (UV^T)(V\Sigma V^T) = (Q) (S) \]

Polar decomposition separates the rotation (in \(Q\)) from the stretching (in \(S\)). NOTE: \(Q=UV^T\) is the nearest orthogonal matrix to \(A\); this \(Q\) makes the norm \(\|Q-A\|\) as small as possible. This corresponds to the fact that \(e^{i\theta}\) is the nearest number on the unit circle to \(re^{i\theta}\).

Pseudoinverse \(A^\dagger\)

Pseudoinverse:

\[ A^\dagger= V\Sigma^\dagger U^T= \left[\boldsymbol{v}_1 \cdots \boldsymbol{v}_r \cdots \boldsymbol{v}_n\right] \begin{bmatrix} \sigma_1^{-1} & & & \\ & \ddots & & \\ & & \sigma_r^{-1} & \\ & & & \end{bmatrix} \left[\boldsymbol{u}_1 \cdots \boldsymbol{u}_r \cdots \boldsymbol{u}_m\right]^T \]

Least Square with Dependent Columns

\(A^T A\) might be singular, so \(A^TA\mathbf{x}=A^T\mathbf{b}\) might have infinitely many solutions. The pseudoinverse gives us a way to choose a best solution, \(\mathbf{x}^\dagger = A^\dagger\mathbf{b}\).
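A small NumPy check: with dependent columns the normal equations have many solutions, and `np.linalg.pinv` (like `lstsq`, which also uses the SVD) picks the minimum-norm one (an illustrative example, not from the text):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])       # dependent columns: column 2 = 2 * column 1
b = np.array([1.0, 2.0, 2.0])

x_dagger = np.linalg.pinv(A) @ b                  # x = A^+ b, the shortest least-squares solution
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]    # SVD-based least squares gives the same answer

assert np.allclose(x_dagger, x_lstsq)
print(x_dagger)
```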

Reference

Gilbert Strang: Introduction to Linear Algebra. Wellesley-Cambridge Press, 2022.