Linear Algebra
Chapter 1: Introduction to vectors
Linear combinations
Important concepts:
- The combinations \(c\boldsymbol{v}\) fill a line through \((0,0,0)\).
- The combinations \(c \boldsymbol{v}+d \boldsymbol{w}\) fill a plane through \((0,0,0)\).
- The combinations \(c \boldsymbol{v}+d \boldsymbol{w} + e \boldsymbol{u}\) fill three-dimensional space, where \(\boldsymbol{u}\) is a 3-dimensional vector that is not in the plane of \(\boldsymbol{v}\) and \(\boldsymbol{w}\).
Angles and dot products
The dot product reveals the exact angle \(\theta\) between unit vectors \(\boldsymbol{u}\) and \(\boldsymbol{U}\)
Right angles: \(\boldsymbol{u} \cdot \boldsymbol{U} = 0\) when \(\boldsymbol{u}\) is perpendicular to \(\boldsymbol{U}\)
Cosine formula: \(\frac{\boldsymbol{v} \cdot \boldsymbol{w}}{\|\boldsymbol{v}\| \|\boldsymbol{w}\|} = \cos\theta\)
Schwarz inequality: \(|\boldsymbol{v} \cdot \boldsymbol{w}|\leq \|\boldsymbol{v}\| \|\boldsymbol{w}\|\)
Triangle inequality: \(\|\boldsymbol{v} + \boldsymbol{w}\|\leq \|\boldsymbol{v}\| + \|\boldsymbol{w}\|\)
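A quick NumPy check of these formulas (the vectors \(\boldsymbol{v}=(3,4)\) and \(\boldsymbol{w}=(4,3)\) are just an arbitrary example, not from the text):

```python
import numpy as np

# Arbitrary example vectors
v = np.array([3.0, 4.0])
w = np.array([4.0, 3.0])

# Cosine formula: cos(theta) = v.w / (||v|| ||w||)
cos_theta = v @ w / (np.linalg.norm(v) * np.linalg.norm(w))
theta = np.arccos(cos_theta)                     # angle in radians

# Schwarz inequality: |v.w| <= ||v|| ||w||
assert abs(v @ w) <= np.linalg.norm(v) * np.linalg.norm(w)

# Triangle inequality: ||v + w|| <= ||v|| + ||w||
assert np.linalg.norm(v + w) <= np.linalg.norm(v) + np.linalg.norm(w)

print(cos_theta, theta)                          # 0.96, about 0.284 radians
```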
Matrices & linear equations
Matrix times vector \(A\boldsymbol{x}=\) combination of the columns of \(A\)
The solution to \(A\boldsymbol{x}=\boldsymbol{b}\) is \(\boldsymbol{x}=A^{-1}\boldsymbol{b}\) when \(A\) is invertible
Chapter 2: Solving linear equations
Vectors and linear equations
Two equations, two unknowns
Row picture: Two lines meeting at a single point
Column picture: Combine the column vectors to produce the vector \(\boldsymbol{b}\)
Matrix equation: \(A\boldsymbol{x}=\boldsymbol{b}\)
Elimination
Process of elimination:
Visualization of elimination:
Pivot: the first nonzero entry in the row that does the elimination
Non-singular matrix: a full set of pivots after elimination; exactly one solution
Singular matrix: elimination fails to produce a full set of pivots; no solution or infinitely many solutions (depending on whether the equations contradict each other)
Elimination from \(A\) to the upper triangular matrix \(U\)
Column 1: Use the first equation to create zeros below the first pivot
Column 2: Use the new equation 2 to create zeros below the second pivot
Column 3 to n: Keep going to find all n pivots and the upper triangular \(U\)
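A minimal sketch of this column-by-column loop in NumPy, assuming every pivot is nonzero so no row exchanges are needed (the 3 by 3 matrix is an arbitrary example):

```python
import numpy as np

def forward_eliminate(A):
    """Reduce A to an upper triangular U, column by column.
    Assumes every pivot is nonzero, so no row exchanges are needed."""
    U = A.astype(float).copy()
    n = U.shape[0]
    for j in range(n - 1):                   # column holding the current pivot
        for i in range(j + 1, n):            # rows below the pivot
            multiplier = U[i, j] / U[j, j]
            U[i, :] -= multiplier * U[j, :]  # create a zero in position (i, j)
    return U

# Arbitrary 3 by 3 example
A = np.array([[ 2.0,  1.0, 1.0],
              [ 4.0, -6.0, 0.0],
              [-2.0,  7.0, 2.0]])
print(forward_eliminate(A))   # pivots 2, -8, 1 on the diagonal
```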
Augmented matrix
Rules for matrix operations
The computational complexity of matrix multiplication is \(\mathcal{O}(n^{3})\)
A row times a column is an inner product
A column times a row is an outer product
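A small NumPy illustration of the two products (arbitrary example vectors):

```python
import numpy as np

row = np.array([1.0, 2.0, 3.0])
col = np.array([4.0, 5.0, 6.0])

inner = row @ col            # row times column: a single number
outer = np.outer(col, row)   # column times row: a rank-one 3 by 3 matrix

print(inner)                 # 32.0
print(outer.shape)           # (3, 3)
```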
Inverse matrices
Two-sided inverse: \(A^{-1}A=I\) and \(AA^{-1}=I\)
Notes:
- Inverse exists if and only if elimination produces n pivots
- If A is invertible, the only solution to \(A\boldsymbol{x}=\boldsymbol{b}\) is \(\boldsymbol{x}=A^{-1}\boldsymbol{b}\)
- If there exists a nonzero vector \(\boldsymbol{x}\) such that \(A\boldsymbol{x}=\boldsymbol{0}\), then \(A\) cannot have an inverse: no matrix can bring \(\boldsymbol{0}\) back to \(\boldsymbol{x}\)
- The matrix is invertible when its determinant is nonzero
Inverse of a product: \((AB)^{-1}=B^{-1}A^{-1}\)
Gauss-Jordan elimination
Solve the inverse problem by augmented matrix
Notes:
- If \(A\) is symmetric, \(A^{-1}\) will also be symmetric
- The product of the pivots of \(A\) equals the determinant of \(A\). Thus an invertible matrix cannot have a zero determinant.
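A minimal Gauss-Jordan sketch in NumPy: row-reduce the augmented matrix \([A\ \ I]\) to \([I\ \ A^{-1}]\). It assumes the pivots appear in order without row exchanges; the 2 by 2 matrix is an arbitrary example:

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Row-reduce the augmented matrix [A | I] to [I | A^-1].
    Assumes A is invertible with nonzero pivots in order (no row exchanges)."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])   # augmented matrix [A | I]
    for j in range(n):
        M[j, :] /= M[j, j]                        # make the pivot equal to 1
        for i in range(n):
            if i != j:
                M[i, :] -= M[i, j] * M[j, :]      # zeros above and below the pivot
    return M[:, n:]                               # the right half is now A^-1

A = np.array([[2.0, 1.0],
              [5.0, 3.0]])
print(gauss_jordan_inverse(A))   # [[ 3. -1.] [-5.  2.]]
print(np.linalg.inv(A))          # same answer from NumPy
```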
Recognize invertible matrix
Diagonally dominant matrices are invertible: each diagonal entry is larger in absolute value than the sum of the absolute values of the other entries in its row (e.g. 3 > 2).
Elimination = Factorization: \(A=LU\)
The factors \(L\) and \(U\) are lower and upper triangular matrices; \(L\) has 1's on its diagonal and \(U\) has the pivots on its diagonal.
We aim to solve \(A\boldsymbol{x}=\boldsymbol{b}\). With \(A=LU\) this splits into two triangular systems: first solve \(L\boldsymbol{c}=\boldsymbol{b}\), and then solve \(U\boldsymbol{x}=\boldsymbol{c}\).
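A short SciPy sketch of the two triangular solves (the system is an arbitrary example; SciPy's `lu` returns \(A=PLU\), so the permutation is folded into the right side):

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

# Arbitrary example system
A = np.array([[ 2.0,  1.0, 1.0],
              [ 4.0, -6.0, 0.0],
              [-2.0,  7.0, 2.0]])
b = np.array([5.0, -2.0, 9.0])

P, L, U = lu(A)                                # A = P L U

c = solve_triangular(L, P.T @ b, lower=True)   # forward substitution: L c = P^T b
x = solve_triangular(U, c)                     # back substitution:    U x = c

print(np.allclose(A @ x, b))                   # True
```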
Transpose and permutations
- Transpose of \(A+B\) is \(A^{T} + B^{T}\)
- \((AB)^{T}=B^{T}A^{T}\)
- \((A^{-1})^{T}=(A^{T})^{-1}\)
- For permutation matrix \(P\), \(P^{-1}=P^{T}\)
Symmetric matrices
Notes:
- The inverse of a symmetric matrix is also symmetric
- We can produce a symmetric matrix from \(AA^{T}\) or \(A^{T}A\), where \(A\) can be any matrix
\(PA=LU\) Factorization
Use \(P\) to handle the row exchanges
Transpose of a Derivative
Let us say \(A=d/dt\), and define the inner product of two functions as \(\boldsymbol{x}^T\boldsymbol{y}=\int x(t)\,y(t)\,dt\).
We know \((A\boldsymbol{x})^T\boldsymbol{y}=\boldsymbol{x}^T(A^T\boldsymbol{y})\).
Thus, the transpose (adjoint) of the derivative is minus the derivative (\(A^T=-A\), antisymmetric).
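A short derivation of this fact via integration by parts, assuming \(x(t)\) and \(y(t)\) vanish at the endpoints so the boundary term drops out:
\[
(A\boldsymbol{x})^{T}\boldsymbol{y}=\int \frac{dx}{dt}\,y(t)\,dt
=\Big[x(t)\,y(t)\Big]-\int x(t)\,\frac{dy}{dt}\,dt
=\boldsymbol{x}^{T}\Big(-\frac{dy}{dt}\Big)
\]
so indeed \(A^{T}\boldsymbol{y}=-\,dy/dt\).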
Chapter 3: Vector spaces and subspaces
Vector spaces
The space \(R^n\) consists of all column vectors \(v\) with \(n\) components
Subspaces
A subspace of a vector space is a set of vectors (including \(\mathbf{0}\)) that satisfies: if \(\boldsymbol{v}\) and \(\boldsymbol{w}\) are vectors in the subspace and \(c\) is any scalar, then:
- \(\boldsymbol{v}+\boldsymbol{w}\) is in the subspace
- \(c\boldsymbol{v}\) is in the subspace
Notes:
- Every subspace contains the zero vector (take the scalar \(c=0\))
- Lines through the origin are also subspaces
- A subspace containing \(\boldsymbol{v}\) and \(\boldsymbol{w}\) must contain all linear combinations \(c\boldsymbol{v}+d\boldsymbol{w}\)
Column space
The column space consists of all linear combinations of the columns
To solve \(A\boldsymbol{x}=\boldsymbol{b}\) is to express \(\boldsymbol{b}\) as a combination of the columns. That is to say \(A\boldsymbol{x}=\boldsymbol{b}\) is solvable if and only if \(\boldsymbol{b}\) is in the column space of \(A\)
Nullspace of A: Solving Ax=0
The nullspace \(N(A)\) consists of all solutions to \(A\boldsymbol{x}=0\).
Note:
- If \(A\) is \(m \times n\), then \(\boldsymbol{x}\) is \(n \times 1\), so the solution vectors \(\boldsymbol{x}\) live in \(\mathbb{R}^n\).
- The set of all solutions \(\boldsymbol{x}\) forms a subspace of \(\mathbb{R}^n\).
How to find a basis for the nullspace:
- Reduce \(A\) to its row echelon form \(U\)
- Find the special solutions to \(A\boldsymbol{x}=0\), one for each free variable
Here is an example:
We can find that columns 3 and 4 have no pivots, so they are free columns. We can set the corresponding free variables to \(x_3=1, x_4=0\) and then \(x_3=0, x_4=1\) to get the two special solutions.
Alternatively, we can go further from \(U\) to the reduced row echelon form \(R\) (1. produce zeros above the pivots, 2. produce ones in the pivot positions).
The only reason for this extra reduction is that it makes the special solutions much easier to read off. In fact, \(N(A) = N(U) = N(R)\).
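A small SymPy sketch of these steps (the matrix is a hypothetical example chosen so that columns 3 and 4 are the free columns, since the original example matrix is not shown here):

```python
import sympy as sp

# Hypothetical matrix, chosen so that columns 3 and 4 are free
A = sp.Matrix([[1, 2, 3, 4],
               [2, 5, 7, 9]])

R, pivot_cols = A.rref()      # reduced row echelon form and pivot column indices
print(R)                      # Matrix([[1, 0, 1, 2], [0, 1, 1, 1]])
print(pivot_cols)             # (0, 1): pivots in columns 1 and 2, columns 3 and 4 are free

# One special solution per free column; together they form a basis of N(A)
for special in A.nullspace():
    print(special.T)          # [-1, -1, 1, 0] and [-2, -1, 0, 1]
```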
One important concept here is that zero nullspace means the columns of \(A\) are independent!
Rank of a matrix
The rank of \(A\) is the number of pivots. This number is \(r\).
NOTE:
- The rank r is the dimension of the column space & row space.
- The greatest thing is that \(n-r\) is the dimension of the nullspace.
Elimination: The big picture
- The column space of A - pivot columns of A as a basis
- The row space of A - nonzero rows of R as a basis
- The nullspace of A - special solutions to Rx = 0 (and Ax=0)
Complete solution to \(Ax=b\)
Use augmented matrix with elimination to solve it
One particular solution to \(A\boldsymbol{x}=\boldsymbol{b}\)
For a solution to exist, the zero rows of \(R\) must also be zero in the reduced right side \(\boldsymbol{d}\). Setting all free variables to zero then gives one particular solution \(\boldsymbol{x}_p\).
Thus, we can get two parts of the solution:
- \(\boldsymbol{x}_{particular}\), which solves \(A\boldsymbol{x}_p=\boldsymbol{b}\)
- \(\boldsymbol{x}_{nullspace}\), built from the \(n-r\) special solutions, which solves \(A\boldsymbol{x}_n=\boldsymbol{0}\)
The complete solution is \(\boldsymbol{x}=\boldsymbol{x}_p+\boldsymbol{x}_n\): one particular \(\boldsymbol{x}_p\) plus any combination of the special solutions (see the sketch below).
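A small SymPy sketch of the complete solution (the 2-equation, 4-unknown system reuses the same hypothetical matrix as the nullspace sketch above; the right side is also made up):

```python
import sympy as sp

# Hypothetical system: 2 equations, 4 unknowns, rank r = 2, so n - r = 2 special solutions
A = sp.Matrix([[1, 2, 3, 4],
               [2, 5, 7, 9]])
b = sp.Matrix([6, 13])

x1, x2, x3, x4 = sp.symbols('x1 x2 x3 x4')

# linsolve returns the complete solution: one particular solution
# plus the free variables x3, x4 multiplying the special solutions
print(sp.linsolve((A, b), x1, x2, x3, x4))
# particular solution (4, 1, 0, 0) plus x3*(-1, -1, 1, 0) + x4*(-2, -1, 0, 1)
```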
Every matrix \(A\) with full column rank (r=n) has these properties:
- All columns of \(A\) are pivot columns
- No free variables or special solutions
- The nullspace \(N(A)\) contains only the zero vector
- If \(A\boldsymbol{x}=\boldsymbol{b}\) has a solution, then it has only one solution
Every matrix \(A\) with full row rank (r=m) has these properties:
- All rows have pivots, and \(R\) has no zero rows
- \(A\boldsymbol{x}=\boldsymbol{b}\) has a solution for every right side b
- The column space is the whole space \(R^m\)
- There are \(n-r=n-m\) special solutions in the nullspace of \(A\)
A brief summary:
Independence, basis and dimension
Basis
Every vector in the space is a unique combination of the basis vectors. The true dimension is measured by counting independent columns.
Linear independence
The sequence of vectors \(\boldsymbol{v}_1, \dots, \boldsymbol{v}_n\) is linearly independent if and only if the only combination giving the zero vector, \(x_1\boldsymbol{v}_1+\dots+x_n\boldsymbol{v}_n=\boldsymbol{0}\), is the one with all \(x_i\) equal to zero.
Vectors that span a subspace
A set of vectors spans a space if their linear combinations fill the space.
Note:
- The columns of a matrix span its column space. They might be dependent.
- The row space of a matrix is the subspace of \(R^n\) spanned by the rows.
- The row space of \(A\) is \(C(A^T)\). It is the column space of \(A^T\).
A basis for vector space
The basis vectors are linearly independent and they span the space.
Note:
- The vectors \(v_1, v_2, \dots, v_n\) are a basis for \(\mathbb{R}^n\) exactly when they are the columns of an n by n invertible matrix.
Four subspaces
For any \(m\times n\) matrix \(A\), there are four subspaces:
- The row space is \(\boldsymbol{C}(A^T)\), a subspace of \(R^n\)
- The column space is \(\boldsymbol{C}(A)\), a subspace of \(R^m\)
- The nullspace is \(\boldsymbol{N}(A)\), a subspace of \(R^n\) (Reminder: solution of \(A\boldsymbol{x}=0\), \(dim=n-r\))
- The left nullspace is \(\boldsymbol{N}(A^T)\), a subspace of \(R^m\) (Reminder: solution of \(A^T\boldsymbol{x}=0\), \(dim=m-r\))
Chapter 4: Orthogonality
Orthogonality of 4 subspaces
Orthogonal vectors: \(\boldsymbol{v}^T\boldsymbol{w}=0\)
Orthogonal subspaces: Two subspaces \(V\) and \(W\) of a vector space are orthogonal if every vector \(\boldsymbol{v}\) in \(V\) is perpendicular to every vector \(\boldsymbol{w}\) in \(W\)
E.g. the floor of your room is a subspace \(V\), and the line where two walls meet is a subspace \(W\). Those subspaces are orthogonal.
E.g. two walls look perpendicular, but those subspaces are not orthogonal: the meeting line is in both \(V\) and \(W\), and it is not perpendicular to itself.
Every vector \(\boldsymbol{x}\) in nullspace is perpendicular to every row of \(A\) because \(A\boldsymbol{x}=\boldsymbol{0}\). The nullspace \(N(A)\) and row space \(C(A^T)\) are orthogonal subspaces of \(R^n\).
Proof: The vectors in the row space are combinations \(A^\top \boldsymbol{y}\) of the rows. Take the dot product of \(A^\top \boldsymbol{y}\) with any \(\boldsymbol{x}\) in the nullspace: \((A^\top\boldsymbol{y})^\top\boldsymbol{x}=\boldsymbol{y}^\top A\boldsymbol{x}=\boldsymbol{y}^\top\boldsymbol{0}=0\).
Every vector \(\boldsymbol{y}\) in the nullspace of \(A^\top\) is perpendicular to every column of \(A\). The left nullspace \(N(A^\top)\) and the column space \(C(A)\) are orthogonal in \(R^m\).
Orthogonal complements
Orthogonal complement of a subspace \(V\) contains every vector that is perpendicular to \(V\). This orthogonal subspace is denoted by \(V^{\perp}\) (V perp).
Note:
- \(N(A)\) is the orthogonal complement of the row space \(C(A^T)\) (in \(R^n\)).
- \(N(A^T)\) is the orthogonal complement of the column space \(C(A)\) (in \(R^m\))
As \(A\boldsymbol{x} = A\boldsymbol{x}_r + A\boldsymbol{x}_n=A\boldsymbol{x}_r\), we can see that the row space component goes to the column space.
Projections
Projection onto a line
We want to find a point \(\boldsymbol{p}\) that is closest to \(\boldsymbol{b}\) on \(\boldsymbol{a}\). The key to projection is orthogonality: The line from \(\boldsymbol{b}\) to \(\boldsymbol{p}\) is perpendicular to the vector \(\boldsymbol{a}\). The projection \(\boldsymbol{p}\) would be some multiple of \(\boldsymbol{a}\), i.e. \(\boldsymbol{p}=\hat{x} \boldsymbol{a}\). To solve the projection, we need 3 steps:
- Find \(\hat{x}\)
- Find vector \(\boldsymbol{p}\)
- Find matrix \(P\)
Here projecting \(\boldsymbol{b}\) onto \(\boldsymbol{a}\) leaves an error \(\boldsymbol{e}= \boldsymbol{b} - \hat{x} \boldsymbol{a}\) perpendicular to \(\boldsymbol{a}\), so we have \(\boldsymbol{a}^T(\boldsymbol{b}-\hat{x}\boldsymbol{a})=0\), which gives \(\hat{x}=\frac{\boldsymbol{a}^T\boldsymbol{b}}{\boldsymbol{a}^T\boldsymbol{a}}\).
Thus, the projection of \(\boldsymbol{b}\) onto the line through \(\boldsymbol{a}\) is the vector \(\boldsymbol{p}=\hat{x}\boldsymbol{a}=\frac{\boldsymbol{a}^T\boldsymbol{b}}{\boldsymbol{a}^T\boldsymbol{a}}\boldsymbol{a}=P\boldsymbol{b}\). The projection matrix \(P=\frac{\boldsymbol{a}\boldsymbol{a}^T}{\boldsymbol{a}^T\boldsymbol{a}}\).
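A quick NumPy check of the three steps (arbitrary example vectors \(\boldsymbol{a}\) and \(\boldsymbol{b}\)):

```python
import numpy as np

# Arbitrary example vectors
a = np.array([1.0, 2.0, 2.0])
b = np.array([1.0, 1.0, 1.0])

x_hat = (a @ b) / (a @ a)        # step 1: x_hat = a.b / a.a
p = x_hat * a                    # step 2: p = x_hat * a
P = np.outer(a, a) / (a @ a)     # step 3: P = a a^T / a^T a

e = b - p                        # the error is perpendicular to a
print(np.isclose(a @ e, 0.0))    # True
print(np.allclose(P @ b, p))     # True
```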
Projection onto a subspace
We want to find the combination \(\boldsymbol{p} = \hat{x}_1\boldsymbol{a}_1+\dots \hat{x}_n\boldsymbol{a}_n\) closest to a given vector \(\boldsymbol{b}\). That is to say, we are projecting each \(\boldsymbol{b}\) in \(\mathbb{R}^m\) onto the subspace spanned by \(\boldsymbol{a}\)'s, i.e., column space of matrix \(A\).
- Find \(\hat{\boldsymbol{x}}\): the error \(\boldsymbol{b} - A\hat{\boldsymbol{x}}\) is perpendicular to the columns of \(A\), so \[ A^T(\boldsymbol{b} - A\hat{\boldsymbol{x}}) = \boldsymbol{0} \quad\text{or}\quad A^T A\hat{\boldsymbol{x}}=A^T\boldsymbol{b} \]
- Find \(\boldsymbol{p}\): \[ \boldsymbol{p} = A\hat{\boldsymbol{x}} = A(A^T A)^{-1} A^T \boldsymbol{b} \]
- Find \(P\): \[ P = A(A^T A)^{-1} A^T \]
\(A^TA\) is invertible if and only if \(A\) has linearly independent columns
When \(A\) has independent columns, \(A^TA\) is square, symmetric, and invertible.
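A NumPy sketch of projection onto a column space (the matrix and the vector \(\boldsymbol{b}\) are an arbitrary example):

```python
import numpy as np

# Arbitrary A with independent columns, and a b outside its column space
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)   # normal equations  A^T A x_hat = A^T b
p = A @ x_hat                               # projection of b onto C(A)
P = A @ np.linalg.inv(A.T @ A) @ A.T        # projection matrix

print(x_hat)                                # [ 5. -3.]
print(np.allclose(P @ b, p))                # True
print(np.allclose(A.T @ (b - p), 0.0))      # error is perpendicular to every column
```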
Least squares approximations
Oftentimes, \(A\boldsymbol{x}=\boldsymbol{b}\) has no solution. The usual reason is too many equations (\(m>n\)). That is to say, \(\boldsymbol{b}\) is outside the column space of \(A\), while \(A\boldsymbol{x}\) always lies inside it. Thus, we project \(\boldsymbol{b}\) onto the column space to minimize the error.
Orthonormal bases and Gram-Schmidt
The vectors \(\boldsymbol{q}_1, \dots, \boldsymbol{q}_n\) are orthonormal if \(\boldsymbol{q}_i^T\boldsymbol{q}_j=0\) for \(i\neq j\) and \(\boldsymbol{q}_i^T\boldsymbol{q}_i=1\)
Orthogonal matrix \(Q\): When \(Q\) is square, \(Q^TQ=I\) means that \(Q^T=Q^{-1}\).
The orthogonal matrix \(Q\) leaves the length unchanged, i.e., \(\|Q\boldsymbol{x}\|=\|\boldsymbol{x}\|\)
Gram-Schmidt process
Start with three independent vectors \(\boldsymbol{a},\boldsymbol{b},\boldsymbol{c}\) and create three orthogonal vectors \(\boldsymbol{A},\boldsymbol{B},\boldsymbol{C}\).
- Choose \(\boldsymbol{A}=\boldsymbol{a}\).
- Start with \(\boldsymbol{b}\) and subtract its projection along \(\boldsymbol{A}\): \(\boldsymbol{B}=\boldsymbol{b}-\frac{\boldsymbol{A}^T\boldsymbol{b}}{\boldsymbol{A}^T\boldsymbol{A}}\boldsymbol{A}\)
- Then do the same for \(\boldsymbol{c}\): \(\boldsymbol{C}=\boldsymbol{c}-\frac{\boldsymbol{A}^T\boldsymbol{c}}{\boldsymbol{A}^T\boldsymbol{A}}\boldsymbol{A}-\frac{\boldsymbol{B}^T\boldsymbol{c}}{\boldsymbol{B}^T\boldsymbol{B}}\boldsymbol{B}\)
Summary: Subtract from every new vector its projections in the directions already set (a sketch in code follows).
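A minimal Gram-Schmidt sketch in NumPy following these steps (the three vectors are an arbitrary example):

```python
import numpy as np

def gram_schmidt(vectors):
    """Subtract from every new vector its projections onto the
    directions already set; keep the orthogonal remainder."""
    basis = []
    for v in vectors:
        w = v.astype(float).copy()
        for q in basis:
            w -= (q @ v) / (q @ q) * q     # remove the component along q
        basis.append(w)
    return basis

# Arbitrary independent vectors a, b, c
a = np.array([1.0, 1.0, 0.0])
b = np.array([1.0, 0.0, 1.0])
c = np.array([0.0, 1.0, 1.0])

A, B, C = gram_schmidt([a, b, c])
print(np.isclose(A @ B, 0), np.isclose(A @ C, 0), np.isclose(B @ C, 0))  # True True True
```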
QR factorization
The Gram-Schmidt process creates an orthogonal matrix \(Q\), and we can create an upper triangular matrix \(R\) so that \(A=QR\). Let us start with a matrix \(A\) whose columns are \(\boldsymbol{a}, \boldsymbol{b}, \boldsymbol{c}\).
First create \(\boldsymbol{q}_1=\boldsymbol{a}/\|\boldsymbol{a}\|\). Then construct the remaining vectors using the Gram-Schmidt process (orthogonalize, then normalize).
We can also read off \(R=Q^TA\).
By using the QR factorization, we can simplify the least-squares equation \(A^TA\hat{\boldsymbol{x}}=A^T \boldsymbol{b}\) to \(R\hat{\boldsymbol{x}}=Q^T\boldsymbol{b}\).
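A NumPy sketch of least squares via QR (it reuses the arbitrary example from the projection sketch above):

```python
import numpy as np

# Same arbitrary example as the projection sketch above
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

Q, R = np.linalg.qr(A)                 # A = QR with orthonormal columns in Q

x_hat = np.linalg.solve(R, Q.T @ b)    # least squares: R x_hat = Q^T b
print(x_hat)                                                      # [ 5. -3.]
print(np.allclose(R, Q.T @ A))                                    # R = Q^T A
print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))   # True
```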
Chapter 5: Determinants
Purpose of determinants
Determinants can be used to solve linear equations. Consider the following case:
Using Gaussian elimination, we could get:
We can rewrite the solution with determinant:
Consider the following third-order determinants:
We can find that in each term the first (row) subscripts are 1, 2, 3 in order, while the second (column) subscripts run through the permutations of 1, 2, 3. The sign of each term is determined by the inversion number of that permutation. Consider the following case:
The inversion number is the sum of all \(t_i\). In this case, the inversion number is \(t=6\). The general rule for calculating an n-order determinant is as follows:
where \(t\) is the inversion number of \(a_{1 p_1}, a_{2 p_2}, \cdots, a_{n p_n}\).
Again, remember that determinants are used to solve linear equations. Consider the following cases:
This is called Cramer's rule. Consider the case \(A\boldsymbol{x}=\boldsymbol{b}\):
Then, we could solve the equation:
where
Properties of determinants
- Singular matrix has zero determinant
- The product of the pivots is the determinant
- The determinant changes sign when two rows (or two columns) are exchanged
- The determinant is a linear function of each row separately (multiply the first row by \(t\) and the determinant is multiplied by \(t\); if the first row is a sum of two rows, the determinants add)
- If two rows of \(A\) are equal, then \(\det A=0\)
- Subtracting a multiple of one row from another row leaves \(\det A\) unchanged
- A matrix with a row of zeros has \(\det A=0\)
- The determinant of \(AB\) is \(\det A\) times \(\det B\)
Cramer's rule & Volume
This is for solving \(A\boldsymbol{x}=\boldsymbol{b}\)
Step 1:
Using the product rule: \((\det A)(x_1)=\det B_1\)
Step 2:
Using the product rule: \((\det A)(x_2)=\det B_2\)
A brief summary: Use the product rule to solve \(A\boldsymbol{x}=\boldsymbol{b}\). The matrix \(B_j\) has the \(j^{th}\) column of \(A\) replaced by the vector \(\boldsymbol{b}\), and \(x_j = \det B_j / \det A\).
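A short NumPy sketch of Cramer's rule (the 3 by 3 system is an arbitrary non-singular example):

```python
import numpy as np

# Arbitrary non-singular 3 by 3 system A x = b
A = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 2.0],
              [1.0, 0.0, 0.0]])
b = np.array([4.0, 5.0, 6.0])

x = np.empty(3)
for j in range(3):
    B_j = A.copy()
    B_j[:, j] = b                                   # replace column j of A by b
    x[j] = np.linalg.det(B_j) / np.linalg.det(A)    # Cramer: x_j = det(B_j) / det(A)

print(np.allclose(A @ x, b))                        # True
```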
Area of a triangle
The area of a triangle is half of a 3 by 3 determinant
Cross product
Properties:
- \(\boldsymbol{v} \times \boldsymbol{u}\) reverses rows 2 and 3 in the determinant so it equals \(-(\boldsymbol{u}\times \boldsymbol{v})\)
- The cross product \(\boldsymbol{u} \times \boldsymbol{v}\) is perpendicular to \(\boldsymbol{u}\) and also \(\boldsymbol{v}\)
- The cross product of any vector with itself is \(\boldsymbol{u} \times \boldsymbol{u}=0\)
- The length of \(\boldsymbol{u}\times \boldsymbol{v}\) equals the area of the parallelogram with sides \(\boldsymbol{u}\) and \(\boldsymbol{v}\). i.e. \(\|u\| \|v\| |\sin\theta|\)
Triple product = Determinant = Volume
Triple product: \((\boldsymbol{u}\times \boldsymbol{v})\cdot \boldsymbol{w}\) equals the 3 by 3 determinant with rows \(\boldsymbol{u}, \boldsymbol{v}, \boldsymbol{w}\); its absolute value is the volume of the box with sides \(\boldsymbol{u}, \boldsymbol{v}, \boldsymbol{w}\).
Chapter 6: Eigenvalues and Eigenvectors
Introduction to Eigenvalues
The basic equation for eigenvalues and eigenvectors is \(A\boldsymbol{x}=\lambda\boldsymbol{x}\).
The eigenvectors make up the nullspace of \(A-\lambda I\): they solve \((A-\lambda I)\boldsymbol{x}=\boldsymbol{0}\), and the eigenvalues come from \(\det(A-\lambda I)=0\).
Geometric interpretation
For the Markov matrix \(A=\begin{bmatrix}0.8 & 0.3\\ 0.2 & 0.7\end{bmatrix}\),
we know its eigenvalues are \(\lambda= 1\) and \(\lambda= 1/2\) and the eigenvectors are \((0.6, 0.4)^T\) and \((0.5, -0.5)^T\) respectively. We can find that multiplying by \(A\) leaves the \(\lambda=1\) eigenvector unchanged and halves the \(\lambda=1/2\) eigenvector, so \(A^k\boldsymbol{u}_0\) approaches the steady-state direction \((0.6, 0.4)^T\).
Determinant and trace
- The product of n eigenvalues equals the determinant
- The sum of n eigenvalues equals the sum of the n diagonal entries (trace)
Imaginary eigenvalues
Consider a rotation matrix, e.g. rotation by 90°: \(Q=\begin{bmatrix}0 & -1\\ 1 & 0\end{bmatrix}\).
Its eigenvalues are \(\lambda=\pm i\). (Think about it: no real vector \(\mathbf{x}\) stays in its original direction, since the matrix rotates every vector.)
Diagonalizing a matrix
Suppose the n by n matrix \(A\) has n linearly independent eigenvectors \(\mathbf{x}_1, \dots, \mathbf{x}_n\). Put them into the columns of an eigenvector matrix \(X\), so that \(AX=X\Lambda\).
Then \(X^{-1}AX=\Lambda\), i.e. \(A=X\Lambda X^{-1}\), and squaring gives \(A^2=X\Lambda^2 X^{-1}\).
That is to say \(A^2\) has the same eigenvectors in \(X\) and squared eigenvalues in \(\Lambda^{2}\).
Note:
- Suppose eigenvalues \(\lambda_1, \dots, \lambda_n\) are all different, then it is automatic that the eigenvectors \(\mathbf{x}_1, \dots, \mathbf{x}_n\) are independent. Any matrix that has no repeated eigenvalues can be diagonalized
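A quick NumPy check of \(A=X\Lambda X^{-1}\) and \(A^2=X\Lambda^2 X^{-1}\) (the 2 by 2 matrix is an arbitrary example with distinct eigenvalues):

```python
import numpy as np

# Arbitrary 2 by 2 matrix with distinct eigenvalues
A = np.array([[0.8, 0.3],
              [0.2, 0.7]])

eigvals, X = np.linalg.eig(A)      # columns of X are the eigenvectors
Lam = np.diag(eigvals)

print(np.allclose(A, X @ Lam @ np.linalg.inv(X)))           # A   = X Lam   X^-1
print(np.allclose(A @ A, X @ Lam**2 @ np.linalg.inv(X)))    # A^2 = X Lam^2 X^-1
```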
Similar matrices: Same Eigenvalues
For every invertible \(B\), the matrices \(A = BCB^{-1}\) are similar to \(C\): they all share the eigenvalues of \(C\)
Matrix Powers \(A^k\)
Consider a typical difference equation \(\mathbf{u}_{k+1}=A\mathbf{u}_{k}\). The solution is \(\mathbf{u}_k=A^k\mathbf{u}_0\).
If we represent \(\mathbf{u}_0\) with eigenvectors (\(\mathbf{u}_0=c_1 \mathbf{x}_1 + \dots + c_n \mathbf{x}_n\)), the solution can be written as \(\mathbf{u}_k=c_1\lambda_1^k \mathbf{x}_1 + \dots + c_n\lambda_n^k \mathbf{x}_n\).
System of differential equations
The ordinary differential equations \(\frac{du}{dt}=u\) and \(\frac{du}{dt}=\lambda u\) are solved by exponentials: \(u(t)=Ce^{t}\) and \(u(t)=Ce^{\lambda t}\).
The constant \(C\) is determined by the initial value \(u(0)=C\).
Let us consider a system of n equations, \(\frac{d\mathbf{u}}{dt}=A\mathbf{u}\),
starting from the vector \(\mathbf{u}(0) = [u_1(0),\dots, u_n(0)]^T\) at \(t=0\)
The solution would be in the form of \(e^{\lambda t}\mathbf{x}\), where \(\lambda\) and \(\mathbf{x}\) are eigenvalues and eigenvectors of \(A\). Substitute \(\mathbf{u}(t) = e^{\lambda t}\mathbf{x}\) into the original equation: \(\lambda e^{\lambda t}\mathbf{x}=Ae^{\lambda t}\mathbf{x}\).
Cancelling \(e^{\lambda t}\) gives \(A\mathbf{x}=\lambda\mathbf{x}\), so these exponential solutions indeed work.
Consider an example with \(A=\begin{bmatrix}0 & 1\\ 1 & 0\end{bmatrix}\):
The matrix \(A\) has eigenvalues \(\lambda=\pm 1\) and eigenvectors \((1, 1)\) and \((1, -1)\) respectively. The pure exponential solutions would be \(e^{t}\begin{bmatrix}1\\1\end{bmatrix}\) and \(e^{-t}\begin{bmatrix}1\\-1\end{bmatrix}\).
The complete solution is \(\mathbf{u}(t)=Ce^{t}\begin{bmatrix}1\\1\end{bmatrix}+De^{-t}\begin{bmatrix}1\\-1\end{bmatrix}\).
Using the initial value \(\mathbf{u}(0)\) yields \(C=3\) and \(D=1\).
Second order equations
Consider \(my''+by' + ky = 0\). The approach is to substitute \(y=e^{\lambda t}\), which gives \(m\lambda^2 e^{\lambda t} + b\lambda e^{\lambda t} + ke^{\lambda t}=0\).
We can find the two roots of \(m\lambda^2 + b\lambda + k=0\) and solve the equation.
Let us suppose \(m=1\). We could also write the equivalent first-order system for \(\mathbf{u}=\begin{bmatrix}y\\ y'\end{bmatrix}\): \(\frac{d\mathbf{u}}{dt}=\begin{bmatrix}0 & 1\\ -k & -b\end{bmatrix}\mathbf{u}\).
Then, we could solve this system by finding the eigenvalues of that companion matrix; they are exactly the roots of \(\lambda^2+b\lambda+k=0\).
Stability of 2 by 2 matrices
When the solutions of \(du/dt=A\mathbf{u}\) approach 0 as \(t\rightarrow \infty\), we say the problem is stable. Since the solution \(\mathbf{u}(t)\) is built from \(e^{\lambda t}\mathbf{x}\), we know that for stability every \(\lambda\) must have a negative real part.
The question is: Which matrices have negative eigenvalues (real parts)?
\(A\) is stable and \(\mathbf{u}\rightarrow \mathbf{0}\) when all \(\lambda\) have negative real parts. Then, \(A=\begin{bmatrix}a &b\\ c&d\end{bmatrix}\) must pass two tests:
- \(\lambda_1 + \lambda_2 < 0 \Rightarrow \text{trace} = a + d <0\)
- \(\lambda_1\lambda_2 > 0 \Rightarrow \text{determinant} = ad -bc >0\)
The exponential of a Matrix
We want the solution \(\mathbf{u}(t)\) in a new form \(e^{At}\mathbf{u}(0)\). Now what does \(e^{At}\) mean?
We know \(e^x = 1 + x + \frac{1}{2}x^2 + \frac{1}{6}x^3+\dots\); when we change \(x\) to \(At\), this series defines the matrix exponential \(e^{At}\):
\[ e^{At} = I + At + \tfrac{1}{2}(At)^2 + \tfrac{1}{6}(At)^3 + \dots \]
Its derivative is \(Ae^{At}\): differentiating the series term by term gives \(A + A^2t + \tfrac{1}{2}A^3t^2 + \dots = Ae^{At}\).
Its eigenvalues are \(e^{\lambda t}\): if \(A\mathbf{x}=\lambda\mathbf{x}\), then \(e^{At}\mathbf{x}=(1+\lambda t + \tfrac{1}{2}(\lambda t)^2 + \dots)\mathbf{x}=e^{\lambda t}\mathbf{x}\).
Find \(u(t)=e^{At}\mathbf{u}(0)\) by diagonalization
Assume \(A\) has n independent eigenvectors, so it is diagonalizable. Substitute \(A=X\Lambda X^{-1}\) into the series for \(e^{At}\): every power \((At)^k\) becomes \(X(\Lambda t)^k X^{-1}\).
Then, we get \(e^{At} = Xe^{\Lambda t}X^{-1}\), where \(e^{\Lambda t}\) is diagonal with entries \(e^{\lambda_i t}\).
We can obtain the solution using three steps (Similar to previous one):
- \(\mathbf{u}(0)= c_1\mathbf{x}_1 + \dots + c_n\mathbf{x}_n = X\mathbf{c}\)
- Multiply each \(\mathbf{x}_i\) by its growth factor \(e^{\lambda_i t}\)
- We obtain \(\mathbf{u}(t)= c_1 e^{\lambda_1 t}\mathbf{x}_1 + \dots + c_n e^{\lambda_n t}\mathbf{x}_n\)
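A small NumPy/SciPy check that these three steps agree with \(e^{At}\) computed directly (the matrix, initial vector, and time are arbitrary examples; `scipy.linalg.expm` computes the matrix exponential):

```python
import numpy as np
from scipy.linalg import expm

# Arbitrary stable system du/dt = A u (eigenvalues -1 and -3)
A = np.array([[-2.0,  1.0],
              [ 1.0, -2.0]])
u0 = np.array([1.0, 0.0])
t = 1.5

u_direct = expm(A * t) @ u0                 # u(t) = e^{At} u(0), computed directly

eigvals, X = np.linalg.eig(A)               # diagonalization route
c = np.linalg.solve(X, u0)                  # step 1: u(0) = X c
u_diag = X @ (np.exp(eigvals * t) * c)      # steps 2-3: sum of c_i e^{lambda_i t} x_i

print(np.allclose(u_direct, u_diag))        # True
```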
Symmetric matrices
For a symmetric matrix \(S=S^T\), the diagonalization \(S=X\Lambda X^{-1}\) can be chosen with \(X^{-1}=X^T\).
That means \(X^T X= I\), i.e. each eigenvector in \(X\) is orthogonal to the other eigenvectors when \(S=S^T\). Here are the key facts:
- A symmetric matrix has only real eigenvalues
- The eigenvectors can be chosen orthonormal, then the eigenvector matrix \(X\) becomes an orthogonal matrix \(Q\)
Complex Eigenvalues of Real Matrices
For real matrices, complex \(\lambda\)'s and \(\mathbf{x}\)'s come in "conjugate pairs": if \(A\mathbf{x}=\lambda\mathbf{x}\), then \(A\bar{\mathbf{x}}=\bar{\lambda}\bar{\mathbf{x}}\).
For instance, \(A = \begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix}\) has \(\lambda_1 = \cos\theta + i\sin\theta\) and \(\lambda_2 = \cos\theta - i\sin\theta\). The eigenvalues are conjugate to each other and eigenvectors are \((1, i)^T\) and \((1, -i)^T\).
NOTE: For this rotation matrix (or every orthogonal matrix \(Q\)), we have \(|\lambda|=1\)
Eigenvalues versus pivots
- product of pivots = determinant = product of eigenvalues
- For symmetric matrices, #positive eigenvalues = #positive pivots
Positive definite matrices
A symmetric matrix with all positive eigenvalues is called a positive definite matrix.
For a 2 by 2 symmetric matrix \(S=\begin{bmatrix}a & b\\ b & c\end{bmatrix}\), an easy test is \(a>0\) and \(ac-b^2>0\). Note that positive eigenvalues mean positive pivots and vice versa.
Energy-based definition
From \(S\mathbf{x}=\lambda\mathbf{x}\), we can get \(\mathbf{x}^T S\mathbf{x}=\lambda\mathbf{x}^T\mathbf{x}>0\) when \(\lambda>0\) and \(\mathbf{x}\neq\mathbf{0}\).
Thus, we know that \(\mathbf{x}^T S\mathbf{x}\) is positive for all nonzero vectors \(\mathbf{x}\) (\(\mathbf{x}^T S\mathbf{x}\) is usually called the energy in many applications).
This definition leads to: if \(S\) and \(T\) are symmetric positive definite, so is \(S+T\).
If the columns of \(A\) are independent, then \(S=A^T A\) is positive definite: the energy is \(\mathbf{x}^TA^TA\mathbf{x}=\|A\mathbf{x}\|^2>0\) for every \(\mathbf{x}\neq\mathbf{0}\).
Now we have a conclusion: when a symmetric matrix \(S\) has one of these five properties, it has them all:
- All n pivots of S are positive
- All n upper left determinants are positive
- All n eigenvalues of S are positive
- \(\mathbf{x}^T S\mathbf{x}\) is positive except at \(\mathbf{x}=0\). This is the energy-based definition
- S equals \(A^TA\) for a matrix \(A\) with independent columns
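A quick NumPy check of a few of these tests (the matrix \(A\) is an arbitrary example with independent columns; the Cholesky test is an extra practical check, not one of the five listed properties):

```python
import numpy as np

# Arbitrary A with independent columns, so S = A^T A should be positive definite
A = np.array([[1.0, 2.0],
              [0.0, 3.0],
              [1.0, 1.0]])
S = A.T @ A

print(np.all(np.linalg.eigvalsh(S) > 0))      # all eigenvalues positive
print(S[0, 0] > 0 and np.linalg.det(S) > 0)   # upper-left determinants positive

# Practical extra check: Cholesky succeeds exactly when S is positive definite
L = np.linalg.cholesky(S)                     # would raise LinAlgError otherwise
print(np.allclose(L @ L.T, S))                # True
```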
Positive semi-definite matrices
When the determinant is zero and the smallest eigenvalue is zero, the energy in the corresponding eigenvector is zero: \(\mathbf{x}^TS\mathbf{x}=\lambda\mathbf{x}^T\mathbf{x}=0\).
These matrices are called positive semi-definite and can be factored into \(A^TA\) with dependent columns in \(A\).
Positive semi-definite matrices have all \(\lambda \geq 0\) and \(\mathbf{x}^T S \mathbf{x} \geq 0\) for all \(\mathbf{x}\).
The ellipse \(ax^2 + 2bxy + cy^2=1\)
Consider the tilted ellipse \(5x^2 + 8xy + 5y^2=1\). We can rewrite it in the form \(\mathbf{x}^TS\mathbf{x}=1\) with \(S=\begin{bmatrix}5 & 4\\ 4 & 5\end{bmatrix}\).
We can factor \(S=Q\Lambda Q^T\) with eigenvalues \(\lambda_1=9,\ \lambda_2=1\) and orthonormal eigenvectors \(\frac{1}{\sqrt{2}}(1,1)\) and \(\frac{1}{\sqrt{2}}(1,-1)\).
We can also rewrite: \(\mathbf{x}^TS\mathbf{x} =(\mathbf{x}^T Q )\Lambda (Q^T\mathbf{x})=X^T\Lambda X\).
Plotting \(\mathbf{x}^TS\mathbf{x}=1\) gives the tilted ellipse, while \(X^T\Lambda X=1\) gives the same ellipse lined up with the coordinate axes.
The axes of the tilted ellipse point along the eigenvectors of \(S\). This explains why \(S=Q\Lambda Q^T\) is called the principal axis theorem: it displays the axes, with the axis directions coming from the eigenvectors and the axis half-lengths equal to \(1/\sqrt{\lambda}\).
Important application: test for a minimum
F(x, y) has a minimum if \(\partial F / \partial x = \partial F / \partial y=0\) and \(S=\begin{bmatrix}\partial^2 F / \partial x^2 & \partial^2 F / \partial x \partial y \\\partial^2 F / \partial y \partial x & \partial^2 F / \partial y^2 \end{bmatrix}\) (Hessian) is positive definite.
Chapter 7: Singular value decomposition (SVD)
Eigenvectors for SVD
The singular value decomposition of \(A\) rests on the eigenvalue theorem for \(A^TA\) and \(AA^T\). A matrix \(A\in \mathbb{R}^{m\times n}\) has two sets of singular vectors (the eigenvectors of \(A^TA\) and \(AA^T\)) and one set of positive singular values (\(A^TA\) has the same positive eigenvalues as \(AA^T\); both are symmetric positive semi-definite).
Motivation: Find orthogonal basis for image \(A\)
Difficulty: Eigenvectors of most images are not orthogonal, and we want two sets of basis vectors
Solution: Use the eigenvectors \(\boldsymbol{u}\in \mathbb{R}^m\) of \(AA^T\) and the eigenvectors \(\boldsymbol{v}\in \mathbb{R}^n\) of \(A^TA\). The \(\boldsymbol{u}\)'s are called left singular vectors, the \(\boldsymbol{v}\)'s are called right singular vectors, and the \(\sigma\)'s are the singular values (the square roots of the nonzero eigenvalues of \(A^TA\) and \(AA^T\)).
Bases and matrices in the SVD
We have two sets of singular vectors \(\boldsymbol{u} \in R^m\) and \(\boldsymbol{v}\in R^n\). They give bases for four subspaces:
- \(\boldsymbol{u}_1,\dots,\boldsymbol{u}_r\) is an orthonormal basis for the column space
- \(\boldsymbol{u}_{r+1},\dots,\boldsymbol{u}_{m}\) is an orthonormal basis for left nullspace \(N(A^T)\)
- \(\boldsymbol{v}_1,\dots,\boldsymbol{v}_r\) is an orthonormal basis for the row space
- \(\boldsymbol{v}_{r+1},\dots,\boldsymbol{v}_{n}\) is an orthonormal basis for the nullspace \(N(A)\)
We can re-write the formula of SVD using matrices \(AV=U\Sigma\):
Given \(V^{-1} = V^T\), \(AV=U\Sigma\) becomes \(A=U\Sigma V^T\):
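A quick NumPy check of \(A=U\Sigma V^T\) and \(AV=U\Sigma\) (the 2 by 2 matrix is an arbitrary example):

```python
import numpy as np

# Arbitrary example matrix
A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

U, sigma, Vt = np.linalg.svd(A)          # A = U Sigma V^T, singular values largest first
Sigma = np.diag(sigma)

print(np.allclose(A, U @ Sigma @ Vt))    # True
print(np.allclose(A @ Vt.T, U @ Sigma))  # A V = U Sigma

# sigma^2 are the eigenvalues of A^T A (and the nonzero ones of A A^T)
print(np.allclose(np.sort(sigma**2), np.sort(np.linalg.eigvalsh(A.T @ A))))   # True
```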
Proof of the SVD
Start from \(A=U\Sigma V^T\); then \(A^TA=V\Sigma^T\Sigma V^T\). On the right, \(V\) is the eigenvector matrix for \(A^TA\) and \(\Sigma^T\Sigma\) must be its eigenvalue matrix, so each \(\sigma^2\) is an eigenvalue \(\lambda(A^TA)\).
Singular vectors of \(A\) and eigenvectors of \(S=A^TA\)
Start with the largest \(\lambda_1\); we are actually solving a maximization problem (recall \(S\mathbf{x} = \lambda\mathbf{x}\)): maximize the ratio \(\frac{\mathbf{x}^TS\mathbf{x}}{\mathbf{x}^T\mathbf{x}}\).
The winning vector is \(\mathbf{x} = \mathbf{q}_1\) with \(S\mathbf{q}_1 = \lambda_1\mathbf{q}_1\). The largest singular value \(\sigma_1\) of \(A\) solves the analogous problem: maximize \(\frac{\|A\mathbf{x}\|}{\|\mathbf{x}\|}\).
The winning vector is \(\mathbf{x} = \mathbf{v}_1\) with \(A\mathbf{v}_1 = \sigma_1\mathbf{u}_1\).
Principal Component Analysis (PCA by the SVD)
Let us start with a matrix \(A_0\in \mathbb{R}^{m\times n}\). We first subtract the average of each row to reach \(A\). Let us assume \(m=2\) and picture those \(n\) points in the plane.
We can find that these re-centered points cluster along a line, but how do we find that line?
We first need the sample covariance matrix, defined by \(S=\dfrac{AA^T}{n-1}\),
where
- \(A\) shows the distance \(a_{ij}-\mu_i\) from each measurement to row average
- \((AA^T)_{11}\) and \((AA^T)_{22}\) show the sums of squared distances (sample variances \(s_1^2\), \(s_2^2\))
- \((AA^T)_{12}\) shows the sample covariance \(s_{12}= (\text{row 1 of A})\cdot(\text{row 2 of A})\)
Note:
- High covariance means: Strong relationship
- Why divide by \(n-1\) instead of \(n\)? - one degree of freedom has been used by the mean, leaving \(n-1\)... just agree with it...
Consider this matrix \(A\):
The eigenvalues of \(S\) are near 57 and 3. The first rank-one piece \(\sqrt{57}\mathbf{u}_1\mathbf{v}_1^T\) is much larger than the second piece \(\sqrt{3}\mathbf{u}_2\mathbf{v}_2^T\). The leading eigenvector \(\mathbf{u}_1\approx (0.6, 0.8)\) shows the direction of the cluster in the scatter plot above.
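A small NumPy sketch of this PCA recipe (the data matrix is a made-up example, not the matrix referenced above):

```python
import numpy as np

# Made-up data: m = 2 measurements, n = 6 samples
A0 = np.array([[3.0, 5.0, 7.0, 9.0, 11.0, 13.0],
               [2.0, 3.0, 5.0, 6.0,  8.0,  9.0]])

A = A0 - A0.mean(axis=1, keepdims=True)    # subtract each row's average
n = A.shape[1]
S = A @ A.T / (n - 1)                      # sample covariance matrix

eigvals, eigvecs = np.linalg.eigh(S)       # eigh: eigenvalues in ascending order
print(eigvals.sum(), np.trace(S))          # total variance = trace(S)
print(eigvecs[:, -1])                      # u1: direction of the closest line

# Equivalently, u1 is the first left singular vector of A (up to sign)
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
print(U[:, 0])
```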
The essentials of Principal Component Analysis (PCA)
- The total variance \(T\) is the sum of all eigenvalues and of the sample variances \(s^2\): \[ T = \sigma_1^2 + \dots + \sigma_m^2 = s_1^2 + \dots + s_m^2 = \mathrm{trace}(S) \]
- We keep the largest \(R\) singular values and the corresponding directions. The n data points will then lie near an \(R\)-dimensional subspace with basis \(\mathbf{u}_{1}\) to \(\mathbf{u}_{R}\). These \(\mathbf{u}\)'s are the principal components in \(m\)-dimensional space.
Perpendicular least squares
The best line \(\mathbf{u}_1\) found above also solves the problem of perpendicular least squares:
The sum of squared distances from the point to the line is a minimum.
Proof: Separate each column \(\mathbf{a}_j\) into its components along the \(\mathbf{u}_1\) line and \(\mathbf{u}_2\) (Project onto \(\mathbf{u}_1\) and \(\mathbf{u}_2\)):
The sum on the left is fixed by the data points \(\mathbf{a}_j\) (columns of \(A\)). The first sum on the right is \(\mathbf{u}_1^T A A^T\mathbf{u}_1\). So when we maximize that sum by choosing the eigenvector \(\mathbf{u}_1\), we minimize the second sum, the squared distance from the data points to the best line (because the basis is orthonormal). That second sum is the minimum for perpendicular least squares!
The Geometry of the SVD
SVD separates a matrix into \((\mathrm{orthogonal})\times (\mathrm{diagonal}) \times (\mathrm{orthogonal})\), i.e. \((\mathrm{rotation})\times (\mathrm{stretching}) \times (\mathrm{rotation})\)
The norm of a matrix
The largest singular value \(\sigma_1\) is the norm of the matrix \(A\): the norm \(\|A\|\) is the largest ratio \(\frac{\|A\mathbf{x}\|}{\|\mathbf{x}\|}\), and that maximum equals \(\sigma_1\) (attained at \(\mathbf{x}=\mathbf{v}_1\)).
Polar decomposition \(A=QS\)
Every real square matrix can be factored into \(A=QS\), where \(Q\) is orthogonal and \(S\) is symmetric positive semi-definite. If \(A\) is invertible, \(S\) is positive definite.
Polar decomposition separates the rotation (in \(Q\)) from the stretching (in \(S\)). NOTE: \(Q=UV^T\) is the nearest orthogonal matrix to \(A\); this \(Q\) makes the norm \(\|Q-A\|\) as small as possible. This corresponds to the fact that \(e^{i\theta}\) is the nearest number on the unit circle to \(re^{i\theta}\).
Pseudoinverse \(A^\dagger\)
Least Square with Dependent Columns
\(A^T A\) might be singular, so \(A^TA\mathbf{x}=A^T\mathbf{b}\) might have infinitely many solutions. The pseudoinverse gives us a way to choose the best of them, \(\mathbf{x}^\dagger = A^\dagger\mathbf{b}\): the least-squares solution of minimum norm.
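A short NumPy sketch (the matrix with dependent columns is an arbitrary example; `np.linalg.pinv` computes \(A^\dagger\)):

```python
import numpy as np

# Arbitrary A with dependent columns (column 2 = 2 x column 1)
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
b = np.array([1.0, 1.0, 1.0])

x_dagger = np.linalg.pinv(A) @ b                   # x_dagger = A_dagger b
print(x_dagger)                                    # the shortest least-squares solution

# It still satisfies the (singular) normal equations A^T A x = A^T b
print(np.allclose(A.T @ A @ x_dagger, A.T @ b))    # True
```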
Reference
Gilbert Strang: Introduction to linear algebra. Wellesley-Cambridge Press, 2022.