All The Linear Algebra You Need For Grad School
Themistoklis Haris, August 25, 2023
Introduction
A solid command of Linear Algebra is essential for anyone serious about research in the mathematical sciences, simply because of the universality of the ideas and tools it provides. Linear transformations are everywhere around us, from taking derivatives to Fourier transforms. A strong Linear Algebra foundation lets one approach these topics with the mathematical maturity to connect dots between seemingly completely different subjects. I think that maturity is a very important skill to have as a researcher.
For this blog, I assume the reader already has prior exposure to linear algebra and is comfortable with the concept of a vector space at the very least. The purpose of this exposition is to give a high-level picture of Linear Algebra, with a particular focus on Data Science, so low-level rigorous proofs will often be omitted, with proof sketches taking their place.
The contents of this blog post are the result of studying Chapter 1 of the book Linear Algebra and Learning from Data, by Gilbert Strang.
Vector Spaces, Linear Transformations and Matrices
Linear Algebra at its core is the study of linear transformations. These are transformations $T$ that map sums to sums and scalar products to scalar products:

$$T(u + v) = T(u) + T(v), \qquad T(cv) = cT(v).$$
The objects these transformations handle are called vectors. Sets of vectors equipped with the addition and scalar multiplication operations are called vector spaces. Vector spaces within vector spaces are called subspaces. Vector spaces appear everywhere in the mathematical sciences: in quantum mechanics the set of all admissible wavefunctions is a vector space, and so is the set of all 256 x 256 images on a computer. Amazingly, the rules for dissecting and analyzing vector spaces that Linear Algebra studies apply to every single vector space, from physics to computer graphics. It is astonishing to realize that the same principles govern the structure of such different objects.
While a more traditional treatment of Linear Algebra starts with the properties of vector spaces and linear transformations and eventually considers bases and dimensions, we will instead jump directly to the main protagonist of the show: matrices. Matrices are objects through which we can study any linear transformation between any two finite-dimensional vector spaces. The connection comes from tracking where a linear transformation sends the basis vectors of one space, expressed in a basis of the other; that bookkeeping is exactly what a matrix represents. Indeed, this is one of the first core concepts one should grasp in linear algebra: a matrix is a linear transformation written down with respect to a choice of bases.
Matrix Transformations
How do matrices transform vectors? Via matrix multiplication! $Ax$ is the vector to which the matrix $A$ sends the vector $x$. The typical example to think of is rotations in the plane via a 2 x 2 matrix. The way everyone learns to multiply matrices is via row-column "dot-product" multiplication, but there is another way: a way that lets us think of matrix multiplication in terms of linear combinations of columns.
$Ax$ can be computed via a linear combination of the columns of $A$: if $a_1, \dots, a_n$ are the columns, then

$$Ax = x_1 a_1 + x_2 a_2 + \cdots + x_n a_n.$$

Consider this example:

$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_1 \begin{pmatrix} 1 \\ 3 \end{pmatrix} + x_2 \begin{pmatrix} 2 \\ 4 \end{pmatrix}.$$
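The column picture is easy to check numerically. Here is a minimal NumPy sketch (the matrix and vector are illustrative) comparing the usual product `A @ x` against the linear combination of columns:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x = np.array([5.0, 6.0])

# Column picture: Ax = x_1 * (column 1) + x_2 * (column 2)
column_view = x[0] * A[:, 0] + x[1] * A[:, 1]

# Both views give the same vector
assert np.allclose(A @ x, column_view)
```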
This leads us to the concept of the column space of $A$, written $C(A)$: the set of all linear combinations of the columns of the matrix. Similarly, we can define the row space of a matrix as the column space of its transpose: $C(A^T)$. We should of course introduce the concept of linear independence for a set of vectors $v_1, \dots, v_n$: none of them can be written as a linear combination of the others. More simply put,

$$c_1 v_1 + c_2 v_2 + \cdots + c_n v_n = 0 \implies c_1 = c_2 = \cdots = c_n = 0.$$
A basis of a vector space is a set of linearly independent vectors that can produce any vector in the space via linear combination. The number of elements in a basis is called the dimension of the vector space. We shall work with finite-dimensional vector spaces in this document. We can prove that even though multiple different bases may exist for a vector space, their cardinality (aka the dimension) remains the same.
The rank of a matrix is just the dimension of its column space. The rank counts independent columns. It is an important fact in linear algebra that the number of independent rows is equal to the number of independent columns:

$$\operatorname{rank}(A) = \dim C(A) = \dim C(A^T).$$
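A quick numerical illustration of this fact, assuming NumPy (the example matrix is mine): build a matrix whose third row is the sum of the first two, and compare the rank of $A$ with the rank of $A^T$.

```python
import numpy as np

# Row 3 = row 1 + row 2, so only 2 independent rows --
# and, by the theorem, only 2 independent columns.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 3.0, 1.0]])

col_rank = np.linalg.matrix_rank(A)
row_rank = np.linalg.matrix_rank(A.T)  # rank of A^T = number of independent rows
assert col_rank == row_rank == 2
```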
Matrix Multiplication
Multiplying two matrices can also be thought of in terms of linear combinations, but this time of rank-1, column-times-row matrices. When we multiply a column $u$ with a row $v^T$, we get a matrix $uv^T$ of rank 1, meaning that all its rows are multiples of one row, and the same is true for its columns:

$$uv^T = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix}\begin{pmatrix} v_1 & v_2 & v_3 \end{pmatrix} = \begin{pmatrix} u_1 v_1 & u_1 v_2 & u_1 v_3 \\ u_2 v_1 & u_2 v_2 & u_2 v_3 \\ u_3 v_1 & u_3 v_2 & u_3 v_3 \end{pmatrix}.$$
To multiply two arbitrary matrices, we just add pieces of rank 1. We view the first matrix as a row of columns and the second as a column of rows:

$$AB = \begin{pmatrix} a_1 & a_2 & \cdots & a_n \end{pmatrix}\begin{pmatrix} b_1^T \\ b_2^T \\ \vdots \\ b_n^T \end{pmatrix} = a_1 b_1^T + a_2 b_2^T + \cdots + a_n b_n^T,$$

where the $a_k$ are columns of $A$ and the $b_k^T$ are rows of $B$. For example:

$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \end{pmatrix}\begin{pmatrix} 5 & 6 \end{pmatrix} + \begin{pmatrix} 2 \\ 4 \end{pmatrix}\begin{pmatrix} 7 & 8 \end{pmatrix} = \begin{pmatrix} 5 & 6 \\ 15 & 18 \end{pmatrix} + \begin{pmatrix} 14 & 16 \\ 28 & 32 \end{pmatrix} = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}.$$
But why does this way of multiplying matrices work? Let's see the short proof: the $(i, j)$ entry of $AB$ is

$$(AB)_{ij} = \sum_{k} a_{ik} b_{kj} = \sum_{k} \left(a_k b_k^T\right)_{ij},$$

since the $(i, j)$ entry of the rank-1 piece $a_k b_k^T$ is exactly $a_{ik} b_{kj}$. Summing the rank-1 pieces over $k$ therefore recovers the usual dot-product rule.
Now, let's understand why column rank equals row rank by putting this new way of multiplying matrices into practice. Suppose we are given a matrix $A \in \mathbb{R}^{m \times n}$. We construct a matrix $C$ by iteratively taking columns of $A$ and rejecting those dependent on previously taken columns. This gives us a matrix $C$ with $r$ independent columns, but also another matrix $R$ that describes how the columns of $C$ must be linearly combined to give the columns of $A$. Since every row of $A = CR$ is a combination of the $r$ rows of $R$, the number of independent rows is at most the number of independent columns (and by symmetry, they are equal). We also arrive at our first major factorization:

$$A = CR.$$
As an example:

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 4 & 5 \end{pmatrix}\begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \end{pmatrix} = CR,$$

since the third column of $A$ equals $-1$ times the first column plus $2$ times the second.
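The greedy column-selection procedure can be sketched in a few lines of NumPy. This is my own illustrative implementation, not an optimized algorithm: it keeps each column that raises the rank, then solves for $R$ with least squares (exact here, since the kept columns span the column space).

```python
import numpy as np

def cr_factorization(A, tol=1e-10):
    """Sketch of A = CR: C keeps the independent columns of A,
    R records how to combine them to rebuild every column of A."""
    kept = []
    for j in range(A.shape[1]):
        # Keep column j only if it is independent of the columns kept so far
        if np.linalg.matrix_rank(A[:, kept + [j]], tol=tol) > len(kept):
            kept.append(j)
    C = A[:, kept]
    # Solve C @ R = A; exact because C spans the column space of A
    R = np.linalg.lstsq(C, A, rcond=None)[0]
    return C, R

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
C, R = cr_factorization(A)
assert np.allclose(C @ R, A)
assert C.shape[1] == np.linalg.matrix_rank(A)  # r independent columns
```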
To test our understanding of column spaces and matrix multiplication, let's explore the following important facts:
- Multiplying two matrices does not increase rank: $\operatorname{rank}(AB) \le \min(\operatorname{rank}(A), \operatorname{rank}(B))$
- Adding two matrices can at most add their ranks: $\operatorname{rank}(A + B) \le \operatorname{rank}(A) + \operatorname{rank}(B)$

The first fact comes from $AB = \sum_k a_k b_k^T$. Why is that? Look at matrix multiplication! Every column of $AB$ is a linear combination of columns of $A$, and every row of $AB$ is a linear combination of rows of $B$. The second fact comes from combining bases for $C(A)$ and $C(B)$: columns of $A + B$ are sums of the respective columns of $A$ and $B$, so they lie in the span of the two bases together.
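Both rank inequalities can be sanity-checked on random low-rank matrices, a common NumPy trick (the shapes and seed here are arbitrary): a product of a $5 \times 2$ and a $2 \times 5$ matrix has rank at most 2.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random matrices of rank 2 and rank 3 (with probability 1)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 5))
B = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 5))

rank = np.linalg.matrix_rank
# Multiplication cannot increase rank; addition at most adds ranks
assert rank(A @ B) <= min(rank(A), rank(B))
assert rank(A + B) <= rank(A) + rank(B)
```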
Four Fundamental Subspaces
There are 4 subspaces that we need to know because their relationships and significance will play a huge role in the theorems to prove in the future.
- The column space $C(A)$: linear combinations of columns
- The row space $C(A^T)$: linear combinations of rows
- The nullspace $N(A)$: solutions to $Ax = 0$
- The left nullspace $N(A^T)$: solutions to $A^T y = 0$
Five Fundamental Matrix Factorizations
One major common theme in Linear Algebra is to factorize a matrix and then examine the rank-1 pieces of that factorization. That analysis allows us to inspect which parts of a matrix have a larger or a lesser influence on its properties. To that end, Linear Algebra seeks to factorize matrices in as many and as easy ways as possible. Sometimes we can only factorize a class of matrices in a certain way, but the goal is definitely to be as broad as possible. Below follows a list of 5 major factorizations in linear algebra. We will examine each one of them in detail in this post.
- $A = LU$: Elimination. $U$ is upper triangular, $L$ is lower triangular
- $A = QR$: Orthogonalization of the columns of $A$
- $S = Q\Lambda Q^T$: Spectral decomposition. $S$ is symmetric
- $A = X\Lambda X^{-1}$: Diagonalization
- $A = U\Sigma V^T$: Singular Value Decomposition (SVD)
Elimination and $A = LU$
The method of elimination or "pivots" is a well-known method for solving linear systems of equations. We subtract multiples of a row from the other rows to eliminate variables. At the end, we use back substitution to find the solutions to our equations. The method essentially converts a matrix $A$ to an upper triangular matrix $U$ with the pivots on the diagonal. For example, subtracting $2$ times row 1 from row 2:

$$A = \begin{pmatrix} 2 & 1 \\ 4 & 5 \end{pmatrix} \to \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix} = U.$$
The multipliers we used to create $U$ form a lower-triangular matrix $L$: in the $j$-th column of $L$ we record the numbers we used to eliminate the rows below row $j$, with $1$s on the diagonal:

$$L = \begin{pmatrix} 1 & & & \\ \ell_{21} & 1 & & \\ \vdots & \vdots & \ddots & \\ \ell_{m1} & \ell_{m2} & \cdots & 1 \end{pmatrix}.$$
Every time we subtract multiples of a row from the rows below it, we essentially remove a rank-1 matrix from $A$. The first matrix we remove is $\ell_1 u_1^T$ (multiples of row 1 of $U$). Then we remove $\ell_2 u_2^T$ (multiples of row 2). In the end, we get

$$A = \ell_1 u_1^T + \ell_2 u_2^T + \cdots + \ell_m u_m^T = LU.$$
The method of elimination gives us the $A = LU$ factorization. This factorization can be used to solve $Ax = b$: first solve $Lc = b$ by forward substitution, then $Ux = c$ by back substitution.
Of course, the matrix $A$ will not always be invertible, so the equation might have no solutions or infinitely many solutions, but the main idea is there. One particularly pesky obstacle in solving these systems in general is the existence of zero pivots. For example, if we have $a_{11} = 0$ then we cannot define the first pivot. What we do is permute the rows in such a way that $a_{11} \neq 0$. We can permute the rows of a matrix by multiplying with a permutation matrix $P$. In this case we can always write:

$$PA = LU.$$
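SciPy ships a pivoted LU routine, so the $PA = LU$ idea is easy to demonstrate (assuming SciPy is available; the matrix below, with its zero in the top-left corner, is illustrative). Note SciPy's convention returns $P$ with $A = PLU$, so $P^T A = LU$:

```python
import numpy as np
from scipy.linalg import lu

# Zero in the (1,1) position: elimination must swap rows first
A = np.array([[0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0],
              [3.0, 0.0, 1.0]])

P, L, U = lu(A)                      # SciPy convention: A = P @ L @ U
assert np.allclose(P.T @ A, L @ U)   # a PA = LU factorization (with P^T as "P")
assert np.allclose(np.tril(L), L)    # L is lower triangular
assert np.allclose(np.triu(U), U)    # U is upper triangular
```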
Orthogonality
Orthogonal vectors play a huge role in linear algebra. This is because orthogonal vectors are independent, so we can talk about orthogonal bases. Orthogonality comes from defining another operation between vectors, the inner product $\langle u, v \rangle$. We'll use the standard inner product for this section: $\langle u, v \rangle = u^T v = \sum_i u_i v_i$. For complex-valued vectors we use the conjugate transpose of $u$, but for simplicity we will assume real components from now on.
Two vectors are orthogonal if $u^T v = 0$. For a vector $v$ we define its L2-norm as $\|v\| = \sqrt{v^T v}$. An example is the Pythagorean theorem: when $u^T v = 0$,

$$\|u + v\|^2 = \|u\|^2 + \|v\|^2.$$
More generally, we have the law of cosines:

$$\|u - v\|^2 = \|u\|^2 + \|v\|^2 - 2\|u\|\|v\|\cos\theta, \qquad u^T v = \|u\|\|v\|\cos\theta.$$
An orthogonal basis is a basis in which every pair of vectors is orthogonal. Let $v_1, \dots, v_n$ be such a basis. Then, for any vector $x$, we have that $x = c_1 v_1 + \cdots + c_n v_n$. This is equivalent to projecting $x$ onto the basis. We can get these projection coefficients simply by taking the inner product of $x$ with each basis vector:

$$c_i = \frac{v_i^T x}{v_i^T v_i}.$$
Why is that? Just examine: $v_i^T x = \sum_j c_j \, v_i^T v_j = c_i \, v_i^T v_i$, since all the other inner products vanish by orthogonality.
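Here is a small NumPy check of the projection formula on an orthonormal basis of the plane (the basis and vector are my own example; with unit vectors, the denominators $v_i^T v_i$ are 1):

```python
import numpy as np

# An orthonormal basis of R^2: two perpendicular unit vectors
v1 = np.array([1.0, 1.0]) / np.sqrt(2)
v2 = np.array([1.0, -1.0]) / np.sqrt(2)
assert abs(v1 @ v2) < 1e-12          # orthogonal

x = np.array([3.0, 5.0])
# Expansion coefficients come from inner products: c_i = v_i . x
c1, c2 = v1 @ x, v2 @ x
# The combination of basis vectors reconstructs x exactly
assert np.allclose(c1 * v1 + c2 * v2, x)
```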
A pair of subspaces is called orthogonal if for every pair of vectors $u, v$ (one from each subspace) we have $u^T v = 0$. By using the row-times-column picture of $Ax$, it is easy to see that:

$$N(A) \perp C(A^T) \quad \text{and} \quad N(A^T) \perp C(A).$$

Indeed, $Ax = 0$ says exactly that $x$ is orthogonal to every row of $A$.
We can also consider orthogonal matrices. A matrix $Q$ is called orthogonal if its columns are orthonormal, meaning that they are orthogonal to each other and each has norm 1. In other words,

$$Q^T Q = I.$$
You can think of these matrices as rotations of the plane. An important property of these matrices is that they don't change the lengths of the vectors they transform:

$$\|Qx\|^2 = (Qx)^T (Qx) = x^T Q^T Q x = x^T x = \|x\|^2.$$
If a square matrix $Q$ is orthogonal, its transpose is its inverse: $Q^{-1} = Q^T$!
Also, if $Q_1, Q_2$ are orthogonal, then so is $Q_1 Q_2$. We'll leave the proof to the reader.
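All three properties can be verified on a rotation matrix, the canonical orthogonal matrix (the angle here is arbitrary):

```python
import numpy as np

theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

assert np.allclose(Q.T @ Q, np.eye(2))        # orthonormal columns: Q^T Q = I
x = np.array([3.0, 4.0])
assert np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))  # lengths preserved
assert np.allclose(Q.T, np.linalg.inv(Q))     # transpose equals inverse
# A product of orthogonal matrices is orthogonal:
assert np.allclose((Q @ Q).T @ (Q @ Q), np.eye(2))
```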
Eigenvalues and Eigenvectors
The eigenvectors of a matrix don't change direction when multiplied by it:

$$Ax = \lambda x.$$

The scalar $\lambda$ is the corresponding eigenvalue.
These vectors are, as we will find out, very important for analyzing matrices because they are invariant to the linear transformation of the matrix. Let's first explore some of their properties (without proof, but the reader is encouraged to seek the proofs out on her own):
- If $Ax = \lambda x$, then $A^n x = \lambda^n x$
- If an $n \times n$ matrix has $n$ independent eigenvectors $x_1, \dots, x_n$, then they form a basis of $\mathbb{R}^n$, and the matrix is diagonalizable
- The trace of a matrix (the sum of its diagonal values) is equal to the sum of its eigenvalues
- The determinant of a matrix is equal to the product of its eigenvalues.
- If $A$ is symmetric, then its eigenvalues are real. We'll see why soon.
- If $Ax = \lambda x$ and $A$ is invertible, then $A^{-1} x = \frac{1}{\lambda} x$
Let's examine how to calculate the eigenvalues of a matrix. It boils down to computing a determinant! Since $Ax = \lambda x$ means $(A - \lambda I)x = 0$ has a non-zero solution, the matrix $A - \lambda I$ must be singular:

$$\det(A - \lambda I) = 0.$$

The eigenvalues are the roots of this characteristic polynomial.
Two matrices $A$ and $B$ are similar if $B = M^{-1} A M$ for some invertible $M$; similar matrices have the same eigenvalues:

$$Bx = \lambda x \implies A(Mx) = MBx = \lambda (Mx),$$

so each eigenvector $x$ of $B$ gives an eigenvector $Mx$ of $A$ with the same eigenvalue.
We can use similar matrices to compute the eigenvalues of a matrix. The idea is to transform the original matrix into a similar diagonal matrix where the eigenvalues end up on the diagonal. This process is known as diagonalization. After we diagonalize a matrix to obtain $\Lambda$ (eigenvalues on the diagonal) and also the matrix $X$ whose columns are the eigenvectors, we can then write:

$$AX = X\Lambda.$$
This gives us another factorization, known as the eigendecomposition:

$$A = X \Lambda X^{-1}.$$
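NumPy's `eig` computes exactly this factorization, so we can confirm $AX = X\Lambda$, $A = X\Lambda X^{-1}$, and the trace and determinant facts from the list above on a small example matrix of my choosing:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eig returns the eigenvalues and a matrix X whose columns are eigenvectors
eigenvalues, X = np.linalg.eig(A)
Lam = np.diag(eigenvalues)

assert np.allclose(A @ X, X @ Lam)                       # defining relation AX = XΛ
assert np.allclose(A, X @ Lam @ np.linalg.inv(X))        # A = XΛX^{-1}
assert np.isclose(np.trace(A), eigenvalues.sum())        # trace = sum of eigenvalues
assert np.isclose(np.linalg.det(A), eigenvalues.prod())  # det = product of eigenvalues
```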
It is important to note at this point that not all matrices are diagonalizable! If there are repeated eigenvalues, specifically, the matrix may not have enough independent eigenvectors, but that is beyond the scope of this document.
Symmetric Positive Definite Matrices
There are some matrices that are more convenient to work with than others. The crown goes to symmetric matrices, and specifically to symmetric matrices with positive eigenvalues. Those matrices play a very important role in science and appear almost everywhere, so we need to know their basic properties:
- All the eigenvalues of a symmetric matrix are real numbers.
- All eigenvectors of a symmetric matrix can be chosen to be orthonormal to each other!
Now, if we put all the orthonormal eigenvectors in a matrix $Q$, then we must have $Q^T Q = I$, and this in fact gives us the powerful spectral decomposition theorem:

$$S = Q \Lambda Q^T,$$

which is simply the eigendecomposition for symmetric matrices. The power lies in that we don't have to compute the inverse of $Q$; we just take the transpose.
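NumPy provides `eigh` specifically for symmetric matrices; it returns real eigenvalues and an orthonormal eigenvector matrix, so the spectral decomposition can be checked directly (the symmetric matrix below is illustrative):

```python
import numpy as np

S = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
assert np.allclose(S, S.T)  # symmetric

# eigh: the symmetric-matrix eigenvalue routine
eigenvalues, Q = np.linalg.eigh(S)
assert np.allclose(Q.T @ Q, np.eye(3))                 # Q is orthogonal
assert np.allclose(S, Q @ np.diag(eigenvalues) @ Q.T)  # S = QΛQ^T: no inverse needed
```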
Let's talk a little bit about proving / justifying the two statements above. To explain why all eigenvalues are real, examine the equation $Sx = \lambda x$ for a single eigenvalue. Multiply both sides by $\bar{x}^T$ to get $\bar{x}^T S x = \lambda \, \bar{x}^T x$. The quantity on the left is always real (as can be seen with some algebra, using $S = S^T$), and $\bar{x}^T x = \sum_i |x_i|^2$ on the right is real and positive, forcing $\lambda$ to be real.
As for why the eigenvectors can be chosen orthogonal to each other, still take $Sx = \lambda x$ for some vector $x$, but now also consider a different eigenvector $y$ for which $Sy = \alpha y$, with $\alpha \neq \lambda$. It is easy to see that $x$ belongs to the nullspace of $S - \lambda I$. But $y$ belongs to the column space of $S - \lambda I$: $(S - \lambda I)y = (\alpha - \lambda)y$. For the symmetric matrix $S - \lambda I$ the column space equals the row space, and we know that the nullspace and row space are orthogonal, so we know that the vectors are orthogonal. We still need to handle the case where $\alpha = \lambda$, but we'll leave that for the reader.
Now to talk about the star of the show: symmetric positive definite matrices. These are symmetric matrices whose eigenvalues are all positive. We write $S \succ 0$. There are multiple equivalent criteria to test whether a matrix is PD:
- All the eigenvalues are positive.
- For all vectors $x \neq 0$, the quantity $x^T S x$ is positive.
- $S = A^T A$ with $A$ having independent columns, and vice-versa!
- All leading determinants are positive. These are the determinants of the upper-left $k \times k$ submatrices that result if we delete the last $n - k$ rows and columns. (Sylvester's criterion)
- All pivots of $S$ are positive.
Let's explore some of the reasoning behind the facts above. The energy or quadratic form $x^T S x$ is the central theme that connects all the criteria. The reason it works is that it is easy to verify on eigenvectors (since the eigenvalues are positive) and the eigenvectors form a basis of $\mathbb{R}^n$. The $S = A^T A$ criterion works because of the energy calculation:

$$x^T S x = x^T A^T A x = \|Ax\|^2 > 0.$$
The last inequality is strict because the columns of $A$ are independent, so $Ax \neq 0$ whenever $x \neq 0$. As for the pivot criterion, it connects the $LU$ decomposition with the spectral decomposition into something known as the Cholesky decomposition. Briefly, if we write

$$S = LDL^T$$

by pulling the pivots into a diagonal matrix $D$, then we can split $D = \sqrt{D}\sqrt{D}$ and write

$$S = \left(L\sqrt{D}\right)\left(L\sqrt{D}\right)^T = CC^T.$$
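These criteria are straightforward to test numerically. A sketch in NumPy (the matrix $A$ is an arbitrary example with independent columns): build $S = A^T A$, check the eigenvalue and energy criteria, and recover the $CC^T$ form via `np.linalg.cholesky`, which succeeds exactly when $S$ is positive definite.

```python
import numpy as np

# S = A^T A with independent columns of A: the standard recipe for a PD matrix
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
S = A.T @ A

# Criterion: all eigenvalues positive
assert np.all(np.linalg.eigvalsh(S) > 0)
# Criterion: positive energy x^T S x for a non-zero test vector
x = np.array([1.0, -1.0])
assert x @ S @ x > 0
# Cholesky factor: S = C C^T with C lower triangular
C = np.linalg.cholesky(S)
assert np.allclose(C @ C.T, S)
```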
We should note that PD matrices and the spectral decomposition play a huge role in optimization. For example, we can check that a function is convex by checking that its Hessian matrix is positive semidefinite (positive definite gives strict convexity).
Singular Value Decomposition
The SVD is the crowning jewel of Linear Algebra. The reason is its versatility: it works for every matrix, unlike the other factorizations. We built a beautiful theory on symmetric matrices, and we want to use it broadly. So, how do we create a symmetric matrix from an arbitrary matrix $A$?
The idea is to use $A^T A$ and $AA^T$. Since those matrices are symmetric (and positive semidefinite), we can pick a set of $r$ orthonormal eigenvectors of $A^T A$, where $r$ is the rank of $A$. Let those be $v_1, \dots, v_r$. Also let $\lambda_1, \dots, \lambda_r$ be the corresponding (positive) eigenvalues, and let $\sigma_i = \sqrt{\lambda_i}$.

Now let $u_i = \dfrac{A v_i}{\sigma_i}$. These are orthonormal eigenvectors of $AA^T$ (small calculation) which are chosen carefully to satisfy the "eigen"-equation:

$$A v_i = \sigma_i u_i.$$
In matrix form, we can write:

$$A = U \Sigma V^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T,$$

where $U = (u_1 \cdots u_r)$, $V = (v_1 \cdots v_r)$ and $\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r)$. Importantly, we can do what we always do when factorizing a matrix: break it into rank-1 pieces. If we order the singular values in decreasing order, the first $k$ terms of the summation give us the best possible rank-$k$ approximation of $A$. This is very valuable in Data Science and is known as the Eckart-Young theorem; it is the foundation of PCA (Principal Component Analysis).
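The SVD and the Eckart-Young truncation are both one-liners in NumPy. A sketch on a random matrix (shapes and seed arbitrary): keep the $k$ largest singular values, and the spectral-norm error of the truncation is exactly the first dropped singular value.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(A, U @ np.diag(s) @ Vt)   # A = UΣV^T
assert np.all(s[:-1] >= s[1:])               # singular values come sorted, decreasing

# Best rank-k approximation (Eckart-Young): keep the k largest singular values
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
assert np.linalg.matrix_rank(A_k) == k
# Spectral-norm error equals the first dropped singular value
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
```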
Finally, it is worth noting that the SVD has a beautiful geometric interpretation that I would highly suggest looking up online for yourself. It basically tells us that every linear transformation can be decomposed into two rotations and one stretching! That, I believe, is amazing to see.
This concludes our quick tour across the Linear Algebra universe. There are many more things to explore, but they will be left for when the need arises. PCA, Jordan canonical forms, and dual spaces are some of the ideas to look at next. I hope this has been helpful!