6.694395 secondsCours ENPC — Pratique du calcul scientifique
$$
% % %
%
$$
\[ \definecolor{gris_C}{RGB}{96,96,96} \definecolor{blanc_C}{RGB}{255,255,255} \definecolor{pistache_C}{RGB}{176,204,78} \definecolor{pistache2_C}{RGB}{150,171,91} \definecolor{jaune_C}{RGB}{253,235,125} \definecolor{jaune2_C}{RGB}{247,208,92} \definecolor{orange_C}{RGB}{244,157,84} \definecolor{orange2_C}{RGB}{239,119,87} \definecolor{bleu_C}{RGB}{126,151,206} \definecolor{bleu2_C}{RGB}{90,113,180} \definecolor{vert_C}{RGB}{96,180,103} \definecolor{vert2_C}{RGB}{68,141,96} \definecolor{pistache_light_C}{RGB}{235,241,212} \definecolor{jaune_light_C}{RGB}{254,250,224} \definecolor{orange_light_C}{RGB}{251,230,214} \definecolor{bleu_light_C}{RGB}{222,229,241} \definecolor{vert_light_C}{RGB}{216,236,218} \]
Goal of this chapter
Study numerical methods to find the eigenpairs \((\mathbf{\boldsymbol{v}}_i, \mathbf{\boldsymbol{\lambda}}_i)\) satisfying \[ \mathsf A \mathbf{\boldsymbol{v}} = \lambda \mathbf{\boldsymbol{v}}, \] where \(\mathsf A \in \mathbf C^{n \times n}\) and \(b \in \mathbf C^n.\)
\(~\)
Simple vector iterations, for calculating just one eigenpair;
Subspace iterations: generalization to calculate multiple eigenpairs at the same time;
Projection methods: approximate large eigenvalue problem by a smaller one.
Remark: can we calculate the eigenvalues as roots of the characteristic polynomial?
Cost of calculating the characteristic polynomial scales as \({\color{red}n!}\)
\({\color{green} \rightarrow}\) not a viable approach…
Before we begin:
We present a motivating example;
We discuss sparse matrix formats
Consider a random walk on the graph:
Assumption: jumps to adjacent nodes equi-probable.
The associated adjacency matrix \(\mathsf A \in \{0, 1\}^{n \times n}\) is given by \[ a_{ij} = \begin{cases} 1 & \text{if there is an edge between $i$ and $j$,} \\ 0 & \text{otherwise.} \end{cases} \]
\(~\)
Probability of transition \(i \rightarrow j\) given by \[ p_{ij} = \frac{a_{ij}}{n_e(i)}, \] where \(n_e(i)\) is the number of edges connected to node \(i\).
\({\color{green} \rightarrow}\) note that \(\mathsf A\) is symmetric but \(\mathsf P := (p_{ij})\) is not.
\(~\)
Steady state probability distribution \(\mathbf{\boldsymbol{v}}_1 \in \mathbf R_{\geqslant 0}^n\) satisfies \[ \mathsf P^T\mathbf{\boldsymbol{v}}_1 = \mathbf{\boldsymbol{v}}_1, \qquad \mathbf{\boldsymbol{1}}^T\mathbf{\boldsymbol{v}}_1 = 1. \] (it is possible to show that \(1\) is the dominant eigenvalue)
\({\color{green} \rightarrow}\) eigenvalue problem for matrix \(\mathsf P^T\).
function adj(m)
  x = vcat([collect(0:i) for i in 0:m-1]...)
  y = vcat([fill(m-i, i) for i in 1:m]...)
  row_lengths = 1:m
  row_indices = [0; cumsum(row_lengths)]
  A = zeros(Int, length(x), length(x))
  for i in 1:m-1
      for j in 1:row_lengths[i]
          r1 = row_indices[i]
          r2 = row_indices[i+1]
          A[r2 + j, r2 + j + 1] = 1  # Horizontal
          A[r1 + j, r2 + j]     = 1  # Vertical
          A[r1 + j, r2 + j + 1] = 1  # Diagonal
      end
  end
  x, y, A + A'
endHere ... is the splat operator (?... for more info)
Steady state probability distribution satisfies \[ \mathsf P^T\mathbf{\boldsymbol{v}}_1 = \mathbf{\boldsymbol{v}}_1 \]
Strategy for finding \(\mathbf{\boldsymbol{v}}_1\): evolve probability vector until equilibrium: \[ \mathbf{\boldsymbol{x}}^{(k+1)} = \mathsf P^T\mathbf{\boldsymbol{x}}^{(k)} \qquad \Leftrightarrow \qquad x^{(k+1)}_i = \sum_{j=1}^{n_n} p_{ji} x^{(k)}_j \]
\(x^{(k)}_i\) is the probability of being at node \(i\) at iteration \(k\);
This is an example of the power iteration;
Below we take \(\mathbf{\boldsymbol{x}}^{(0)} = \mathbf{\boldsymbol{e}}_1\).
The power iteration in few lines of code:
\(~\)
Naive approach: use dense data structure for \(\mathsf P^T\)
using GFlops
flops = @count_ops power_iteration(Pᵗ, x, 10)
memory = sizeof(Pᵗ)
println(flops, "\n", "Memory: $memory bytes")Flop Counter: 882000 flop
┌─────┬─────────┐
│     │ Float64 │
├─────┼─────────┤
│ add │  441000 │
│ mul │  441000 │
└─────┴─────────┘
Memory: 352800 bytes\({\color{green} \rightarrow}\) time is wasted on 0s. Try zeros(10_000, 10_000)^2.
Better solution: use sparse data structure
import Base: *, transpose
struct coo_matrix
    rows::Array{Int}       # Row indices of ≠0
    cols::Array{Int}       # Column indices of ≠0
    vals::Array{Float64}   # Values of ≠0 entries
    m::Int                 # Number of rows
    n::Int                 # Number of columns
end
function transpose(A::coo_matrix)
    coo_matrix(A.cols, A.rows, A.vals, A.n, A.m)
end\({\color{green} \rightarrow}\) Exercise: define the * and to_coo functions
The power iteration in few lines of code:
\(~\)
Naive approach: use dense data structure for \(\mathsf P^T\)
using GFlops
flops = @count_ops power_iteration(Pᵗ, x, 10)
memory = sizeof(Pᵗ)
println(flops, "\n", "Memory: $memory bytes")Flop Counter: 882000 flop
┌─────┬─────────┐
│     │ Float64 │
├─────┼─────────┤
│ add │  441000 │
│ mul │  441000 │
└─────┴─────────┘
Memory: 352800 bytes\({\color{green} \rightarrow}\) time is wasted on 0s. Try zeros(10_000, 10_000)^2.
Better solution: use sparse data structure
using GFlops
sparse_Pᵗ = to_coo(Pᵗ)
flops = @count_ops power_iteration(sparse_Pᵗ, x, 10)
memory = sizeof(sparse_Pᵗ)
println(flops, "\n", "Memory: $memory bytes")Flop Counter: 22800 flop
┌─────┬─────────┐
│     │ Float64 │
├─────┼─────────┤
│ add │   11400 │
│ mul │   11400 │
└─────┴─────────┘
Memory: 40 bytesWe take the following matrix for illustration: \[ M = \begin{pmatrix} 5 & . & . \\ . & 6 & 7 \\ 8 & . & 9 \end{pmatrix} \]
Coordinate list (COO), presented on the previous slide;
rows is [1, 2, 2, 3, 3]
cols is [1, 2, 3, 1, 3]
vals is [5, 6, 7, 8, 9]
Compressed sparse rows (CSR):
Similar to COO, but often more memory-efficient: rows is compressed;
Vector rows contains the indices in cols and vals where rows start…
… except for rows[m+1], which is 1 + the number of nonzeros;
rows is [1, 2, 4, 6]
cols is [1, 2, 3, 1, 3]
vals is [5, 6, 7, 8, 9]
Compressed sparse columns (CSC):
Same as CSR with rows \(\leftrightarrow\) columns;
Implemented in julia by SparseArrays.
3×3 SparseMatrixCSC{Float64, Int64} with 0 stored entries:
  ⋅    ⋅    ⋅ 
  ⋅    ⋅    ⋅ 
  ⋅    ⋅    ⋅ \(~\)
\(~\)
\(~\)
See also sprand and spdiagm.
The power iteration is a method for finding \(\color{blue}(\lambda_1, v_1)\): \[ \mathbf{\boldsymbol{x}}_{k+1} = \frac{\mathsf A \mathbf{\boldsymbol{x}}_{k}}{{\lVert {\mathsf A \mathbf{\boldsymbol{x}}_{k}} \rVert}}, \]
Inf.\(~\)
Questions:
Does it converge and how quickly?
What is a good stopping criterion?
Notation: let \(\angle\) denote the acute angle between \(\mathbf{\boldsymbol{x}}\) and \(\mathbf{\boldsymbol{y}}\): \[ \angle(\mathbf{\boldsymbol{x}}, \mathbf{\boldsymbol{y}}) = \arccos\left( \frac{{\lvert {\mathbf{\boldsymbol{x}}^* \mathbf{\boldsymbol{y}}} \rvert}}{\sqrt{\mathbf{\boldsymbol{x}}^* \mathbf{\boldsymbol{x}}} \sqrt{\mathbf{\boldsymbol{y}}^* \mathbf{\boldsymbol{y}}}}\right). \]
Since the eigenvectors of \(\mathsf A\) span \(\mathbf C^n\), any vector \(\mathbf{\boldsymbol{x}}_0\) may be decomposed uniquely as \[ \mathbf{\boldsymbol{x}}_0 = \alpha_1 \mathbf{\boldsymbol{v}}_1 + \dotsb + \alpha_n \mathbf{\boldsymbol{v}}_n. \]
Proposition: convergence of the power iteration started from \(\mathbf{\boldsymbol{x}}_0\)
Suppose that \({\lvert {\lambda_1} \rvert} > {\lvert {\lambda_2} \rvert}\) and \(\alpha_1 \neq 0\). Then \((\mathbf{\boldsymbol{x}}_k)_{k \geqslant 0}\) satisfies \[ \lim_{k \to \infty} \angle(\mathbf{\boldsymbol{x}}_k, \mathbf{\boldsymbol{v}}_1) = 0. \] The sequence \((\mathbf{\boldsymbol{x}}_k)\) is said to converge essentially to \(\mathbf{\boldsymbol{v}}_1\). \(~\)
Proof
By construction \[ \mathbf{\boldsymbol{x}}_k = \frac{\lambda_1^k \alpha_1 \mathbf{\boldsymbol{v}}_1 + \dotsb + \lambda_n^k \alpha_n \mathbf{\boldsymbol{v}}_n}{{\lVert {\lambda_1^k \alpha_1 \mathbf{\boldsymbol{v}}_1 + \dotsb + \lambda_n^k \alpha_n \mathbf{\boldsymbol{v}}_n} \rVert}} = \frac{\lambda_1^k \alpha_1}{|\lambda_1^k \alpha_1|} \frac{\mathbf{\boldsymbol{v}}_1 + \frac{\lambda_2^k \alpha_2}{\lambda_1^k \alpha_1} \mathbf{\boldsymbol{v}}_2 + \dotsb + \frac{\lambda_n^k \alpha_2}{\lambda_1^k \alpha_1} \mathbf{\boldsymbol{v}}_n }{\left\lVert\mathbf{\boldsymbol{v}}_1 + \frac{\lambda_2^k \alpha_2}{\lambda_1^k \alpha_1} \mathbf{\boldsymbol{v}}_2 + \dotsb + \frac{\lambda_n^k \alpha_n}{\lambda_1^k \alpha_1} \mathbf{\boldsymbol{v}}_n\right\rVert}. \] Therefore \[ \mathbf{\boldsymbol{x}}_k \left(\frac{|\lambda_1^k \alpha_1|}{\lambda_1^k \alpha_1}\right) \xrightarrow[k \to \infty]{} \mathbf{\boldsymbol{v}}_1. \] (Notice that the bracketed factor on the left-hand side is a scalar of modulus 1.)
The dominant error term scales as \(\color{red} |\lambda_2/\lambda_1|^k\).
Suppose that \(\mu \notin \sigma(\mathsf A)\) and \(|\lambda_J - \mu| < |\lambda_K - \mu| \leqslant|\lambda_j - \mu|\) for all \(j \neq J\).
Question: can we find \(\lambda_J\), i.e. the eigenvalue that is nearest \(\mu\)?
Idea: apply the power iteration to \(\mathsf B = (\mathsf A - \mu \mathsf I)^{-1}\): \[ \mathbf{\boldsymbol{x}}_{k+1} = \frac{\mathsf B \mathbf{\boldsymbol{x}}_{k}}{{\lVert {\mathsf B \mathbf{\boldsymbol{x}}_{k}} \rVert}}. \]
The dominant eigenvalue of \(\mathsf B\) is given by \((\lambda_J - \mu)^{-1}\).
The associated eigenvector is \(\mathbf{\boldsymbol{v}}_J\).
Use \ operator instead of calculating the inverse:
A = [2 1; 1 2]
x = rand(2)
μ = 0
B = A - μ*I
for i in 1:10
    x = B\x
    x /= √(x'x)
    println("Eigenvector: $x, eigenvalue: $(x'A*x)")
endEigenvector: [-0.073395069620888, 0.9973029448243622], eigenvalue: 1.8536057618629984
Eigenvector: [-0.4840913701752976, 0.8750174542955146], eigenvalue: 1.152823203245567
Eigenvector: [-0.6363924079275455, 0.771365479608843], eigenvalue: 1.0182177300790847
Eigenvector: [-0.6841587989671516, 0.7293330774041614], eigenvalue: 1.0020407154323043
Eigenvector: [-0.6995341971283575, 0.7145991233187908], eigenvalue: 1.0002269520011233
Eigenvector: [-0.7045913752929893, 0.7096132706360092], eigenvalue: 1.0000252194328363
Eigenvector: [-0.7062692985764518, 0.7079432730723041], eigenvalue: 1.0000028021906129
Eigenvector: [-0.706827730223213, 0.7073857220692972], eigenvalue: 1.0000003113549005
Eigenvector: [-0.7070137764235103, 0.7071997737184075], eigenvalue: 1.0000000345949938
Eigenvector: [-0.7070757809576499, 0.7071377800564256], eigenvalue: 1.0000000038438883The dominant error term now scales as \(\color{red} \left|\frac{\lambda_J - \mu} {\lambda_K - \mu}\right|^k\).
Question: can we adapt \(\mu\) during the iteration in order to accelerate convergence?
Error of the inverse iteration scales as \(\color{red} \left|\frac{\lambda_J - \mu} {\lambda_K - \mu}\right|^k\).
\({\color{green} \rightarrow}\) suggests letting
\[\mu_k = \frac{\mathbf{\boldsymbol{x}}_k^* \mathsf A \mathbf{\boldsymbol{x}}_k}{\mathbf{\boldsymbol{x}}_k^* \mathbf{\boldsymbol{x}}_k}.\]
This is the Rayleigh quotient iteration.
If convergence to an eigenvector occurs, the associated eigenvalue converges cubically:
\[ |\lambda^{(k+1)} - \lambda_J| \leqslant C |\lambda^{(k)} - \lambda_J|^3. \]
\({\color{green} \rightarrow}\) Number of significant digits is roughly tripled at each iteration.
A = [2 1; 1 2]
setprecision(500)
x = BigFloat.([1.2; 0.9])
μ = 0
for i in 1:6
    x = (A - μ*I)\x
    x /= √(x'x)
    μ = x'A*x
    println(μ)
end2.689655172413793214338566074331922284628319727818488081860106227371262853189736760002654726539617400119969038165484095334893321873413803868251681795864
2.9876835222760986149331492272467355588185696109799282648595173771965397075426195063225678761434065353075142377089820027363840168362403721691362285304077
2.9999995241744513496497052218657690803207923292634754813048708145726507175594474194389978335452546331521437999768268695077710762632721314300139734193988
2.9999999999999999999730670707803381925912042248826047017208189162200256216460941091027939474791840614585910098581290345505474157692691833334644060848503
2.9999999999999999999999999999999999999999999999999999999999951158299301653110614230820165226784762374074665384291286744975203198864866424226623297764997
3.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000012A good stopping criterion is \[ \begin{equation} \tag{stopping criterion} {\lVert {\mathsf A \widehat {\mathbf{\boldsymbol{v}}} - \widehat \lambda \widehat {\mathbf{\boldsymbol{v}}}} \rVert} \leqslant\varepsilon {\lVert {\widehat {\mathbf{\boldsymbol{v}}}} \rVert} \label{stopping} \end{equation} \]
Proposition: forward error estimate
Suppose \(\mathsf A\) is Hermitian and \(\eqref{stopping}\) is satisfied. Then there exists \(\lambda \in \sigma(\mathsf A)\) such that \(|\lambda - \widehat \lambda| \leqslant\varepsilon\).
Proof
Let \(\mathbf{\boldsymbol{r}} := \mathsf A \widehat {\mathbf{\boldsymbol{v}}} - \widehat \lambda \widehat {\mathbf{\boldsymbol{v}}}\). Then \[ \frac{1}{\min_{\lambda \in \sigma(\mathsf A)} |\lambda - \widehat \lambda|} = {\lVert {(\mathsf A - \widehat \lambda \mathsf I)^{-1}} \rVert} \geqslant\frac{{\lVert {(\mathsf A - \widehat \lambda \mathsf I)^{-1}\mathbf{\boldsymbol{r}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{r}}} \rVert}} \geqslant\frac{1}{\varepsilon}, \] Taking the reciprocal, we deduce that \[ \min_{\lambda \in \sigma(\mathsf A)} |\lambda - \widehat \lambda| \leqslant\varepsilon, \] which concludes the proof.
Proposition: backward error estimate
Suppose that \(\eqref{stopping}\) is satisfied and let \(\mathcal E = \Bigl\{\mathsf E \in \mathbf C^{n \times n}: (\mathsf A + \mathsf E) \widehat {\mathbf{\boldsymbol{v}}} = \widehat \lambda \widehat {\mathbf{\boldsymbol{v}}} \Bigr\}\). Then \[ \min_{\mathsf E \in \mathcal E} {\lVert {\mathsf E} \rVert} \leqslant\varepsilon. \]
Definition of the backward error by numerical analyst Nick Higham:
Backward error is a measure of error associated with an approximate solution to a problem. Whereas the forward error is the distance between the approximate and true solutions, the backward error is how much the data must be perturbed to produce the approximate solution.
Sketch of proof
Show first that any \(\mathsf E \in \mathcal E\) satisfies \(\mathsf E \widehat {\mathbf{\boldsymbol{v}}} = - \mathbf{\boldsymbol{r}}\), where \(\mathbf{\boldsymbol{r}} := \mathsf A \widehat {\mathbf{\boldsymbol{v}}} - \widehat \lambda \widehat {\mathbf{\boldsymbol{v}}}\).
Deduce from the first item that \[ \inf_{\mathsf E \in \mathcal E} {\lVert {\mathsf E} \rVert} \geqslant\frac{{\lVert {\mathbf{\boldsymbol{r}}} \rVert}}{{\lVert {\widehat {\mathbf{\boldsymbol{v}}}} \rVert}}. \]
Find a rank one matrix \(\mathsf E_*\) such that \({\lVert {\mathsf E_*} \rVert} = \frac{{\lVert {\mathbf{\boldsymbol{r}}} \rVert}}{{\lVert {\widehat {\mathbf{\boldsymbol{v}}}} \rVert}}\), and then conclude.
(Recall that any rank 1 matrix is of the form \(\mathsf E_* = \mathbf{\boldsymbol{u}} \mathbf{\boldsymbol{w}}^*\), with norm \({\lVert {\mathbf{\boldsymbol{u}}} \rVert} {\lVert {\mathbf{\boldsymbol{w}}} \rVert}\).)
Suppose that \(\mathsf X_0 \in \mathbf C^{n \times p}\) with \(p < n\) and consider the iteration \[ \mathsf X_{k+1} = {\color{green}\text{normalization}} (\mathsf A \mathsf X_{k}) \]
In practice, this can be achieved by QR decompostion: \[ \mathsf X_{k+1} = \mathsf Q_{k+1}, \qquad \mathsf Q_{k+1} \mathsf R_{k+1} = \mathsf A \mathsf X_k \]
\(\mathsf Q_{k+1} \in \mathbf C^{n\times p}\) is a unitary matrix: \(\mathsf Q_{k+1}^* \mathsf Q_{k+1} = \mathsf I_{p \times p}\);
\(\mathsf R_{k+1} \in \mathbf C^{p\times p}\) is upper triangular.
Some remarks:
The first column undergoes a simple power iteration;
Mathematically, it is equivalent to apply the QR decomposition at the end. Indeed \[ \mathsf X_{k} \underbrace{\mathsf R_{k} \mathsf R_{k-1} \dotsc \mathsf R_1}_{\text{upper tri.}} = \mathsf A \mathsf X_{k-1} \mathsf R_{k-1} \mathsf R_{k-2} \dotsc \mathsf R_{1} = \dots = \mathsf A^k \mathsf X_0. \]
\({\color{green} \rightarrow}\) in particular, it holds that \(\mathop{\mathrm{col}}(\mathsf X_k) = \mathop{\mathrm{col}}(\mathsf A^k \mathsf X_k)\).
Theorem: Convergence of the simultaneous iteration
Assume that
\(\mathsf A \in \mathbf C^{n \times n}\) is Hermitian
\(\mathsf X_0 \in \mathbf C^{n \times p}\) has linearly independent columns
The subspace spanned by the columns of \(\mathsf X_0\) satisfies \[ \mathop{\mathrm{col}}(\mathsf X_0) \cap \mathop{\mathrm{Span}}\{ \mathbf{\boldsymbol{v}}_{p+1}, \dotsc, \mathbf{\boldsymbol{v}}_n \} = \varnothing. \]
The first \(p+1\) eigenvalues are distinct: \[ |\lambda_1| > |\lambda_2| > \dotsb > |\lambda_p| > |\lambda_{p+1}| \geqslant|\lambda_{p+2}| \geqslant\dotsc \geqslant|\lambda_n|, \] Then the columns \(\mathsf X_k\) converge essentially to \(\mathbf{\boldsymbol{v}}_1, \dots, \mathbf{\boldsymbol{v}}_p\).
Ideas of the proof:
Change coordinates so that \(\mathsf A\) is diagonal;
Show convergence of the orthogonal projector: \[ \mathsf X_{k} \mathsf X_k^* \to \mathsf V_1 \mathsf V_1^*, \qquad \mathsf V_1 := \mathsf V\texttt{[:,1:p]}. \]
Show by induction the essential convergence of each column.
Consider the following simultaneous iteration: \[ {\color{magenta} \mathsf X_{k+1} \mathsf R_{k+1} = \mathsf A \mathsf X_{k}}~\text{(QR decomposition)}, \qquad \mathsf X_0 = \mathsf I_{\color{blue} n\times n}. \] Let \(\mathsf\Lambda_k := \mathsf X_k^* \mathsf A \mathsf X_k\).
Now notice that
\(\mathsf\Lambda_{k} = \mathsf X_{k}^* ({\color{magenta}\mathsf A \mathsf X_{k}}) = \mathsf X_{k}^* ({\color{magenta} \mathsf X_{k+1} \mathsf R_{k+1}}) = {\color{green}(\mathsf X_{k}^* \mathsf X_{k+1})} \color{blue} \mathsf R_{k+1}\)
\(\mathsf\Lambda_{k+1} = ({\color{magenta}\mathsf X_{k+1}^* \mathsf A}) \mathsf X_{k+1} = ({\color{magenta} \mathsf R_{k+1} \mathsf X_{k}^*}) \mathsf X_{k+1} = {\color{blue}\mathsf R_{k+1}} {\color{green}(\mathsf X_{k}^* \mathsf X_{k+1})}\)
Therefore, the eigenvalues may be obtained as follows
2×2 Matrix{Float64}:
 3.0          5.73594e-10
 5.73594e-10  1.0This is known as the QR algorithm.
Idea
Find approximate eigenvectors in a subspace \(\mathcal U \subset \mathbf C^n\): \[ \tag{Galerkin condition} \mathsf A \mathbf{\boldsymbol{v}} \approx \lambda \mathbf{\boldsymbol{v}}, \qquad \mathbf{\boldsymbol{v}} \in \mathcal U. \]
Definition: Ritz vectors and Ritz values
A vector \(\widehat{\mathbf{\boldsymbol{v}}} \in \mathcal U\) is a Ritz vector of \(\mathsf A\) relative to \(\mathcal U\) associated with the Ritz value \(\widehat \lambda\) if \[ \mathsf A \widehat{\mathbf{\boldsymbol{v}}} - \widehat \lambda \widehat{\mathbf{\boldsymbol{v}}} \perp \mathcal U. \]
Remark: If \(\mathsf A \mathcal U \subset \mathcal U\), then every Ritz pair is an eigenpair.
Finding Ritz vectors in practice
Let \(\mathsf U \in \mathbf C^{n \times p}\) be a matrix whose columns form an orthonormal basis of \(\mathcal U\).
A pair \((\widehat {\mathbf{\boldsymbol{u}}}, \widehat \lambda) \in \mathbf C^p \times \mathbf C\) is solution of \[ \tag{Smaller eigenvalue problem} \mathsf U^* \mathsf A \mathsf U \widehat{\mathbf{\boldsymbol{u}}} = \widehat \lambda \widehat{\mathbf{\boldsymbol{u}}}, \] if and only if \((\mathsf U \widehat{\mathbf{\boldsymbol{u}}}, \widehat \lambda)\) is a Ritz pair.
General error estimate
Let \(\mathsf A\) be a full rank Hermitian matrix and \(\mathcal U\) a \(p\)-dimensional subspace of \(\mathbf C^n\). Then there are eigenvalues \(\lambda_{i_1}, \dotsc, \lambda_{i_p}\) of \(\mathsf A\) which satisfy \[ \forall j \in \{1, \dotsc, p\}, \qquad {\lvert {\lambda_{i_j} - \widehat \lambda_j} \rvert} \leqslant{\lVert {(\mathsf I - \mathsf P_{\mathcal U}) \mathsf A \mathsf P_{\mathcal U}} \rVert}, \] where \(\mathsf P_{\mathcal U} := \mathsf U \mathsf U^*\) is the orthogonal projector onto \(\mathcal U\).
Proposition: Property of Ritz values in Hermitian setting
In the Hermitian setting, it holds that \(\widehat \lambda_i \leqslant\lambda_i\).
Proof
Let \(\mathsf H := \mathsf U^* \mathsf A \mathsf U\). By the Courant–Fisher theorem, it holds that \[ \widehat \lambda_i = \max_{\mathcal S \subset \mathbf C^p, {\rm dim}(\mathcal S) = i} \left( \min_{\mathbf{\boldsymbol{x}} \in \mathcal S \backslash\{0\}} \frac{\mathbf{\boldsymbol{x}}^* \mathsf H \mathbf{\boldsymbol{x}}}{\mathbf{\boldsymbol{x}}^* \mathbf{\boldsymbol{x}}} \right) \] Letting \(\mathbf{\boldsymbol{y}} = \mathsf U \mathbf{\boldsymbol{x}}\) and then \(\mathcal R = \mathsf U \mathcal S\), we deduce that \[\begin{align*} \widehat \lambda_i &= \max_{\mathcal R \subset \mathcal U, {\rm dim}(\mathcal R) = i} \left( \min_{\mathbf{\boldsymbol{y}} \in \mathcal R \backslash\{0\}} \frac{\mathbf{\boldsymbol{y}}^* \mathsf A \mathbf{\boldsymbol{y}}}{\mathbf{\boldsymbol{y}}^* \mathbf{\boldsymbol{y}}} \right) \leqslant\max_{\mathcal R \subset \mathbf C^n, {\rm dim}(\mathcal R) = i} \left( \min_{\mathbf{\boldsymbol{y}} \in \mathcal R \backslash\{0\}} \frac{\mathbf{\boldsymbol{y}}^* \mathsf A \mathbf{\boldsymbol{y}}}{\mathbf{\boldsymbol{y}}^* \mathbf{\boldsymbol{y}}} \right) = \lambda_i, \end{align*}\] where we used the Courant–Fisher for the matrix \(\mathsf A\) in the last equality.
In particular \[ \widehat \lambda_1 = \max_{\mathbf{\boldsymbol{u}} \in \mathcal U \setminus \{0\}} \frac{\mathbf{\boldsymbol{u}}^* \mathsf A \mathbf{\boldsymbol{u}}}{\mathbf{\boldsymbol{u}}^* \mathbf{\boldsymbol{u}}} \leqslant\max_{\mathbf{\boldsymbol{u}} \in \mathbf C^n \setminus \{0\}} \frac{\mathbf{\boldsymbol{u}}^* \mathsf A \mathbf{\boldsymbol{u}}}{\mathbf{\boldsymbol{u}}^* \mathbf{\boldsymbol{u}}} = \lambda_1. \]
The Arnoldi iteration is a projection method with \(\mathcal U\) a Krylov subspace \[ \mathcal K_p(\mathsf A, \mathbf{\boldsymbol{u}}_0) = \mathop{\mathrm{Span}}\Bigl\{ \mathbf{\boldsymbol{u}}_0, \mathsf A \mathbf{\boldsymbol{u}}_0, \mathsf A^2 \mathbf{\boldsymbol{u}}_0, \dotsc, \mathsf A^{p-1} \mathbf{\boldsymbol{u}}_0 \Bigr\}. \] More precisely, it is a method for
constructing the matrix \(\mathsf U\) by Gram-Schmidt
(i.e. constructing an orthonormal basis of \(\mathcal K_k\))
calculating the reduced matrix \(\mathsf H := \mathsf U^* \mathsf A \mathsf U\).
Remarks:
\(~\)
Computational cost scales as \(p^2\).
\({\color{green} \rightarrow}\) Expensive for large \(p\).
\(~\)
Sometimes useful to restart iteration, e.g. when calculating dominant eigenpair:
Perform \(p\) iterations of Arnoldi;
Calculate the associated Ritz vector \(\widehat {\mathbf{\boldsymbol{v}}}_1\);
Repeat with \(\mathbf{\boldsymbol{u}}_0 = \widehat {\mathbf{\boldsymbol{v}}}_1\).
Eigenvalues of large modulus tend to converge faster;
But this observation still lacks a strong theoretical footing…
n, p = 40, 5
θ = LinRange(0, 6π, n)
A = (@. θ*cos(θ) + im*θ*sin(θ)) |> diagm
u₀ = rand(n)
true_λs = eigvals(A)
xlims = extrema(real.(true_λs)) .+ [-2; 2]
ylims = extrema(imag.(true_λs)) .+ [-2; 2]
anim = @animate for p ∈ 1:n
    H, U = arnoldi(u₀, A, p)
    λs = eigvals(H)
    heatmap(LinRange(xlims..., 100), LinRange(ylims..., 100), (x, y) -> √(x^2 + y^2), c=:viridis, alpha=.5, colorbar=:none)
    scatter!(real.(true_λs), imag.(true_λs), shape=:plus, color=:black, ms=6)
    scatter!(real.(λs), imag.(λs), label="Arnoldi")
    plot!(xlims=xlims, ylims=ylims, legend=:none, aspect_ratio=:equal, size=(900, 900))
    annotate!(-15, -17, "p = $p")
end
gif(anim, "arnoldi.gif", fps=2)The Lanczos iteration is a particular case of Arnoldi’s method when \(\mathsf A\) is Hermitian. \[ \mathcal K_p = \mathop{\mathrm{Span}}\Bigl\{ \mathbf{\boldsymbol{u}}_0, \mathsf A \mathbf{\boldsymbol{u}}_0, \mathsf A^2 \mathbf{\boldsymbol{u}}_0, \dotsc, \mathsf A^{p-1} \mathbf{\boldsymbol{u}}_0 \Bigr\}. \]
Question: which \(\mathbf{\boldsymbol{v}}\) gives a better approximation \(\frac{\mathbf{\boldsymbol{v}}^* \mathsf A \mathbf{\boldsymbol{v}}}{\mathbf{\boldsymbol{v}}^* \mathbf{\boldsymbol{v}}}\) of the dominant eigenvalue of \(\mathsf A\)?
Power iteration: \(\mathbf{\boldsymbol{v}} = \mathsf A^{p-1} \mathbf{\boldsymbol{x}}_0\).
Lanczos iteration: \(\mathbf{\boldsymbol{v}} = \widehat {\mathbf{\boldsymbol{v}}}_1\) the Ritz vector relative to \(\mathcal K_{p}\) associated with the dominant Ritz value.