Cours ENPC — Pratique du calcul scientifique

Motivating example:

A = [1 1; 1 (1-1e-12)]
b = [0; 1e-12]
x = A\b

2-element Vector{Float64}:
  1.0000221222095027
 -1.0000221222095027
Goal of this chapter
Study numerical methods for the linear equation \[ \mathsf A \mathbf{\boldsymbol{x}} = \mathbf{\boldsymbol{b}}, \] where \(\mathsf A \in \mathbf C^{n \times n}\) and \(\mathbf{\boldsymbol{b}} \in \mathbf C^n\).
Two classes of methods:
Direct methods:
LU decomposition (for general invertible matrices);
Cholesky decomposition (for symmetric positive definite matrices)
Iterative methods:
Conjugate gradients
And many more: GMRES, BiCGStab, etc.
Before discussing these methods, we introduce the concept of conditioning.
In Julia, we can calculate the solution with the \ operator, as in the listing above. The exact solution of this system is \((1, -1)^T\).
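As a quick check (a sketch reusing A, b and x from the listing above), the relative error can be computed explicitly:

using LinearAlgebra
x_exact = [1.0, -1.0]                # exact solution of the system above
norm(x - x_exact) / norm(x_exact)    # ≈ 2e-5, about 10^11 times the machine epsilon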
Why is the relative error much larger than the machine epsilon?
Question: Can we estimate \(\frac{{\lVert {\Delta \mathbf{\boldsymbol{x}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{x}}} \rVert}}\) in terms of \(\frac{{\lVert {\Delta \mathsf A} \rVert}}{{\lVert {\mathsf A} \rVert}}\) and \(\frac{{\lVert {\Delta \mathbf{\boldsymbol{b}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{b}}} \rVert}}\)?
Given \(p \in [1, \infty]\), the \(p\)-norm of a vector \(\mathbf{\boldsymbol{x}} \in \mathbf C^n\) is defined as follows: \[ {\lVert {\mathbf{\boldsymbol{x}}} \rVert}_p := \begin{cases} \left( \sum_{i=1}^{n} {\lvert {x_i} \rvert}^p \right)^{\frac{1}{p}} & \text{if $p < \infty$}, \\ \max \Bigl\{ {\lvert {x_1} \rvert}, \dotsc, {\lvert {x_n} \rvert} \Bigr\} & \text{if $p = \infty$}. \end{cases} \]
using Plots, LaTeXStrings   # LaTeXStrings provides the L"..." string macro used below
Plots.plot(aspect_ratio=:equal, xlims=(-1.1, 1.1), ylims=(-1.1, 1.1))
Plots.plot!(framestyle=:origin, grid=true, legend=:outertopright)
Plots.plot!(title="Unit circle in different norms")
Plots.plot!([0, 1, 0, -1, 0], [-1, 0, 1, 0, -1], label=L"$p = 1$ (taxicab norm)")
Plots.plot!([1, 1, -1, -1, 1], [1, -1, -1, 1, 1], label=L"p = \infty")
Plots.plot!(t -> cos(t), t -> sin(t), 0, 2π, label=L"$p = 2$ (Euclidean norm)")
The operator norm on matrices induced by the vector \(p\)-norm is given by \[ {\lVert {\mathsf A} \rVert}_{p} := \sup_{{\lVert {\mathbf{\boldsymbol{x}}} \rVert}_{p} \leqslant 1} {\lVert {\mathsf A \mathbf{\boldsymbol{x}}} \rVert}_{p} \] It follows from the definition that \({\lVert {\mathsf A \mathsf B} \rVert}_p \leqslant{\lVert {\mathsf A} \rVert}_p {\lVert {\mathsf B} \rVert}_p\).
Exercises
Show that
The matrix 2-norm is given by \(\sqrt{\lambda_{\rm max}(\mathsf A^* \mathsf A)}\).
The matrix 1-norm is the maximum absolute column sum:
\[{\lVert {\mathsf A} \rVert}_1 = \max_{1 \leqslant j \leqslant n} \sum_{i=1}^{n} {\lvert {a_{ij}} \rvert}.\]
The matrix \(\infty\)-norm is the maximum absolute row sum:
\[{\lVert {\mathsf A} \rVert}_{\infty} = \max_{1 \leqslant i \leqslant n} \sum_{j=1}^{n} {\lvert {a_{ij}} \rvert}.\]
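These formulas can be checked numerically against the induced norms computed by LinearAlgebra's opnorm; a minimal sketch (the matrix A below is an arbitrary example):

using LinearAlgebra
A = [2.0 1.0 -1.0; -3.0 -1.0 2.0; -2.0 1.0 2.0]
opnorm(A, 2) ≈ sqrt(maximum(eigvals(Symmetric(A' * A))))   # matrix 2-norm
opnorm(A, 1) ≈ maximum(sum(abs.(A), dims=1))               # maximum absolute column sum
opnorm(A, Inf) ≈ maximum(sum(abs.(A), dims=2))             # maximum absolute row sum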
From matrix norms, we define the condition number of a matrix as \[ \kappa_p(\mathsf A) = {\lVert {\mathsf A} \rVert}_p {\lVert {\mathsf A^{-1}} \rVert}_p \]
Properties:
\(\kappa_p(\mathsf I) = 1\)
\(\kappa_p(\mathsf A) \geqslant 1\)
\(\kappa_p(\alpha \mathsf A) = \kappa_p(\mathsf A)\) for all \(\alpha \neq 0\)
In Julia, the condition number can be calculated with cond(A, p) from the LinearAlgebra standard library; by default, p = 2.
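For instance, for the matrix of the motivating example (the printed values are approximate):

using LinearAlgebra
A = [1 1; 1 (1-1e-12)]
cond(A)        # ≈ 4e12: the motivating example is severely ill-conditioned
cond(A, Inf)   # condition number in the ∞-norm, of the same order

Since round-off errors of relative size comparable to the machine epsilon can be amplified by a factor of order \(\kappa(\mathsf A)\), this is consistent with the relative error of order \(10^{-5}\) observed earlier.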
Proposition
Let \(\mathbf{\boldsymbol{x}} + \Delta \mathbf{\boldsymbol{x}}\) denote the solution to \[ \mathsf A (\mathbf{\boldsymbol{x}} + \Delta \mathbf{\boldsymbol{x}}) = \mathbf{\boldsymbol{b}} + \Delta \mathbf{\boldsymbol{b}} \] The following inequality holds: \[ \frac{{\lVert {\Delta \mathbf{\boldsymbol{x}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{x}}} \rVert}} \leqslant\kappa(\mathsf A) \frac{{\lVert {\Delta \mathbf{\boldsymbol{b}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{b}}} \rVert}} \]
Proof
It holds by definition of \(\Delta \mathbf{\boldsymbol{x}}\) that \(\mathsf A \Delta \mathbf{\boldsymbol{x}} = \Delta \mathbf{\boldsymbol{b}}\). Therefore \[ \begin{aligned} {\lVert {\Delta \mathbf{\boldsymbol{x}}} \rVert} &= {\lVert {\mathsf A^{-1} \Delta \mathbf{\boldsymbol{b}}} \rVert} \leqslant{\lVert {\mathsf A^{-1}} \rVert} {\lVert {\Delta \mathbf{\boldsymbol{b}}} \rVert} \\ &= \frac{{\lVert {\mathsf A \mathbf{\boldsymbol{x}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{b}}} \rVert}} {\lVert {\mathsf A^{-1}} \rVert} {\lVert {\Delta \mathbf{\boldsymbol{b}}} \rVert} \leqslant\frac{{\lVert {\mathsf A} \rVert} {\lVert {\mathbf{\boldsymbol{x}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{b}}} \rVert}} {\lVert {\mathsf A^{-1}} \rVert} {\lVert {\Delta \mathbf{\boldsymbol{b}}} \rVert}. \end{aligned} \] Rearranging, we obtain the statement.
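The bound can also be verified numerically on a random system; a small sketch:

using LinearAlgebra
n = 5
A, b = randn(n, n), randn(n)
Δb = 1e-8 * randn(n)                 # small perturbation of the right-hand side
x  = A \ b
Δx = A \ (b + Δb) - x
norm(Δx) / norm(x) ≤ cond(A) * norm(Δb) / norm(b)   # expected: true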
Proposition
Let \(\mathbf{\boldsymbol{x}} + \Delta \mathbf{\boldsymbol{x}}\) denote the solution to \[ (\mathsf A + \Delta \mathsf A) (\mathbf{\boldsymbol{x}} + \Delta \mathbf{\boldsymbol{x}}) = \mathbf{\boldsymbol{b}} \] If \(\mathsf A\) is invertible and \({\lVert {\Delta \mathsf A} \rVert} < \frac{1}{2} {\lVert {\mathsf A^{-1}} \rVert}^{-1}\), then \[ \frac{{\lVert {\Delta \mathbf{\boldsymbol{x}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{x}}} \rVert}} \leqslant 2\kappa(\mathsf A) \frac{{\lVert {\Delta \mathsf A} \rVert}}{{\lVert {\mathsf A} \rVert}} \]
Conclusion: \(\kappa(\mathsf A)\) measures sensitivity to perturbations:
useful to estimate the impact of round-off errors;
influences convergence speed of methods (see later).
When \(\kappa_p(\mathsf A) \gg 1\), the system is called ill-conditioned.
First calculate the \(\mathsf L \mathsf U\) decomposition of \(\mathsf A\), with
\(\mathsf U\) an upper triangular matrix;
\(\mathsf L\) a unit lower triangular matrix.
Then solve \(\mathsf L \mathbf{\boldsymbol{y}} = \mathbf{\boldsymbol{b}}\) using forward substitution, followed by \(\mathsf U \mathbf{\boldsymbol{x}} = \mathbf{\boldsymbol{y}}\) using backward substitution.
Remark: The LU decomposition may not exist
In practice, use decomposition \(\mathsf P \mathsf A = \mathsf L \mathsf U\), with \(\mathsf P\) a permutation matrix.
Guaranteed to exist if \(\mathsf A\) is invertible;
More numerically stable.
\[ \underbrace{ \begin{pmatrix} 2 & 1 & -1 \\ -3 & -1 & 2 \\ -2 & 1 & 2 \end{pmatrix} }_{\mathsf A} \]
\[ \underbrace{ \begin{pmatrix} 1 & 0 & 0 \\ {\color{red}\frac{3}{2}} & 1 & 0 \\ {\color{red}{1}} & 0 & 1 \end{pmatrix} }_{=: \, \mathsf M_1}~ \underbrace{ \begin{pmatrix} 2 & 1 & -1 \\ -3 & -1 & 2 \\ -2 & 1 & 2 \end{pmatrix} }_{\mathsf A} = \begin{pmatrix} 2 & 1 & -1 \\ {\color{red} 0} & \frac{1}{2} & \frac{1}{2} \\ {\color{red} 0} & 2 & 1 \end{pmatrix} \]
\[ \underbrace{ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & {\color{blue} -4} & 1 \end{pmatrix} }_{=: \, \mathsf M_2}~ \underbrace{ \begin{pmatrix} 1 & 0 & 0 \\ {\color{red}\frac{3}{2}} & 1 & 0 \\ {\color{red}{1}} & 0 & 1 \end{pmatrix} }_{=: \, \mathsf M_1}~ \underbrace{ \begin{pmatrix} 2 & 1 & -1 \\ -3 & -1 & 2 \\ -2 & 1 & 2 \end{pmatrix} }_{\mathsf A} = \underbrace{ \begin{pmatrix} 2 & 1 & -1 \\ {\color{red} 0} & \frac{1}{2} & \frac{1}{2} \\ {\color{red} 0} & {\color{blue} 0} & -1 \end{pmatrix} }_{=: \, \mathsf U} \]
Gaussian transformations \(\mathsf M_1\), \(\mathsf M_2\) are simple to invert and multiply. In particular \[ \mathsf A = \mathsf M_1^{-1} \mathsf M_2^{-1} \mathsf U = \underbrace{ \begin{pmatrix} 1 & 0 & 0 \\ {\color{red}-\frac{3}{2}} & 1 & 0 \\ {\color{red}{-1}} & 0 & 1 \end{pmatrix} }_{=: \, \mathsf M_1^{-1}}~ \underbrace{ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & {\color{blue} 4} & 1 \end{pmatrix} }_{=: \, \mathsf M_2^{-1}}~ \mathsf U = \underbrace{ \begin{pmatrix} 1 & 0 & 0 \\ {\color{red}-\frac{3}{2}} & 1 & 0 \\ {\color{red}{-1}} & {\color{blue} 4} & 1 \end{pmatrix} }_{\mathsf M_1^{-1} \mathsf M_2^{-1} =: \, \mathsf L}~ \mathsf U \]
# A is an invertible matrix of size n x n
n = size(A, 1)
L = [r == c ? 1.0 : 0.0 for r in 1:n, c in 1:n]  # starts as the identity matrix
U = float(copy(A))                               # work on a floating-point copy of A
for c in 1:n-1, r in c+1:n
    U[c, c] == 0 && error("Pivotal entry is zero!")
    L[r, c] = U[r, c] / U[c, c]                  # multiplier that eliminates entry (r, c)
    U[r, c:end] -= U[c, c:end] * L[r, c]
end
# L is unit lower triangular and U is upper triangular, with A = L*U
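Once \(\mathsf L\) and \(\mathsf U\) are available, solving \(\mathsf A \mathbf{\boldsymbol{x}} = \mathbf{\boldsymbol{b}}\) reduces to two triangular solves; a minimal sketch (the function names are illustrative, not from a library):

function forward_substitution(L, b)              # solves L*y = b (L lower triangular)
    n = length(b)
    y = zeros(n)
    for i in 1:n
        y[i] = (b[i] - L[i, 1:i-1]' * y[1:i-1]) / L[i, i]
    end
    return y
end

function backward_substitution(U, y)             # solves U*x = y (U upper triangular)
    n = length(y)
    x = zeros(n)
    for i in n:-1:1
        x[i] = (y[i] - U[i, i+1:n]' * x[i+1:n]) / U[i, i]
    end
    return x
end

# Then A*x = b is solved by
# x = backward_substitution(U, forward_substitution(L, b))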
\(~\)
Computational cost: \(\frac{2}{3} n^3 + \mathcal O(n^2)\) floating point operations (flops);
In comparison, the computational cost of forward/backward substitution scales as \(\mathcal O(n^2)\);
The LU decomposition can be reused for different right-hand sides;
If \(\mathsf A\) is a banded matrix, so are \(\mathsf L\) and \(\mathsf U\).
\(\mathsf L\) and \(\mathsf U\) are not necessarily sparse when \(\mathsf A\) is sparse.
When \(\mathsf A\) is Hermitian positive definite, it is more efficient to calculate the Cholesky factorization:
\[ \mathsf A = \mathsf C \mathsf C^* \]
with \(\mathsf C\) lower triangular with positive diagonal elements.
\(~\)
Exercise:
Write an algorithm for calculating \(\mathsf C\) by identification of \(\mathsf A\) and \(\mathsf C \mathsf C^*\).
What is the computational complexity of your implementation?
Bonus points if your implementation is efficient on banded matrices.
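An implementation can be checked against the built-in factorization from Julia's LinearAlgebra standard library; for example:

using LinearAlgebra
A = [4.0 2.0; 2.0 3.0]       # a symmetric positive definite example
C = cholesky(A).L            # lower triangular factor with positive diagonal
C * C' ≈ A                   # true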
Motivation: General-purpose direct methods are
exact up to roundoff errors
but computationally expensive: \(\mathcal O(n^3)\) flops for full matrices…
Additionally, they require the storage of \(\mathsf L\) and \(\mathsf U\).
\(~\)
In contrast, iterative methods
are usually approximate (but there are exceptions);
usually have a cost per iteration scaling as \(\mathcal O(n^2)\) at most;
\(\rightarrow\) often computationally more economical
can be stopped at any point when the residual \(\mathsf A \mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{b}}\) is sufficiently small;
\(~\)
In this section, we focus on basic iterative methods based on a splitting
\[ \mathsf A = \mathsf M - \mathsf N \]
Richardson’s method
\[\begin{align*} \mathbf{\boldsymbol{x}}^{(k + 1)} = \mathbf{\boldsymbol{x}}^{(k)} + \omega (\mathbf{\boldsymbol{b}} - \mathsf A \mathbf{\boldsymbol{x}}^{(k)}) \end{align*}\]
Proposition
Assume \(\omega \neq 0\). If the sequence \((\mathbf{\boldsymbol{x}}^{(k)})\) generated by the iteration above converges, then its limit is the solution of \[\begin{align*} \mathsf A \mathbf{\boldsymbol{x}} = \mathbf{\boldsymbol{b}}. \end{align*}\]
Questions:
Does this method converge and how quickly?
What is a good stopping criterion?
Definition
The spectral radius of \(\mathsf A \in \mathbf C^{n \times n}\) is given by \[ \rho(\mathsf A) := \max_{\lambda \in \mathop{\mathrm{spectrum}}A} {\lvert {\lambda} \rvert} \]
Remarks:
\(\rho(\mathsf A)\) is not a norm: for example, \(\rho(\mathsf A) = 0\) for the nonzero matrix \(\mathsf A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\);
\(\rho(\mathsf A) \leqslant{\lVert {\mathsf A} \rVert}\) for any induced matrix norm.
Theorem: Gelfand’s formula
For any \(\mathsf A \in \mathbf C^{n \times n}\) and any matrix norm \({\lVert {\cdot} \rVert}\), \[ \lim_{k \to \infty} {\lVert {\mathsf A^k} \rVert}^{1/k} = \rho(\mathsf A). \]
In particular \({\lVert {\mathsf A^k} \rVert} \to 0\) as \(k \to \infty\) iff \(\rho(\mathsf A) < 1\).
The smaller \(\rho(\mathsf A)\), the faster the convergence.
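A quick numerical illustration of Gelfand's formula (the matrix B below is an arbitrary example with \(\rho(\mathsf B) = 0.5\)):

using LinearAlgebra
B = [0.5 1.0; 0.0 0.4]                          # eigenvalues 0.5 and 0.4
opnorm(B)                                       # ≈ 1.18, larger than ρ(B) = 0.5
[opnorm(B^k)^(1/k) for k in (10, 100, 1000)]    # decreases towards ρ(B) = 0.5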
\[\begin{align*} \mathbf{\boldsymbol{x}}^{(k + 1)} = \mathbf{\boldsymbol{x}}^{(k)} + \omega (\mathbf{\boldsymbol{b}} - \mathsf A \mathbf{\boldsymbol{x}}^{(k)}) \end{align*}\]
Proposition
The Richardson iteration converges to the exact solution \(\mathbf{\boldsymbol{x}}_* := \mathsf A^{-1} \mathbf{\boldsymbol{b}}\) for every choice of \(\mathbf{\boldsymbol{x}}^{(0)}\) if and only if \[\begin{align*} \rho(\mathsf I - \omega \mathsf A) = \max_{\lambda \in \mathop{\mathrm{spectrum}}A} {\lvert {1 - \omega \lambda} \rvert} < 1. \end{align*}\]
Proof
We notice that \[ \mathbf{\boldsymbol{x}}^{(k+1)} - \mathbf{\boldsymbol{x}}_* = \mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}_* + \omega (\mathsf A \mathbf{\boldsymbol{x}}_* - \mathsf A \mathbf{\boldsymbol{x}}^{(k)}) = (\mathsf I - \omega \mathsf A) \left( \mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}_* \right). \] Therefore \[ \mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}_* = (\mathsf I - \omega \mathsf A)^k \left( \mathbf{\boldsymbol{x}}^{(0)} - \mathbf{\boldsymbol{x}}_* \right) \] The “only if” part of the statement is clear, and the “if” part follows from the equivalence \[ {\lVert {\mathsf B^k} \rVert} \xrightarrow[k \to \infty]{} 0 \quad \Leftrightarrow \quad \rho(B) < 1. \]
Corollary: It is necessary for convergence that all the eigenvalues have nonzero real parts of the same sign.
By Gelfand’s formula, for all \(\varepsilon > 0\) there is \(K \in \mathbf N\) such that \[ \forall k \geqslant K, \qquad {\lVert {\mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}_*} \rVert} \leqslant\bigl( \rho(\mathsf I - \omega \mathsf A) + \varepsilon \bigr)^k {\lVert {\mathbf{\boldsymbol{x}}^{(0)} - \mathbf{\boldsymbol{x}}_*} \rVert}. \]
We consider the particular case where \(\mathsf A \in \mathbf R^{n \times n}\) is symmetric positive definite.
Remark: link to optimization
When \(\mathsf A\) is symmetric positive definite, the solution to \(\mathsf A \mathbf{\boldsymbol{x}} = \mathbf{\boldsymbol{b}}\) is the unique minimizer of \[ f(\mathbf{\boldsymbol{x}}) = \frac{1}{2} \mathbf{\boldsymbol{x}}^T\mathsf A \mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{b}}^T\mathbf{\boldsymbol{x}}. \] In this case, Richardson’s iteration may be rewritten as \[ \mathbf{\boldsymbol{x}}^{(k+1)} = \mathbf{\boldsymbol{x}}^{(k)} - \omega \nabla f(\mathbf{\boldsymbol{x}}^{(k)}). \]
\(~\)
Exercise: find \(\omega\) that minimizes \(\rho(\mathsf I - \omega \mathsf A)\)
Simple calculations (the optimal \(\omega\) balances \({\lvert 1 - \omega \lambda_{\min}(\mathsf A) \rvert}\) and \({\lvert 1 - \omega \lambda_{\max}(\mathsf A) \rvert}\)) lead to \[ \omega_* = \frac{2}{\lambda_{\min}(\mathsf A) + \lambda_{\max}(\mathsf A)}, \qquad \rho(\mathsf I - \omega_* \mathsf A) = \frac{\kappa_2(\mathsf A) - 1}{\kappa_2(\mathsf A) + 1} \]
Consider the linear system \[ \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 7 \\ 8 \end{pmatrix} \]
The exact solution is \((2, 3)^T\);
The eigenvalues of \(\mathsf A\) are 1 and 3;
The condition number \(\kappa_2(\mathsf A)\) is 3;
The optimal \(\omega\) is \(\frac{1}{2}\).
Contour plot of \[f(\mathbf{\boldsymbol{x}}) = \frac{1}{2} \mathbf{\boldsymbol{x}}^T\mathsf A \mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{b}}^T\mathbf{\boldsymbol{x}}.\]
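With these values, Richardson's iteration converges quickly to the exact solution; a minimal sketch (the function name is illustrative):

# Richardson's iteration x^(k+1) = x^(k) + ω (b - A x^(k))
function richardson(A, b; ω=0.5, niter=50)
    x = zero(b)
    for k in 1:niter
        x += ω * (b - A * x)
    end
    return x
end

richardson([2.0 1.0; 1.0 2.0], [7.0, 8.0]; ω=1/2)   # ≈ [2.0, 3.0], the exact solution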
Consider the splitting \[ \mathsf A = {\color{green}\mathsf M} - {\color{blue} \mathsf N} \] where \(\mathsf M\) is invertible and easy to invert. Then \[ \mathsf A \mathbf{\boldsymbol{x}}_* = \mathbf{\boldsymbol{b}} \quad \Leftrightarrow \quad {\color{green}\mathsf M} \mathbf{\boldsymbol{x}}_* = {\color{blue} \mathsf N} \mathbf{\boldsymbol{x}}_* + \mathbf{\boldsymbol{b}} \] which suggests the iterative method \[ {\color{green}\mathsf M} \mathbf{\boldsymbol{x}}^{(k+1)} = {\color{blue} \mathsf N} \mathbf{\boldsymbol{x}}^{(k)} + \mathbf{\boldsymbol{b}} \]
Standard basic iterative methods (here \(\mathsf A = \mathsf L + \mathsf D + \mathsf U\), with \(\mathsf D\) the diagonal part of \(\mathsf A\), \(\mathsf L\) its strictly lower triangular part and \(\mathsf U\) its strictly upper triangular part):
Richardson’s iteration: \[ \mathsf A = \underbrace{{\color{green}\frac{1}{\omega} \mathsf I }}_{\mathsf M} - \underbrace{\left({\color{blue} \frac{1}{\omega} \mathsf I - \mathsf A}\right)}_{\mathsf N} \]
Jacobi’s iteration: \(\mathsf A = {\color{green}\mathsf D} - ({\color{blue}-\mathsf L - \mathsf U})\).
Gauss-Seidel iteration: \(\mathsf A = ({\color{green}\mathsf D + \mathsf L}) - ({\color{blue}- \mathsf U})\).
Relaxation method: \(\mathsf A = \left({\color{green}\frac{\mathsf D}{\omega} + \mathsf L }\right) - \left({\color{blue}\frac{1 - \omega}{\omega} \mathsf D - \mathsf U} \right)\). \(~\)(Coincides with Gauss-Seidel when \(\omega = 1\))
Proposition: convergence of the splitting method
The iteration \[ {\color{green}\mathsf M} \mathbf{\boldsymbol{x}}^{(k+1)} = {\color{blue} \mathsf N} \mathbf{\boldsymbol{x}}^{(k)} + \mathbf{\boldsymbol{b}} \] converges for any initial \(\mathbf{\boldsymbol{x}}^{(0)}\) if and only if \(\rho({\color{green}\mathsf M}^{-1} {\color{blue} \mathsf N}) < 1\).
In addition, for any \(\varepsilon > 0\) there exists \(K > 0\) such that \[ \forall k \geqslant K, \qquad {\lVert {\mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}_*} \rVert} \leqslant\bigl(\rho({\color{green}\mathsf M}^{-1} {\color{blue} \mathsf N}) + \varepsilon\bigr)^k {\lVert {\mathbf{\boldsymbol{x}}^{(0)} - \mathbf{\boldsymbol{x}}_*} \rVert}. \]
Proof
Subtracting the equations \[ \begin{aligned} {\color{green}\mathsf M} \mathbf{\boldsymbol{x}}^{(k+1)} &= {\color{blue} \mathsf N} \mathbf{\boldsymbol{x}}^{(k)} + \mathbf{\boldsymbol{b}} \\ {\color{green}\mathsf M} \mathbf{\boldsymbol{x}}_* &= {\color{blue} \mathsf N} \mathbf{\boldsymbol{x}}_* + \mathbf{\boldsymbol{b}} \end{aligned} \] and rearranging gives \[ \mathbf{\boldsymbol{x}}^{(k+1)} - \mathbf{\boldsymbol{x}}_* = {\color{green}\mathsf M}^{-1} {\color{blue} \mathsf N} (\mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}_*) = \dots = ({\color{green}\mathsf M}^{-1} {\color{blue} \mathsf N})^{k+1} (\mathbf{\boldsymbol{x}}^{(0)} - \mathbf{\boldsymbol{x}}_*). \] The “only if” part of the first item is simple. The other claims follow from Gelfand’s formula: \[ \forall \mathsf B \in \mathbf C^{n \times n}, \qquad \lim_{k \to \infty} {\lVert {\mathsf B^k} \rVert}^{1/k} = \rho(\mathsf B). \]
Settings where \(\rho(\mathsf M^{-1} \mathsf N) < 1\) are identified on a case-by-case basis:
Richardson’s iteration, sufficient condition: \(\mathsf A\) symmetric positive definite;
Jacobi’s iteration, sufficient condition: \(\mathsf A\) strictly row or column diagonally dominant:
\[ \lvert a_{ii} \rvert > \sum_{j \neq i} \lvert a_{ij} \rvert \quad \forall i \qquad \text{ or } \qquad \lvert a_{jj} \rvert > \sum_{i \neq j} \lvert a_{ij} \rvert \quad \forall j. \]
Gauss-Seidel iteration, sufficient condition: \(\mathsf A\) strictly row or column diagonally dominant;
Relaxation method, sufficient condition: \(\mathsf A\) symmetric positive definite and \(\omega \in (0, 2)\);
Relaxation method, necessary condition: \(\omega \in (0, 2)\);
…
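For instance, the matrix of the earlier 2×2 example is strictly diagonally dominant, so Jacobi's iteration converges; a minimal sketch (the function name is illustrative):

using LinearAlgebra

# Jacobi's iteration: M = D (the diagonal part of A), N = M - A = -(L + U)
function jacobi(A, b; niter=100)
    M = Diagonal(A)
    N = M - A
    x = zero(b)
    for k in 1:niter
        x = M \ (N * x + b)          # M x^(k+1) = N x^(k) + b
    end
    return x
end

jacobi([2.0 1.0; 1.0 2.0], [7.0, 8.0])   # ≈ [2.0, 3.0]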
Suppose again that \(\mathsf A\) is symmetric positive definite and recall Richardson’s iteration:
\[\begin{align*} \mathbf{\boldsymbol{x}}^{(k + 1)} = \mathbf{\boldsymbol{x}}^{(k)} - \omega \nabla f(\mathbf{\boldsymbol{x}}^{(k)}), \qquad f(\mathbf{\boldsymbol{x}}) := \frac{1}{2} \mathbf{\boldsymbol{x}}^T\mathsf A \mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{b}}^T\mathbf{\boldsymbol{x}}. \end{align*}\]
Proposition
For any \(\mathbf{\boldsymbol{d}} \in \mathbf R^n\), it holds that \[ \mathop{\mathrm{arg\,min}}_{\omega \in \mathbf R} f \left(\mathbf{\boldsymbol{x}} - \omega \mathbf{\boldsymbol{d}} \right) = \frac{\mathbf{\boldsymbol{d}}^T\mathbf{\boldsymbol{r}}}{\mathbf{\boldsymbol{d}}^T\mathsf A \mathbf{\boldsymbol{d}}}, \qquad \mathbf{\boldsymbol{r}} = \mathsf A \mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{b}}. \]
We can improve upon Richardson’s iteration by letting \[ \mathbf{\boldsymbol{x}}^{(k + 1)} = \mathbf{\boldsymbol{x}}^{(k)} - \omega_{\color{red}k} \nabla f(\mathbf{\boldsymbol{x}}^{(k)}), \qquad \omega_{\color{red}k} := \frac{\mathbf{\boldsymbol{d}}_k^T\mathbf{\boldsymbol{d}}_k}{\mathbf{\boldsymbol{d}}_k^T\mathsf A \mathbf{\boldsymbol{d}}_k}, \qquad \mathbf{\boldsymbol{d}}_k := \mathsf A \mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{b}}. \]
This is the steepest descent method with optimal step.
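A minimal sketch of this method, under the assumption that \(\mathsf A\) is symmetric positive definite (the function name is illustrative):

using LinearAlgebra

# Steepest descent with optimal step: x^(k+1) = x^(k) - ω_k d_k, where d_k = A x^(k) - b
function steepest_descent(A, b; niter=100)
    x = zero(b)
    for k in 1:niter
        d = A * x - b                      # gradient of f at x^(k)
        norm(d) < 1e-12 && break           # stop once the residual is tiny
        ω = dot(d, d) / dot(d, A * d)      # optimal step along -d
        x -= ω * d
    end
    return x
end

steepest_descent([2.0 1.0; 1.0 2.0], [7.0, 8.0])   # ≈ [2.0, 3.0]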
First note that \[ f(\mathbf{\boldsymbol{x}}) = \frac{1}{2} \mathbf{\boldsymbol{x}}^T\mathsf A \mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{b}}^T\mathbf{\boldsymbol{x}} = \frac{1}{2} {\lVert {\mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{x}}_*} \rVert}_{\mathsf A}^2 + \text{constant}, \qquad {\lVert {\mathbf{\boldsymbol{y}}} \rVert}_{\mathsf A} := \sqrt{{\langle {\mathbf{\boldsymbol{y}}, \mathbf{\boldsymbol{y}}} \rangle}_{\mathsf A}}, \qquad {\langle {\mathbf{\boldsymbol{x}}, \mathbf{\boldsymbol{y}}} \rangle}_{\mathsf A} := \mathbf{\boldsymbol{x}}^T\mathsf A \mathbf{\boldsymbol{y}}. \] Given \(n\) conjugate directions \(\{\mathbf{\boldsymbol{d}}_0, \dotsc, \mathbf{\boldsymbol{d}}_{n-1}\}\) such that \({\langle {\mathbf{\boldsymbol{d}}_i, \mathbf{\boldsymbol{d}}_j} \rangle}_{\mathsf A} = \delta_{ij}\), we have \[ \mathbf{\boldsymbol{x}}_* - \mathbf{\boldsymbol{x}}^{(0)} = \sum_{i=0}^{n-1} \mathbf{\boldsymbol{d}}_i \mathbf{\boldsymbol{d}}_i^T\mathsf A (\mathbf{\boldsymbol{x}}_* - \mathbf{\boldsymbol{x}}^{(0)}) = \sum_{i=0}^{n-1} \mathbf{\boldsymbol{d}}_i \mathbf{\boldsymbol{d}}_i^T({\color{green}\mathbf{\boldsymbol{b}}} - \mathsf A \mathbf{\boldsymbol{x}}^{(0)}) \]
\(\color{green}\rightarrow\) the \(\mathsf A\)-projections of \(\mathbf{\boldsymbol{x}}_* - \mathbf{\boldsymbol{x}}^{(0)}\) can be calculated even though \(\mathbf{\boldsymbol{x}}_*\) is unknown!
Minimization property: by construction,
\(\mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}^{(0)}\) is the \(\mathsf A\)-orthogonal projection of \(\mathbf{\boldsymbol{x}}_* - \mathbf{\boldsymbol{x}}^{(0)}\) onto \(\mathop{\mathrm{Span}}\{\mathbf{\boldsymbol{d}}_0, \dotsc, \mathbf{\boldsymbol{d}}_{k-1}\}\).
Therefore, \(\mathbf{\boldsymbol{x}}^{(k)}\) minimizes \(f\) over the affine subspace \(\mathbf{\boldsymbol{x}}^{(0)} + \mathop{\mathrm{Span}}\{\mathbf{\boldsymbol{d}}_0, \dotsc, \mathbf{\boldsymbol{d}}_{k-1}\}\).
Consider again the linear system \[ \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 7 \\ 8 \end{pmatrix} \]
Conjugate directions (non-normalized) calculated by Gram-Schmidt: \[ \mathbf{\boldsymbol{d}}_0 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad \mathbf{\boldsymbol{d}}_1 = \begin{pmatrix} -1 \\ 2 \end{pmatrix}. \]
Question: how to generate conjugate directions?
\({\color{green} \rightarrow}\) the conjugate gradient method generates conjugate directions on the fly.
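A minimal sketch of the standard (textbook) conjugate gradient algorithm, starting from \(\mathbf{\boldsymbol{x}}^{(0)} = \mathbf{\boldsymbol{0}}\) (the function name is illustrative):

using LinearAlgebra

function conjugate_gradient(A, b; tol=1e-10)
    x = zero(b)
    r = b - A * x                            # initial residual
    d = copy(r)                              # first search direction
    for k in 1:length(b)                     # at most n iterations in exact arithmetic
        Ad = A * d
        α = dot(r, r) / dot(d, Ad)           # optimal step along d
        x += α * d
        r_new = r - α * Ad
        norm(r_new) < tol && break
        β = dot(r_new, r_new) / dot(r, r)    # makes the new direction A-conjugate to d
        d = r_new + β * d
        r = r_new
    end
    return x
end

conjugate_gradient([2.0 1.0; 1.0 2.0], [7.0, 8.0])   # ≈ [2.0, 3.0], in at most 2 iterations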
Theorem
Suppose that \(\mathsf A\) is symmetric positive definite. Then the iterates of the conjugate gradient method satisfy \[ \forall k \geqslant 0, \qquad {\lVert {\mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}_*} \rVert}_{\mathsf A} \leqslant 2 \left( \frac{{\color{red}\sqrt{\kappa_2(\mathsf A)}} - 1}{{\color{red}\sqrt{\kappa_2(\mathsf A)}} + 1} \right)^{k} {\lVert {\mathbf{\boldsymbol{x}}^{(0)} - \mathbf{\boldsymbol{x}}_*} \rVert}_{\mathsf A}. \]
Many iterative methods used at present are based on Krylov subspaces: \[ \mathcal K_k(\mathsf A, \mathbf{\boldsymbol{b}}) = \mathop{\mathrm{Span}}\Bigl\{ \mathbf{\boldsymbol{b}}, \mathsf A \mathbf{\boldsymbol{b}}, \mathsf A^2 \mathbf{\boldsymbol{b}}, \dotsc, \mathsf A^{k-1} \mathbf{\boldsymbol{b}} \Bigr\}. \]
Full orthogonalization method (FOM): find \(\mathbf{\boldsymbol{x}}^{(k)}\) in \(\mathcal K_k\) such that the following Galerkin condition is satisfied:
\[{\langle {\mathsf A \mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{b}}, \mathcal K_k} \rangle} = 0.\]
The FOM iterates coincide with those of the conjugate gradient method when \(\mathsf A\) is symmetric positive definite.