Solving linear systems numerically

Cours ENPC — Pratique du calcul scientifique

Objective

Goal of this chapter

Study numerical methods for the linear equation \[ \mathsf A \mathbf{\boldsymbol{x}} = \mathbf{\boldsymbol{b}}, \] where \(\mathsf A \in \mathbf C^{n \times n}\) and \(\mathbf{\boldsymbol{b}} \in \mathbf C^n.\)

Two classes of methods:

  • Direct methods:

    • LU decomposition (for general invertible matrices);

    • Cholesky decomposition (for symmetric positive definite matrices)

  • Iterative methods:

    • Basic iterative methods based on a splitting;
    • Conjugate gradients

    • And many more: GMRES, BiCGStab, etc.

Before discussing these methods, we introduce the concept of conditioning.

Conditioning of linear systems

  • Assume that we wish to calculate numerically the solution \((1, -1)^T\) of \[ \mathsf A \mathbf{\boldsymbol{x}} := \begin{pmatrix} 1 & 1 \\ 1 & 1 - 10^{-12} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 10^{-12} \end{pmatrix} =: \mathbf{\boldsymbol{b}}. \]
  • In Julia, we can calculate the solution with the \ operator

    A = [1 1; 1 (1-1e-12)]
    b = [0; 1e-12]
    x = A\b
    2-element Vector{Float64}:
      1.0000221222095027
     -1.0000221222095027

    Why is the relative error much larger than the machine epsilon?

  • Julia solves not \(\mathsf A \mathbf{\boldsymbol{x}} = \mathbf{\boldsymbol{b}}\) but \[(\mathsf A + \Delta \mathsf A) (\mathbf{\boldsymbol{x}} + \Delta \mathbf{\boldsymbol{x}}) = (\mathbf{\boldsymbol{b}} + \Delta \mathbf{\boldsymbol{b}})\] where \(\Delta \mathsf A\) and \(\Delta \mathbf{\boldsymbol{b}}\) are roundoff errors, and \(\Delta \mathbf{\boldsymbol{x}}\) is the resulting perturbation of the solution.

Question: Can we estimate \(\frac{{\lVert {\Delta \mathbf{\boldsymbol{x}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{x}}} \rVert}}\) in terms of \(\frac{{\lVert {\Delta \mathsf A} \rVert}}{{\lVert {\mathsf A} \rVert}}\) and \(\frac{{\lVert {\Delta \mathbf{\boldsymbol{b}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{b}}} \rVert}}\)?

Reminders: vector and matrix norms (1/3)

Given \(p \in [1, \infty]\), the \(p\)-norm of a vector \(\mathbf{\boldsymbol{x}} \in \mathbf C^n\) is defined as follows: \[ {\lVert {\mathbf{\boldsymbol{x}}} \rVert}_p := \begin{cases} \left( \sum_{i=1}^{n} {\lvert {x_i} \rvert}^p \right)^{\frac{1}{p}} & \text{if $p < \infty$}, \\ \max \Bigl\{ {\lvert {x_1} \rvert}, \dotsc, {\lvert {x_n} \rvert} \Bigr\} & \text{if $p = \infty$}. \end{cases} \]

Code
import Plots
using LaTeXStrings  # provides the L"..." string macro used below
Plots.plot(aspect_ratio=:equal, xlims=(-1.1,1.1), ylims=(-1.1, 1.1))
Plots.plot!(framestyle=:origin, grid=true, legend=:outertopright)
Plots.plot!(title="Unit circle in different norms")
Plots.plot!([0, 1, 0, -1, 0], [-1, 0, 1, 0, -1], label=L"$p = 1$ (taxicab norm)")
Plots.plot!([1, 1, -1, -1, 1], [1, -1, -1, 1, 1], label=L"p = \infty")
Plots.plot!(t -> cos(t), t -> sin(t), 0, 2π, label=L"$p = 2$ (Euclidean norm)")
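
The vector \(p\)-norms plotted above are available in Julia through the norm function of the LinearAlgebra standard library:

using LinearAlgebra
x = [3.0, -4.0]
norm(x, 1)    # 7.0 (taxicab norm)
norm(x)       # 5.0 (Euclidean norm; p = 2 is the default)
norm(x, Inf)  # 4.0 (maximum norm)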

Reminders: vector and matrix norms (2/3)

The operator norm on matrices induced by the vector \(p\)-norm is given by \[ {\lVert {\mathsf A} \rVert}_{p} := \sup_{{\lVert {\mathbf{\boldsymbol{x}}} \rVert}_{p} \leqslant 1} {\lVert {\mathsf A \mathbf{\boldsymbol{x}}} \rVert}_{p} \] It follows from the definition that \({\lVert {\mathsf A \mathsf B} \rVert}_p \leqslant{\lVert {\mathsf A} \rVert}_p {\lVert {\mathsf B} \rVert}_p\).

Exercises

Show that

  • The matrix 2-norm is given by \(\sqrt{\lambda_{\rm max}(\mathsf A^* \mathsf A)}\).

  • The matrix 1-norm is the maximum absolute column sum:

    \[{\lVert {\mathsf A} \rVert}_1 = \max_{1 \leqslant j \leqslant n} \sum_{i=1}^{n} {\lvert {a_{ij}} \rvert}.\]

  • The matrix \(\infty\)-norm is the maximum absolute row sum:

    \[{\lVert {\mathsf A} \rVert}_{\infty} = \max_{1 \leqslant i \leqslant n} \sum_{j=1}^{n} {\lvert {a_{ij}} \rvert}.\]
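
These identities can be checked numerically: in Julia, the induced operator norms are given by opnorm from LinearAlgebra (norm applied to a matrix returns the Frobenius norm instead). A small sanity check on a random matrix:

using LinearAlgebra
A = randn(4, 4)
opnorm(A, 1)   ≈ maximum(sum(abs, A; dims=1))                # max absolute column sum
opnorm(A, Inf) ≈ maximum(sum(abs, A; dims=2))                # max absolute row sum
opnorm(A, 2)   ≈ sqrt(maximum(eigvals(Symmetric(A' * A))))   # largest singular value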

Reminders: vector and matrix norms (3/3)

From matrix norms, we define the condition number of a matrix as \[ \kappa_p(\mathsf A) = {\lVert {\mathsf A} \rVert}_p {\lVert {\mathsf A^{-1}} \rVert}_p \]

Properties:

  • \(\kappa_p(\mathsf I) = 1\)

  • \(\kappa_p(\mathsf A) \geqslant 1\)

  • \(\kappa_p(\alpha \mathsf A) = \kappa_p(\mathsf A)\) for every \(\alpha \neq 0\)

In Julia, the condition number can be calculated with cond(A, p), where the norm parameter p defaults to 2.
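
For the matrix of the introductory example, the condition number is of the order of \(10^{12}\), consistent with the large relative error observed in the computed solution:

using LinearAlgebra
A = [1 1; 1 (1-1e-12)]
cond(A)       # ≈ 4e12 in the 2-norm
cond(A, Inf)  # condition number in the ∞-norm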

Conditioning of linear systems

Proposition

Let \(\mathbf{\boldsymbol{x}} + \Delta \mathbf{\boldsymbol{x}}\) denote the solution to \[ \mathsf A (\mathbf{\boldsymbol{x}} + \Delta \mathbf{\boldsymbol{x}}) = \mathbf{\boldsymbol{b}} + \Delta \mathbf{\boldsymbol{b}} \] The following inequality holds: \[ \frac{{\lVert {\Delta \mathbf{\boldsymbol{x}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{x}}} \rVert}} \leqslant\kappa(\mathsf A) \frac{{\lVert {\Delta \mathbf{\boldsymbol{b}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{b}}} \rVert}} \]

Proof

It holds by definition of \(\Delta \mathbf{\boldsymbol{x}}\) that \(\mathsf A \Delta \mathbf{\boldsymbol{x}} = \Delta \mathbf{\boldsymbol{b}}\). Therefore \[ \begin{aligned} {\lVert {\Delta \mathbf{\boldsymbol{x}}} \rVert} &= {\lVert {\mathsf A^{-1} \Delta \mathbf{\boldsymbol{b}}} \rVert} \leqslant{\lVert {\mathsf A^{-1}} \rVert} {\lVert {\Delta \mathbf{\boldsymbol{b}}} \rVert} \\ &= \frac{{\lVert {\mathsf A \mathbf{\boldsymbol{x}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{b}}} \rVert}} {\lVert {\mathsf A^{-1}} \rVert} {\lVert {\Delta \mathbf{\boldsymbol{b}}} \rVert} \leqslant\frac{{\lVert {\mathsf A} \rVert} {\lVert {\mathbf{\boldsymbol{x}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{b}}} \rVert}} {\lVert {\mathsf A^{-1}} \rVert} {\lVert {\Delta \mathbf{\boldsymbol{b}}} \rVert}. \end{aligned} \] Rearranging, we obtain the statement.

Proposition

Let \(\mathbf{\boldsymbol{x}} + \Delta \mathbf{\boldsymbol{x}}\) denote the solution to \[ (\mathsf A + \Delta \mathsf A) (\mathbf{\boldsymbol{x}} + \Delta \mathbf{\boldsymbol{x}}) = \mathbf{\boldsymbol{b}} \] If \(\mathsf A\) is invertible and \({\lVert {\Delta \mathsf A} \rVert} < \frac{1}{2} {\lVert {\mathsf A^{-1}} \rVert}^{-1}\), then \[ \frac{{\lVert {\Delta \mathbf{\boldsymbol{x}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{x}}} \rVert}} \leqslant 2\kappa(\mathsf A) \frac{{\lVert {\Delta \mathsf A} \rVert}}{{\lVert {\mathsf A} \rVert}} \]

Conclusion: \(\kappa(\mathsf A)\) measures sensitivity to perturbations:

  • useful to estimate the impact of round-off errors;

  • influences the convergence speed of iterative methods (see later);

  • when \(\kappa_p(\mathsf A) \gg 1\), the system is said to be ill-conditioned.

Direct methods

  • First calculate the \(\mathsf L \mathsf U\) decomposition of \(\mathsf A\) with

    • \(\mathsf U\) an upper triangular matrix;

    • \(\mathsf L\) a unit lower triangular matrix.

  • Then solve \(\mathsf L \mathbf{\boldsymbol{y}} = \mathbf{\boldsymbol{b}}\) using forward substitution.

    y = copy(b)
    # no division by L[i, i] is needed because L has a unit diagonal
    for i in 2:n
        for j in 1:i-1
            y[i] -= L[i, j] * y[j]
        end
    end
  • Finally, solve \(\mathsf U \mathbf{\boldsymbol{x}} = \mathbf{\boldsymbol{y}}\) using backward substitution.
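
    A minimal sketch of this backward pass, under the same assumptions (\(\mathsf U\) upper triangular with nonzero diagonal entries, \(\mathbf{\boldsymbol{y}}\) computed above):

    x = copy(y)
    for i in n:-1:1
        for j in i+1:n
            x[i] -= U[i, j] * x[j]
        end
        x[i] /= U[i, i]  # divide by the pivot: U does not have a unit diagonal
    end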

Remark: The LU decomposition may not exist

In practice, one uses the decomposition \(\mathsf P \mathsf A = \mathsf L \mathsf U\), with \(\mathsf P\) a permutation matrix.

  • Guaranteed to exist if \(\mathsf A\) is invertible;

  • More numerically stable.

LU decomposition by Gaussian elimination

\[ \underbrace{ \begin{pmatrix} 2 & 1 & -1 \\ -3 & -1 & 2 \\ -2 & 1 & 2 \end{pmatrix} }_{\mathsf A} \]

\[ \underbrace{ \begin{pmatrix} 1 & 0 & 0 \\ {\color{red}\frac{3}{2}} & 1 & 0 \\ {\color{red}{1}} & 0 & 1 \end{pmatrix} }_{=: \, \mathsf M_1}~ \underbrace{ \begin{pmatrix} 2 & 1 & -1 \\ -3 & -1 & 2 \\ -2 & 1 & 2 \end{pmatrix} }_{\mathsf A} = \begin{pmatrix} 2 & 1 & -1 \\ {\color{red} 0} & \frac{1}{2} & \frac{1}{2} \\ {\color{red} 0} & 2 & 1 \end{pmatrix} \]

\[ \underbrace{ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & {\color{blue} -4} & 1 \end{pmatrix} }_{=: \, \mathsf M_2}~ \underbrace{ \begin{pmatrix} 1 & 0 & 0 \\ {\color{red}\frac{3}{2}} & 1 & 0 \\ {\color{red}{1}} & 0 & 1 \end{pmatrix} }_{=: \, \mathsf M_1}~ \underbrace{ \begin{pmatrix} 2 & 1 & -1 \\ -3 & -1 & 2 \\ -2 & 1 & 2 \end{pmatrix} }_{\mathsf A} = \underbrace{ \begin{pmatrix} 2 & 1 & -1 \\ {\color{red} 0} & \frac{1}{2} & \frac{1}{2} \\ {\color{red} 0} & {\color{blue} 0} & -1 \end{pmatrix} }_{=: \, \mathsf U} \]

Gaussian transformations \(\mathsf M_1\), \(\mathsf M_2\) are simple to invert and multiply. In particular \[ \mathsf A = \mathsf M_1^{-1} \mathsf M_2^{-1} \mathsf U = \underbrace{ \begin{pmatrix} 1 & 0 & 0 \\ {\color{red}-\frac{3}{2}} & 1 & 0 \\ {\color{red}{-1}} & 0 & 1 \end{pmatrix} }_{=: \, \mathsf M_1^{-1}}~ \underbrace{ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & {\color{blue} 4} & 1 \end{pmatrix} }_{=: \, \mathsf M_2^{-1}}~ \mathsf U = \underbrace{ \begin{pmatrix} 1 & 0 & 0 \\ {\color{red}-\frac{3}{2}} & 1 & 0 \\ {\color{red}{-1}} & {\color{blue} 4} & 1 \end{pmatrix} }_{\mathsf M_1^{-1} \mathsf M_2^{-1} =: \, \mathsf L}~ \mathsf U \]
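
The factors computed by hand above can be checked in Julia; exact rational arithmetic avoids any roundoff in the verification, and the last line compares with the built-in lu using the NoPivot option (available in recent Julia versions):

using LinearAlgebra
A = [2 1 -1; -3 -1 2; -2 1 2]
L = [1 0 0; -3//2 1 0; -1 4 1]      # unit lower triangular
U = [2 1 -1; 0 1//2 1//2; 0 0 -1]   # upper triangular
L * U == A                          # true
lu(A, NoPivot()).L ≈ L              # the built-in factorization without pivoting agrees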

LU decomposition: computational cost

# A is an invertible matrix of size n x n
L = [r == c ? 1.0 : 0.0 for r in 1:n, c in 1:n]
U = copy(A)
for c in 1:n-1, r in c+1:n
   U[c, c] == 0 && error("Pivotal entry is zero!")
   L[r, c] = U[r, c] / U[c, c]
   U[r, c:end] -= U[c, c:end] * L[r, c]
end
# L is unit lower triangular and U is upper triangular

\(~\)

  • Computational cost: \(\frac{2}{3} n^3 + \mathcal O(n^2)\) floating point operations (flops);

  • In comparison, the computational cost of forward/backward substitution scales as \(\mathcal O(n^2)\);

  • The LU decomposition can be reused for different right-hand sides;

  • If \(\mathsf A\) is a banded matrix, so are \(\mathsf L\) and \(\mathsf U\).

  • However, \(\mathsf L\) and \(\mathsf U\) are not necessarily sparse when \(\mathsf A\) is sparse (fill-in can occur).

Direct method for symmetric positive definite \(\mathsf A\)

In this case, it is more efficient to calculate the Cholesky factorization:

\[ \mathsf A = \mathsf C \mathsf C^* \]

with \(\mathsf C\) lower triangular with positive diagonal elements.
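
In Julia, this factorization is provided by the cholesky function from LinearAlgebra; a quick check on a small symmetric positive definite matrix:

using LinearAlgebra
A = [4.0 2.0; 2.0 3.0]    # symmetric positive definite
C = cholesky(A).L         # lower triangular factor with positive diagonal
C * C' ≈ A                # true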

\(~\)

Exercise:

  • Write an algorithm for calculating \(\mathsf C\) by identification of \(\mathsf A\) and \(\mathsf C \mathsf C^*\).

  • What is the computational complexity of your implementation?

  • Bonus points if your implementation is efficient on banded matrices.

Iterative methods

Motivation: General-purpose direct methods are

  • exact up to roundoff errors

  • but computationally expensive: \(\mathcal O(n^3)\) flops for full matrices…

  • Additionally, they require the storage of \(\mathsf L\) and \(\mathsf U\).

\(~\)

In contrast, iterative methods

  • are usually approximate (but there are exceptions);

  • usually have a cost per iteration scaling as \(\mathcal O(n^2)\) at most;

    \(\rightarrow\) often computationally more economical

  • can be stopped at any point when the residual \(\mathsf A \mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{b}}\) is sufficiently small;

\(~\)

In this section, we focus on basic iterative methods based on a splitting

\[ \mathsf A = \mathsf M - \mathsf N \]

A first iterative method

Richardson’s method

\[\begin{align*} \mathbf{\boldsymbol{x}}^{(k + 1)} = \mathbf{\boldsymbol{x}}^{(k)} + \omega (\mathbf{\boldsymbol{b}} - \mathsf A \mathbf{\boldsymbol{x}}^{(k)}) \end{align*}\]

Proposition

Assume \(\omega \neq 0\). If the sequence \((\mathbf{\boldsymbol{x}}^{(k)})\) generated by the iteration above converges, then its limit solves \[\begin{align*} \mathsf A \mathbf{\boldsymbol{x}} = \mathbf{\boldsymbol{b}}. \end{align*}\]

Questions:

  • Does this method converge and how quickly?

  • What is a good stopping criterion?
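
Before answering these questions, here is a minimal sketch of the method in Julia; the step ω, the tolerance and the iteration cap are illustrative choices, and the relative norm of the residual is used as stopping criterion:

using LinearAlgebra

function richardson(A, b; ω = 0.5, tol = 1e-10, maxiter = 10_000)
    x = zero(b)
    for k in 1:maxiter
        r = b - A * x                           # residual
        norm(r) ≤ tol * norm(b) && return x, k  # stop on small relative residual
        x += ω * r
    end
    return x, maxiter
end

x, niter = richardson([2.0 1.0; 1.0 2.0], [7.0; 8.0])  # converges to (2, 3)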

Spectral radius and Gelfand’s lemma

Definition

The spectral radius of \(\mathsf A \in \mathbf C^{n \times n}\) is given by \[ \rho(\mathsf A) := \max_{\lambda \in \mathop{\mathrm{spectrum}}\mathsf A} {\lvert {\lambda} \rvert}. \]

Remarks:

  • \(\rho(\mathsf A)\) is not a norm;

  • \(\rho(\mathsf A) \leqslant {\lVert {\mathsf A} \rVert}\) for any induced matrix norm.

Theorem: Gelfand’s formula

For any \(\mathsf A \in \mathbf C^{n \times n}\) and any matrix norm \({\lVert {\cdot} \rVert}\), \[ \lim_{k \to \infty} {\lVert {\mathsf A^k} \rVert}^{1/k} = \rho(\mathsf A). \]

  • In particular \({\lVert {\mathsf A^k} \rVert} \to 0\) as \(k \to \infty\) iff \(\rho(\mathsf A) < 1\).

  • The smaller \(\rho(\mathsf A)\), the faster the convergence.

Analysis of Richardson’s method

\[\begin{align*} \mathbf{\boldsymbol{x}}^{(k + 1)} = \mathbf{\boldsymbol{x}}^{(k)} + \omega (\mathbf{\boldsymbol{b}} - \mathsf A \mathbf{\boldsymbol{x}}^{(k)}) \end{align*}\]

Proposition

The Richardson iteration converges to the exact solution \(\mathbf{\boldsymbol{x}}_*\) for every choice of \(\mathbf{\boldsymbol{x}}^{(0)}\) if and only if \[\begin{align*} \rho(\mathsf I - \omega \mathsf A) = \max_{\lambda \in \mathop{\mathrm{spectrum}}\mathsf A} {\lvert {1 - \omega \lambda} \rvert} < 1. \end{align*}\]

Proof

We notice that \[ \mathbf{\boldsymbol{x}}^{(k+1)} - \mathbf{\boldsymbol{x}}_* = \mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}_* + \omega (\mathsf A \mathbf{\boldsymbol{x}}_* - \mathsf A \mathbf{\boldsymbol{x}}^{(k)}) = (\mathsf I - \omega \mathsf A) \left( \mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}_* \right). \] Therefore \[ \mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}_* = (\mathsf I - \omega \mathsf A)^k \left( \mathbf{\boldsymbol{x}}^{(0)} - \mathbf{\boldsymbol{x}}_* \right) \] The “only if” part of the statement is clear, and the “if” part follows from the equivalence \[ {\lVert {\mathsf B^k} \rVert} \xrightarrow[k \to \infty]{} 0 \quad \Leftrightarrow \quad \rho(B) < 1. \]

Corollary: It is necessary for convergence that all the eigenvalues have nonzero real parts of the same sign.

Analysis of Richardson’s method

By Gelfand’s formula, for all \(\varepsilon > 0\) there is \(K \in \mathbf N\) such that \[ \forall k \geqslant K, \qquad {\lVert {\mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}_*} \rVert} \leqslant\bigl( \rho(\mathsf I - \omega \mathsf A) + \varepsilon \bigr)^k {\lVert {\mathbf{\boldsymbol{x}}^{(0)} - \mathbf{\boldsymbol{x}}_*} \rVert}. \]

We consider the particular case where \(\mathsf A \in \mathbf R^{n \times n}\) is symmetric positive definite.

Remark: link to optimization

When \(\mathsf A\) is symmetric positive definite, the solution to \(\mathsf A \mathbf{\boldsymbol{x}} = \mathbf{\boldsymbol{b}}\) is the unique minimizer of \[ f(\mathbf{\boldsymbol{x}}) = \frac{1}{2} \mathbf{\boldsymbol{x}}^T\mathsf A \mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{b}}^T\mathbf{\boldsymbol{x}}. \] In this case, since \(\nabla f(\mathbf{\boldsymbol{x}}) = \mathsf A \mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{b}}\), Richardson’s iteration may be rewritten as the gradient descent step \[ \mathbf{\boldsymbol{x}}^{(k+1)} = \mathbf{\boldsymbol{x}}^{(k)} - \omega \nabla f(\mathbf{\boldsymbol{x}}^{(k)}). \]

\(~\)

Exercise: find \(\omega\) that minimizes \(\rho(\mathsf I - \omega \mathsf A)\)

Since the eigenvalues of \(\mathsf I - \omega \mathsf A\) are \(1 - \omega \lambda\), we have \(\rho(\mathsf I - \omega \mathsf A) = \max \bigl\{ {\lvert {1 - \omega \lambda_{\min}} \rvert}, {\lvert {1 - \omega \lambda_{\max}} \rvert} \bigr\}\), which is minimized when the two terms are equal. This leads to \[ \omega_* = \frac{2}{\lambda_{\min}(\mathsf A) + \lambda_{\max}(\mathsf A)}, \qquad \rho(\mathsf I - \omega_* \mathsf A) = \frac{\lambda_{\max}(\mathsf A) - \lambda_{\min}(\mathsf A)}{\lambda_{\max}(\mathsf A) + \lambda_{\min}(\mathsf A)} = \frac{\kappa_2(\mathsf A) - 1}{\kappa_2(\mathsf A) + 1}. \]

Illustration of Richardson’s method

Consider the linear system \[ \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 7 \\ 8 \end{pmatrix} \]

  • The exact solution is \((2, 3)^T\);

  • The eigenvalues of \(\mathsf A\) are 1 and 3;

  • The condition number \(\kappa_2(\mathsf A)\) is 3;

  • The optimal \(\omega\) is \(\frac{1}{2}\).

Contour plot of \[f(\mathbf{\boldsymbol{x}}) = \frac{1}{2} \mathbf{\boldsymbol{x}}^T\mathsf A \mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{b}}^T\mathbf{\boldsymbol{x}}.\]

Illustration of Richardson’s method: \(\omega = .2\)

Illustration of Richardson’s method: \(\omega = .6\)

Illustration of Richardson’s method: optimal \(\omega = .5\)

More general basic iterative methods

Consider the splitting \[ \mathsf A = {\color{green}\mathsf M} - {\color{blue} \mathsf N} \] where \(\mathsf M\) is invertible and easy to invert. Then \[ \mathsf A \mathbf{\boldsymbol{x}}_* = \mathbf{\boldsymbol{b}} \quad \Leftrightarrow \quad {\color{green}\mathsf M} \mathbf{\boldsymbol{x}}_* = {\color{blue} \mathsf N} \mathbf{\boldsymbol{x}}_* + \mathbf{\boldsymbol{b}} \] which suggests the iterative method \[ {\color{green}\mathsf M} \mathbf{\boldsymbol{x}}^{(k+1)} = {\color{blue} \mathsf N} \mathbf{\boldsymbol{x}}^{(k)} + \mathbf{\boldsymbol{b}} \]

Standard basic iterative methods (here \(\mathsf A = \mathsf L + \mathsf D + \mathsf U\), with \(\mathsf D\) the diagonal part of \(\mathsf A\) and \(\mathsf L\), \(\mathsf U\) its strictly lower and strictly upper triangular parts)

  • Richardson’s iteration corresponds to the particular splitting

\[ \mathsf A = \underbrace{{\color{green}\frac{1}{\omega} \mathsf I }}_{\mathsf M} - \underbrace{\left({\color{blue} \frac{1}{\omega} \mathsf I - \mathsf A}\right)}_{\mathsf N} \]

  • Jacobi’s iteration: \(\mathsf A = {\color{green}\mathsf D} - ({\color{blue}-\mathsf L - \mathsf U})\) (a code sketch is given below).

  • Gauss-Seidel’s iteration: \(\mathsf A = ({\color{green}\mathsf D + \mathsf L}) - ({\color{blue}- \mathsf U})\)

  • Relaxation method: \(\mathsf A = \left({\color{green}\frac{\mathsf D}{\omega} + \mathsf L }\right) - \left({\color{blue}\frac{1 - \omega}{\omega} \mathsf D - \mathsf U} \right)\). \(~\)(Coincides with Gauss-Seidel when \(\omega = 1\))
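
As an illustration of the framework, a minimal sketch of the splitting iteration with the Jacobi choice \({\color{green}\mathsf M} = \mathsf D\); the tolerance and iteration cap are illustrative choices:

using LinearAlgebra

function jacobi(A, b; tol = 1e-10, maxiter = 10_000)
    D = Diagonal(A)                             # M = D, N = D - A
    x = zero(b)
    for k in 1:maxiter
        r = b - A * x
        norm(r) ≤ tol * norm(b) && return x, k
        x += D \ r                              # M x⁺ = N x + b  ⇔  x⁺ = x + D⁻¹ r
    end
    return x, maxiter
end

x, niter = jacobi([2.0 1.0; 1.0 2.0], [7.0; 8.0])  # converges: A is strictly diagonally dominant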

Convergence of general basic iterative method

Proposition: convergence of the splitting method

  • The iteration \[ {\color{green}\mathsf M} \mathbf{\boldsymbol{x}}^{(k+1)} = {\color{blue} \mathsf N} \mathbf{\boldsymbol{x}}^{(k)} + \mathbf{\boldsymbol{b}} \] converges for any initial \(\mathbf{\boldsymbol{x}}^{(0)}\) if and only if \(\rho({\color{green}\mathsf M}^{-1} {\color{blue} \mathsf N}) < 1\).

  • In addition, for any \(\varepsilon > 0\) there exists \(K > 0\) such that \[ \forall k \geqslant K, \qquad {\lVert {\mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}_*} \rVert} \leqslant\bigl(\rho({\color{green}\mathsf M}^{-1} {\color{blue} \mathsf N}) + \varepsilon\bigr)^k {\lVert {\mathbf{\boldsymbol{x}}^{(0)} - \mathbf{\boldsymbol{x}}_*} \rVert}. \]

Proof

Subtracting the equations \[ \begin{aligned} {\color{green}\mathsf M} \mathbf{\boldsymbol{x}}^{(k+1)} &= {\color{blue} \mathsf N} \mathbf{\boldsymbol{x}}^{(k)} + \mathbf{\boldsymbol{b}} \\ {\color{green}\mathsf M} \mathbf{\boldsymbol{x}}_* &= {\color{blue} \mathsf N} \mathbf{\boldsymbol{x}}_* + \mathbf{\boldsymbol{b}} \end{aligned} \] and rearranging gives \[ \mathbf{\boldsymbol{x}}^{(k+1)} - \mathbf{\boldsymbol{x}}_* = {\color{green}\mathsf M}^{-1} {\color{blue} \mathsf N} (\mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}_*) = \dots = ({\color{green}\mathsf M}^{-1} {\color{blue} \mathsf N})^{k+1} (\mathbf{\boldsymbol{x}}^{(0)} - \mathbf{\boldsymbol{x}}_*). \] The “only if” part of the first item is simple. The other claims follow from Gelfand’s formula: \[ \forall \mathsf B \in \mathbf C^{n \times n}, \qquad \lim_{k \to \infty} {\lVert {\mathsf B^k} \rVert}^{1/k} = \rho(\mathsf B). \]

Concrete convergence guarantees

Settings where \(\rho(\mathsf M^{-1} \mathsf N) < 1\) holds are identified on a case-by-case basis:

  • Richardson’s iteration, sufficient condition: \(\mathsf A\) symmetric positive definite and \(\omega \in \bigl(0, 2/\lambda_{\max}(\mathsf A)\bigr)\);

  • Jacobi’s iteration, sufficient condition: \(\mathsf A\) strictly row or column diagonally dominant:

\[ \lvert a_{ii} \rvert > \sum_{j \neq i} \lvert a_{ij} \rvert \quad \forall i \qquad \text{ or } \qquad \lvert a_{jj} \rvert > \sum_{i \neq j} \lvert a_{ij} \rvert \quad \forall j. \]

  • Gauss-Seidel’s iteration, sufficient condition: \(\mathsf A\) strictly row or column diagonally dominant;

  • Relaxation method, sufficient condition: \(\mathsf A\) symmetric positive definite and \(\omega \in (0, 2)\);

  • Relaxation method, necessary condition: \(\omega \in (0, 2)\).

Improving upon Richardson’s iteration

Suppose again that \(\mathsf A\) is symmetric positive definite and recall Richardson’s iteration:

\[\begin{align*} \mathbf{\boldsymbol{x}}^{(k + 1)} = \mathbf{\boldsymbol{x}}^{(k)} - \omega \nabla f(\mathbf{\boldsymbol{x}}^{(k)}), \qquad f(\mathbf{\boldsymbol{x}}) := \frac{1}{2} \mathbf{\boldsymbol{x}}^T\mathsf A \mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{b}}^T\mathbf{\boldsymbol{x}}. \end{align*}\]

Proposition

For any nonzero \(\mathbf{\boldsymbol{d}} \in \mathbf R^n\), it holds that \[ \mathop{\mathrm{arg\,min}}_{\omega \in \mathbf R} f \left(\mathbf{\boldsymbol{x}} - \omega \mathbf{\boldsymbol{d}} \right) = \frac{\mathbf{\boldsymbol{d}}^T\mathbf{\boldsymbol{r}}}{\mathbf{\boldsymbol{d}}^T\mathsf A \mathbf{\boldsymbol{d}}}, \qquad \mathbf{\boldsymbol{r}} = \mathsf A \mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{b}}. \]

We can improve upon Richardson’s iteration by letting \[ \mathbf{\boldsymbol{x}}^{(k + 1)} = \mathbf{\boldsymbol{x}}^{(k)} - \omega_{\color{red}k} \nabla f(\mathbf{\boldsymbol{x}}^{(k)}), \qquad \omega_{\color{red}k} := \frac{\mathbf{\boldsymbol{d}}_k^T\mathbf{\boldsymbol{d}}_k}{\mathbf{\boldsymbol{d}}_k^T\mathsf A \mathbf{\boldsymbol{d}}_k}, \qquad \mathbf{\boldsymbol{d}}_k := \mathsf A \mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{b}}. \]

This is the steepest descent method with optimal step.
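
A minimal sketch of steepest descent with the optimal step, in the same style as the Richardson code above (tolerance and iteration cap are again illustrative):

using LinearAlgebra

function steepest_descent(A, b; tol = 1e-10, maxiter = 10_000)
    x = zero(b)
    for k in 1:maxiter
        d = A * x - b                           # gradient of f at the current iterate
        norm(d) ≤ tol * norm(b) && return x, k
        ω = dot(d, d) / dot(d, A * d)           # optimal step from the proposition
        x -= ω * d
    end
    return x, maxiter
end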

Illustration of steepest descent

Further improvement: the conjugate directions method

First note that \[ f(\mathbf{\boldsymbol{x}}) = \frac{1}{2} \mathbf{\boldsymbol{x}}^T\mathsf A \mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{b}}^T\mathbf{\boldsymbol{x}} = \frac{1}{2} {\lVert {\mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{x}}_*} \rVert}_{\mathsf A}^2 + \text{constant}, \qquad {\lVert {\mathbf{\boldsymbol{y}}} \rVert}_{\mathsf A} := \sqrt{{\langle {\mathbf{\boldsymbol{y}}, \mathbf{\boldsymbol{y}}} \rangle}_{\mathsf A}}, \qquad {\langle {\mathbf{\boldsymbol{x}}, \mathbf{\boldsymbol{y}}} \rangle}_{\mathsf A} := \mathbf{\boldsymbol{x}}^T\mathsf A \mathbf{\boldsymbol{y}}. \] Given \(n\) conjugate directions \(\{\mathbf{\boldsymbol{d}}_0, \dotsc, \mathbf{\boldsymbol{d}}_{n-1}\}\) such that \({\langle {\mathbf{\boldsymbol{d}}_i, \mathbf{\boldsymbol{d}}_j} \rangle}_{\mathsf A} = \delta_{ij}\), we have \[ \mathbf{\boldsymbol{x}}_* - \mathbf{\boldsymbol{x}}^{(0)} = \sum_{i=0}^{n-1} \mathbf{\boldsymbol{d}}_i \mathbf{\boldsymbol{d}}_i^T\mathsf A (\mathbf{\boldsymbol{x}}_* - \mathbf{\boldsymbol{x}}^{(0)}) = \sum_{i=0}^{n-1} \mathbf{\boldsymbol{d}}_i \mathbf{\boldsymbol{d}}_i^T({\color{green}\mathbf{\boldsymbol{b}}} - \mathsf A \mathbf{\boldsymbol{x}}^{(0)}) \]

\(\color{green}\rightarrow\) the \(\mathsf A\)-projections of \(\mathbf{\boldsymbol{x}}_* - \mathbf{\boldsymbol{x}}^{(0)}\) can be calculated even though \(\mathbf{\boldsymbol{x}}_*\) is unknown!

# Assume `ds` is a list of n conjugate directions
x = zeros(n)  # initial guess x^(0) = 0
for k in 1:n
    d = ds[k]
    r = A*x - b
    ω = d'r / (d'A*d)  # Denominator = 1 if `d` is normalized
    x -= ω * d
end

Minimization property: by construction,

  • \(\mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}^{(0)}\) is the \(\mathsf A\)-orthogonal projection of \(\mathbf{\boldsymbol{x}}_* - \mathbf{\boldsymbol{x}}^{(0)}\) onto \(\mathop{\mathrm{Span}}\{\mathbf{\boldsymbol{d}}_0, \dotsc, \mathbf{\boldsymbol{d}}_{k-1}\}\).

  • Therefore, \(\mathbf{\boldsymbol{x}}^{(k)}\) minimizes \(f\) over the affine subspace \(\mathbf{\boldsymbol{x}}^{(0)} + \mathop{\mathrm{Span}}\{\mathbf{\boldsymbol{d}}_0, \dotsc, \mathbf{\boldsymbol{d}}_{k-1}\}\).

Conjugate directions: example

  • Consider again the linear system \[ \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 7 \\ 8 \end{pmatrix} \]

  • Non-normalized conjugate directions calculated by Gram-Schmidt: \[ \mathbf{\boldsymbol{d}}_0 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad \mathbf{\boldsymbol{d}}_1 = \begin{pmatrix} -1 \\ 2 \end{pmatrix}. \]

using LinearAlgebra
n = 2
A = [2.0 1.0; 1.0 2.0]
b = [7.0; 8.0]
x = zeros(n)
ds = [[1.0; 0.0], [-1.0; 2.0]]
for i in 1:n
    d = ds[i]
    r = A*x - b
    ω = d'r / (d'A*d)  # Denominator useful if `d` is not normalized
    x -= ω*d
    println("Residual: $(norm(A*x - b))")
end
Residual: 4.5
Residual: 0.0

Conjugate directions: illustration

  • In general, convergence in at most \(n\) iterations;
  • Question: how can we generate conjugate directions?

    \({\color{green} \rightarrow}\) the conjugate gradient method generates conjugate directions on the fly.

Conjugate gradient method

  • Idea: generate the conjugate direction \(\mathbf{\boldsymbol{d}}_k\) by applying Gram-Schmidt to \(\mathbf{\boldsymbol{r}}^{(0)}, \mathbf{\boldsymbol{r}}^{(1)}, \dotsc, \mathbf{\boldsymbol{r}}^{(k)}\).
  • The cost of calculating a new direction \(\mathbf{\boldsymbol{d}}_k\) seemingly scales as \(\mathcal O(k)\), but it is in fact \(\color{green} \mathcal O(1)\): by orthogonality, all but the last Gram-Schmidt coefficient vanish.
  • Convergence: exact in at most \(n\) iterations (in exact arithmetic), and otherwise faster than Richardson’s iteration in the following sense:

Theorem

Suppose that \(\mathsf A\) is symmetric positive definite. Then \[ \forall k \geqslant 0, \qquad {\lVert {\mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}_*} \rVert}_{\mathsf A} \leqslant 2 \left( \frac{{\color{red}\sqrt{\kappa_2(\mathsf A)}} - 1}{{\color{red}\sqrt{\kappa_2(\mathsf A)}} + 1} \right)^{k} {\lVert {\mathbf{\boldsymbol{x}}^{(0)} - \mathbf{\boldsymbol{x}}_*} \rVert}_{\mathsf A}. \]
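
For reference, a minimal textbook sketch of the conjugate gradient method; variable names, tolerance and stopping rule are illustrative choices:

using LinearAlgebra

function conjugate_gradient(A, b; tol = 1e-10, maxiter = length(b))
    x = zero(b)
    r = b - A * x                          # residual, equal to -∇f(x)
    d = copy(r)                            # first search direction
    for k in 1:maxiter
        Ad = A * d
        α = dot(r, r) / dot(d, Ad)
        x += α * d
        r_new = r - α * Ad
        norm(r_new) ≤ tol * norm(b) && return x, k
        β = dot(r_new, r_new) / dot(r, r)  # only one Gram-Schmidt coefficient is needed
        d = r_new + β * d                  # new A-conjugate direction
        r = r_new
    end
    return x, maxiter
end

x, niter = conjugate_gradient([2.0 1.0; 1.0 2.0], [7.0; 8.0])  # converges in at most n = 2 iterations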

Overview of other Krylov subspace methods

Many iterative methods used at present are based on Krylov subspaces: \[ \mathcal K_k(\mathsf A, \mathbf{\boldsymbol{b}}) = \mathop{\mathrm{Span}}\Bigl\{ \mathbf{\boldsymbol{b}}, \mathsf A \mathbf{\boldsymbol{b}}, \mathsf A^2 \mathbf{\boldsymbol{b}}, \dotsc, \mathsf A^{k-1} \mathbf{\boldsymbol{b}} \Bigr\}. \]

  • Conjugate gradient method (only for symmetric positive definite \(\mathsf A\)): \[x^{(k)} = \mathop{\mathrm{arg\,min}}_{\mathbf{\boldsymbol{x}} \in \mathcal K_{k}} {\lVert {\mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{x}}_*} \rVert}_{\mathsf A}, \qquad {\lVert {\mathbf{\boldsymbol{y}}} \rVert}_{\mathsf A} := \sqrt{{\langle {\mathbf{\boldsymbol{y}}, \mathbf{\boldsymbol{y}}} \rangle}_{\mathsf A}}, \qquad {\langle {\mathbf{\boldsymbol{x}}, \mathbf{\boldsymbol{y}}} \rangle}_{\mathsf A} := \mathbf{\boldsymbol{x}}^T\mathsf A \mathbf{\boldsymbol{y}}.\] Key idea: Although \(\mathbf{\boldsymbol{x}}_*\) is unknown, its orthogonal projections for the inner product \({\langle {\cdot, \cdot} \rangle}_{\mathsf A}\) can be calculated: \[{\langle {\mathbf{\boldsymbol{d}}, \mathbf{\boldsymbol{x}}_*} \rangle}_{\mathsf A} = {\langle {\mathbf{\boldsymbol{d}}, \mathsf A \mathbf{\boldsymbol{x}}_*} \rangle} = \mathbf{\boldsymbol{d}}^T\mathbf{\boldsymbol{b}}.\]
  • Full orthogonalization method (FOM): find \(\mathbf{\boldsymbol{x}}^{(k)}\) in \(\mathcal K_k\) such that the following Galerkin condition is satisfied:

    \[{\langle {\mathsf A \mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{b}}, \mathcal K_k} \rangle} = 0.\]

    Iterates coincide with conjugate gradient method when \(\mathsf A\) is symmetric positive definite.

  • Generalized Minimal Residual Method (GMRES): \[\mathbf{\boldsymbol{x}}^{(k)} = \mathop{\mathrm{arg\,min}}_{\mathbf{\boldsymbol{x}} \in \mathcal K_{k}} {\lVert {\mathsf A \mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{b}}} \rVert}.\]
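
In practice these methods are rarely implemented by hand; Julia packages such as IterativeSolvers.jl and Krylov.jl provide them. A sketch, assuming the IterativeSolvers.jl package is installed and using its cg and gmres functions:

using LinearAlgebra
using IterativeSolvers  # external package, assumed installed

A = [2.0 1.0; 1.0 2.0]
b = [7.0; 8.0]
x_cg    = cg(A, b)      # conjugate gradient (A symmetric positive definite)
x_gmres = gmres(A, b)   # GMRES (general invertible A)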