Cours ENPC — Pratique du calcul scientifique

Motivating example:

A = [1 1; 1 (1-1e-12)]
b = [0; 1e-12]
x = A\b

2-element Vector{Float64}:
  1.0000221222095027
 -1.0000221222095027
Goal of this chapter
Study numerical methods for the linear equation \[ \mathsf A \mathbf{\boldsymbol{x}} = \mathbf{\boldsymbol{b}}, \] where \(\mathsf A \in \mathbf C^{n \times n}\) and \(\mathbf{\boldsymbol{b}} \in \mathbf C^n\).
Two classes of methods:
Direct methods:
LU decomposition (for general invertible matrices);
Cholesky decomposition (for symmetric positive definite matrices)
Iterative methods:
Conjugate gradients
And many more: GMRES, BiCGStab, etc.
Before discussing these methods, we introduce the concept of conditioning.
In Julia, we can calculate the solution with the \ operator, as in the listing above. The exact solution of this system is \((1, -1)^T\).
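As a quick check (a sketch reusing A, b and x from the listing above), the relative error can be computed explicitly:

using LinearAlgebra
x_exact = [1.0, -1.0]                # exact solution of the system above
norm(x - x_exact) / norm(x_exact)    # ≈ 2e-5, about 10^11 times the machine epsilon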
Why is the relative error much larger than the machine epsilon?
Question: Can we estimate \(\frac{{\lVert {\Delta \mathbf{\boldsymbol{x}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{x}}} \rVert}}\) in terms of \(\frac{{\lVert {\Delta \mathsf A} \rVert}}{{\lVert {\mathsf A} \rVert}}\) and \(\frac{{\lVert {\Delta \mathbf{\boldsymbol{b}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{b}}} \rVert}}\)?
Given \(p \in [1, \infty]\), the \(p\)-norm of a vector \(\mathbf{\boldsymbol{x}} \in \mathbf C^n\) is defined as follows: \[ {\lVert {\mathbf{\boldsymbol{x}}} \rVert}_p := \begin{cases} \left( \sum_{i=1}^{n} {\lvert {x_i} \rvert}^p \right)^{\frac{1}{p}} & \text{if $p < \infty$}, \\ \max \Bigl\{ {\lvert {x_1} \rvert}, \dotsc, {\lvert {x_n} \rvert} \Bigr\} & \text{if $p = \infty$}. \end{cases} \]
using Plots, LaTeXStrings   # LaTeXStrings provides the L"..." string macro used below
Plots.plot(aspect_ratio=:equal, xlims=(-1.1, 1.1), ylims=(-1.1, 1.1))
Plots.plot!(framestyle=:origin, grid=true, legend=:outertopright)
Plots.plot!(title="Unit circle in different norms")
Plots.plot!([0, 1, 0, -1, 0], [-1, 0, 1, 0, -1], label=L"$p = 1$ (taxicab norm)")
Plots.plot!([1, 1, -1, -1, 1], [1, -1, -1, 1, 1], label=L"p = \infty")
Plots.plot!(t -> cos(t), t -> sin(t), 0, 2π, label=L"$p = 2$ (Euclidean norm)")
The operator norm on matrices induced by the vector \(p\)-norm is given by \[ {\lVert {\mathsf A} \rVert}_{p} := \sup_{{\lVert {\mathbf{\boldsymbol{x}}} \rVert}_{p} \leqslant 1} {\lVert {\mathsf A \mathbf{\boldsymbol{x}}} \rVert}_{p} \] It follows from the definition that \({\lVert {\mathsf A \mathsf B} \rVert}_p \leqslant{\lVert {\mathsf A} \rVert}_p {\lVert {\mathsf B} \rVert}_p\).
Exercises
Show that
The matrix 2-norm is given by \(\sqrt{\lambda_{\rm max}(\mathsf A^* \mathsf A)}\).
The matrix 1-norm is the maximum absolute column sum:
\[{\lVert {\mathsf A} \rVert}_1 = \max_{1 \leqslant j \leqslant n} \sum_{i=1}^{n} {\lvert {a_{ij}} \rvert}.\]
The matrix \(\infty\)-norm is the maximum absolute row sum:
\[{\lVert {\mathsf A} \rVert}_{\infty} = \max_{1 \leqslant i \leqslant n} \sum_{j=1}^{n} {\lvert {a_{ij}} \rvert}.\]
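These formulas can be checked numerically against the induced norms computed by LinearAlgebra's opnorm; a minimal sketch (the matrix A below is an arbitrary example):

using LinearAlgebra
A = [2.0 1.0 -1.0; -3.0 -1.0 2.0; -2.0 1.0 2.0]
opnorm(A, 2) ≈ sqrt(maximum(eigvals(Symmetric(A' * A))))   # matrix 2-norm
opnorm(A, 1) ≈ maximum(sum(abs.(A), dims=1))               # maximum absolute column sum
opnorm(A, Inf) ≈ maximum(sum(abs.(A), dims=2))             # maximum absolute row sum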
From matrix norms, we define the condition number of a matrix as \[ \kappa_p(\mathsf A) = {\lVert {\mathsf A} \rVert}_p {\lVert {\mathsf A^{-1}} \rVert}_p \]
Properties:
\(\kappa_p(\mathsf I) = 1\)
\(\kappa_p(\mathsf A) \geqslant 1\)
\(\kappa_p(\alpha \mathsf A) = \kappa_p(\mathsf A)\) for all \(\alpha \neq 0\)
In Julia, the condition number can be calculated with cond(A, p) from the LinearAlgebra standard library; by default, p = 2.
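For instance, for the matrix of the motivating example (the printed values are approximate):

using LinearAlgebra
A = [1 1; 1 (1-1e-12)]
cond(A)        # ≈ 4e12: the motivating example is severely ill-conditioned
cond(A, Inf)   # condition number in the ∞-norm, of the same order

Since round-off errors of relative size comparable to the machine epsilon can be amplified by a factor of order \(\kappa(\mathsf A)\), this is consistent with the relative error of order \(10^{-5}\) observed earlier.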
Proposition
Let \(\mathbf{\boldsymbol{x}} + \Delta \mathbf{\boldsymbol{x}}\) denote the solution to \[ \mathsf A (\mathbf{\boldsymbol{x}} + \Delta \mathbf{\boldsymbol{x}}) = \mathbf{\boldsymbol{b}} + \Delta \mathbf{\boldsymbol{b}} \] The following inequality holds: \[ \frac{{\lVert {\Delta \mathbf{\boldsymbol{x}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{x}}} \rVert}} \leqslant\kappa(\mathsf A) \frac{{\lVert {\Delta \mathbf{\boldsymbol{b}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{b}}} \rVert}} \]
Proof
It holds by definition of \(\Delta \mathbf{\boldsymbol{x}}\) that \(\mathsf A \Delta \mathbf{\boldsymbol{x}} = \Delta \mathbf{\boldsymbol{b}}\). Therefore \[ \begin{aligned} {\lVert {\Delta \mathbf{\boldsymbol{x}}} \rVert} &= {\lVert {\mathsf A^{-1} \Delta \mathbf{\boldsymbol{b}}} \rVert} \leqslant{\lVert {\mathsf A^{-1}} \rVert} {\lVert {\Delta \mathbf{\boldsymbol{b}}} \rVert} \\ &= \frac{{\lVert {\mathsf A \mathbf{\boldsymbol{x}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{b}}} \rVert}} {\lVert {\mathsf A^{-1}} \rVert} {\lVert {\Delta \mathbf{\boldsymbol{b}}} \rVert} \leqslant\frac{{\lVert {\mathsf A} \rVert} {\lVert {\mathbf{\boldsymbol{x}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{b}}} \rVert}} {\lVert {\mathsf A^{-1}} \rVert} {\lVert {\Delta \mathbf{\boldsymbol{b}}} \rVert}. \end{aligned} \] Rearranging, we obtain the statement.
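The bound can also be verified numerically on a random system; a small sketch:

using LinearAlgebra
n = 5
A, b = randn(n, n), randn(n)
Δb = 1e-8 * randn(n)                 # small perturbation of the right-hand side
x  = A \ b
Δx = A \ (b + Δb) - x
norm(Δx) / norm(x) ≤ cond(A) * norm(Δb) / norm(b)   # expected: true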
Proposition
Let \(\mathbf{\boldsymbol{x}} + \Delta \mathbf{\boldsymbol{x}}\) denote the solution to \[ (\mathsf A + \Delta \mathsf A) (\mathbf{\boldsymbol{x}} + \Delta \mathbf{\boldsymbol{x}}) = \mathbf{\boldsymbol{b}} \] If \(\mathsf A\) is invertible and \({\lVert {\Delta \mathsf A} \rVert} < \frac{1}{2} {\lVert {\mathsf A^{-1}} \rVert}^{-1}\), then \[ \frac{{\lVert {\Delta \mathbf{\boldsymbol{x}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{x}}} \rVert}} \leqslant 2\kappa(\mathsf A) \frac{{\lVert {\Delta \mathsf A} \rVert}}{{\lVert {\mathsf A} \rVert}} \]
Conclusion: \(\kappa(\mathsf A)\) measures sensitivity to perturbations:
useful to estimate the impact of round-off errors;
influences convergence speed of methods (see later).
When \(\kappa_p(\mathsf A) \gg 1\), the system is called ill-conditioned.
First calculate the \(\mathsf L \mathsf U\) decomposition of \(\mathsf A\), with
\(\mathsf U\) an upper triangular matrix;
\(\mathsf L\) a unit lower triangular matrix.
Then solve \(\mathsf L \mathbf{\boldsymbol{y}} = \mathbf{\boldsymbol{b}}\) using forward substitution, followed by \(\mathsf U \mathbf{\boldsymbol{x}} = \mathbf{\boldsymbol{y}}\) using backward substitution.
Remark: The LU decomposition may not exist
In practice, use decomposition \(\mathsf P \mathsf A = \mathsf L \mathsf U\), with \(\mathsf P\) a permutation matrix.
Guaranteed to exist if \(\mathsf A\) is invertible;
More numerically stable.
\[ \underbrace{ \begin{pmatrix} 2 & 1 & -1 \\ -3 & -1 & 2 \\ -2 & 1 & 2 \end{pmatrix} }_{\mathsf A} \]
\[ \underbrace{ \begin{pmatrix} 1 & 0 & 0 \\ {\color{red}\frac{3}{2}} & 1 & 0 \\ {\color{red}{1}} & 0 & 1 \end{pmatrix} }_{=: \, \mathsf M_1}~ \underbrace{ \begin{pmatrix} 2 & 1 & -1 \\ -3 & -1 & 2 \\ -2 & 1 & 2 \end{pmatrix} }_{\mathsf A} = \begin{pmatrix} 2 & 1 & -1 \\ {\color{red} 0} & \frac{1}{2} & \frac{1}{2} \\ {\color{red} 0} & 2 & 1 \end{pmatrix} \]
\[ \underbrace{ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & {\color{blue} -4} & 1 \end{pmatrix} }_{=: \, \mathsf M_2}~ \underbrace{ \begin{pmatrix} 1 & 0 & 0 \\ {\color{red}\frac{3}{2}} & 1 & 0 \\ {\color{red}{1}} & 0 & 1 \end{pmatrix} }_{=: \, \mathsf M_1}~ \underbrace{ \begin{pmatrix} 2 & 1 & -1 \\ -3 & -1 & 2 \\ -2 & 1 & 2 \end{pmatrix} }_{\mathsf A} = \underbrace{ \begin{pmatrix} 2 & 1 & -1 \\ {\color{red} 0} & \frac{1}{2} & \frac{1}{2} \\ {\color{red} 0} & {\color{blue} 0} & -1 \end{pmatrix} }_{=: \, \mathsf U} \]
Gaussian transformations \(\mathsf M_1\), \(\mathsf M_2\) are simple to invert and multiply. In particular \[ \mathsf A = \mathsf M_1^{-1} \mathsf M_2^{-1} \mathsf U = \underbrace{ \begin{pmatrix} 1 & 0 & 0 \\ {\color{red}-\frac{3}{2}} & 1 & 0 \\ {\color{red}{-1}} & 0 & 1 \end{pmatrix} }_{=: \, \mathsf M_1^{-1}}~ \underbrace{ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & {\color{blue} 4} & 1 \end{pmatrix} }_{=: \, \mathsf M_2^{-1}}~ \mathsf U = \underbrace{ \begin{pmatrix} 1 & 0 & 0 \\ {\color{red}-\frac{3}{2}} & 1 & 0 \\ {\color{red}{-1}} & {\color{blue} 4} & 1 \end{pmatrix} }_{\mathsf M_1^{-1} \mathsf M_2^{-1} =: \, \mathsf L}~ \mathsf U \]
# A is an invertible matrix of size n x n
n = size(A, 1)
L = [r == c ? 1.0 : 0.0 for r in 1:n, c in 1:n]  # starts as the identity matrix
U = float(copy(A))                               # work on a floating-point copy of A
for c in 1:n-1, r in c+1:n
    U[c, c] == 0 && error("Pivotal entry is zero!")
    L[r, c] = U[r, c] / U[c, c]                  # multiplier that eliminates entry (r, c)
    U[r, c:end] -= U[c, c:end] * L[r, c]
end
# L is unit lower triangular and U is upper triangular, with A = L*U
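Once \(\mathsf L\) and \(\mathsf U\) are available, solving \(\mathsf A \mathbf{\boldsymbol{x}} = \mathbf{\boldsymbol{b}}\) reduces to two triangular solves; a minimal sketch (the function names are illustrative, not from a library):

function forward_substitution(L, b)              # solves L*y = b (L lower triangular)
    n = length(b)
    y = zeros(n)
    for i in 1:n
        y[i] = (b[i] - L[i, 1:i-1]' * y[1:i-1]) / L[i, i]
    end
    return y
end

function backward_substitution(U, y)             # solves U*x = y (U upper triangular)
    n = length(y)
    x = zeros(n)
    for i in n:-1:1
        x[i] = (y[i] - U[i, i+1:n]' * x[i+1:n]) / U[i, i]
    end
    return x
end

# Then A*x = b is solved by
# x = backward_substitution(U, forward_substitution(L, b))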
\(~\)
Computational cost: \(\frac{2}{3} n^3 + \mathcal O(n^2)\) floating point operations (flops);
In comparison, the computational cost of forward/backward substitution scales as \(\mathcal O(n^2)\);
The LU decomposition can be reused for different right-hand sides;
If \(\mathsf A\) is a banded matrix, so are \(\mathsf L\) and \(\mathsf U\).
\(\mathsf L\) and \(\mathsf U\) are not necessarily sparse when \(\mathsf A\) is sparse.
When \(\mathsf A\) is Hermitian positive definite, it is more efficient to calculate the Cholesky factorization:
\[ \mathsf A = \mathsf C \mathsf C^* \]
with \(\mathsf C\) lower triangular with positive diagonal elements.
\(~\)
Exercise:
Write an algorithm for calculating \(\mathsf C\) by identification of \(\mathsf A\) and \(\mathsf C \mathsf C^*\).
What is the computational complexity of your implementation?
Bonus points if your implementation is efficient on banded matrices.
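An implementation can be checked against the built-in factorization from Julia's LinearAlgebra standard library; for example:

using LinearAlgebra
A = [4.0 2.0; 2.0 3.0]       # a symmetric positive definite example
C = cholesky(A).L            # lower triangular factor with positive diagonal
C * C' ≈ A                   # true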
Motivation: General-purpose direct methods are
exact up to roundoff errors
but computationally expensive: \(\mathcal O(n^3)\) flops for full matrices…
Additionally, they require the storage of \(\mathsf L\) and \(\mathsf U\).
\(~\)
In contrast, iterative methods
are usually approximate (but there are exceptions);
usually have a cost per iteration scaling as \(\mathcal O(n^2)\) at most;
\(\rightarrow\) often computationally more economical
can be stopped at any point when the residual \(\mathsf A \mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{b}}\) is sufficiently small;
\(~\)
In this section, we focus on basic iterative methods based on a splitting
\[ \mathsf A = \mathsf M - \mathsf N \]
Richardson’s method
\[\begin{align*} \mathbf{\boldsymbol{x}}^{(k + 1)} = \mathbf{\boldsymbol{x}}^{(k)} + \omega (\mathbf{\boldsymbol{b}} - \mathsf A \mathbf{\boldsymbol{x}}^{(k)}) \end{align*}\]
Proposition
Assume \(\omega \neq 0\). If the sequence \((\mathbf{\boldsymbol{x}}^{(k)})\) generated by the iteration above converges, then its limit is the solution of \[\begin{align*} \mathsf A \mathbf{\boldsymbol{x}} = \mathbf{\boldsymbol{b}}. \end{align*}\]
Questions:
Does this method converge and how quickly?
What is a good stopping criterion?
Definition
The spectral radius of \(\mathsf A \in \mathbf C^{n \times n}\) is given by \[ \rho(\mathsf A) := \max_{\lambda \in \mathop{\mathrm{spectrum}}A} {\lvert {\lambda} \rvert} \]
Remarks:
\(\rho(\mathsf A)\) is not a norm: for example, \(\rho(\mathsf A) = 0\) for the nonzero matrix \(\mathsf A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\);
\(\rho(\mathsf A) \leqslant{\lVert {\mathsf A} \rVert}\) for any induced matrix norm.
Theorem: Gelfand’s formula
For any \(\mathsf A \in \mathbf C^{n \times n}\) and any matrix norm \({\lVert {\cdot} \rVert}\), \[ \lim_{k \to \infty} {\lVert {\mathsf A^k} \rVert}^{1/k} = \rho(\mathsf A). \]
In particular \({\lVert {\mathsf A^k} \rVert} \to 0\) as \(k \to \infty\) iff \(\rho(\mathsf A) < 1\).
The smaller \(\rho(\mathsf A)\), the faster the convergence.
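A quick numerical illustration of Gelfand's formula (the matrix B below is an arbitrary example with \(\rho(\mathsf B) = 0.5\)):

using LinearAlgebra
B = [0.5 1.0; 0.0 0.4]                          # eigenvalues 0.5 and 0.4
opnorm(B)                                       # ≈ 1.18, larger than ρ(B) = 0.5
[opnorm(B^k)^(1/k) for k in (10, 100, 1000)]    # decreases towards ρ(B) = 0.5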
\[\begin{align*} \mathbf{\boldsymbol{x}}^{(k + 1)} = \mathbf{\boldsymbol{x}}^{(k)} + \omega (\mathbf{\boldsymbol{b}} - \mathsf A \mathbf{\boldsymbol{x}}^{(k)}) \end{align*}\]
Proposition
The Richardson iteration converges to the exact solution \(\mathbf{\boldsymbol{x}}_* := \mathsf A^{-1} \mathbf{\boldsymbol{b}}\) for every choice of \(\mathbf{\boldsymbol{x}}^{(0)}\) if and only if \[\begin{align*} \rho(\mathsf I - \omega \mathsf A) = \max_{\lambda \in \mathop{\mathrm{spectrum}}A} {\lvert {1 - \omega \lambda} \rvert} < 1. \end{align*}\]
Proof
We notice that \[ \mathbf{\boldsymbol{x}}^{(k+1)} - \mathbf{\boldsymbol{x}}_* = \mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}_* + \omega (\mathsf A \mathbf{\boldsymbol{x}}_* - \mathsf A \mathbf{\boldsymbol{x}}^{(k)}) = (\mathsf I - \omega \mathsf A) \left( \mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}_* \right). \] Therefore \[ \mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}_* = (\mathsf I - \omega \mathsf A)^k \left( \mathbf{\boldsymbol{x}}^{(0)} - \mathbf{\boldsymbol{x}}_* \right) \] The “only if” part of the statement is clear, and the “if” part follows from the equivalence \[ {\lVert {\mathsf B^k} \rVert} \xrightarrow[k \to \infty]{} 0 \quad \Leftrightarrow \quad \rho(B) < 1. \]
Corollary: It is necessary for convergence that all the eigenvalues have nonzero real parts of the same sign.
By Gelfand’s formula, for all \(\varepsilon > 0\) there is \(K \in \mathbf N\) such that \[ \forall k \geqslant K, \qquad {\lVert {\mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}_*} \rVert} \leqslant\bigl( \rho(\mathsf I - \omega \mathsf A) + \varepsilon \bigr)^k {\lVert {\mathbf{\boldsymbol{x}}^{(0)} - \mathbf{\boldsymbol{x}}_*} \rVert}. \]
We consider the particular case where \(\mathsf A \in \mathbf R^{n \times n}\) is symmetric positive definite.
Remark: link to optimization
When \(\mathsf A\) is symmetric positive definite, the solution to \(\mathsf A \mathbf{\boldsymbol{x}} = \mathbf{\boldsymbol{b}}\) is the unique minimizer of \[ f(\mathbf{\boldsymbol{x}}) = \frac{1}{2} \mathbf{\boldsymbol{x}}^T\mathsf A \mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{b}}^T\mathbf{\boldsymbol{x}}. \] In this case, Richardson’s iteration may be rewritten as \[ \mathbf{\boldsymbol{x}}^{(k+1)} = \mathbf{\boldsymbol{x}}^{(k)} - \omega \nabla f(\mathbf{\boldsymbol{x}}^{(k)}). \]
\(~\)
Exercise: find \(\omega\) that minimizes \(\rho(\mathsf I - \omega \mathsf A)\)
Simple calculations (the optimal \(\omega\) balances \({\lvert 1 - \omega \lambda_{\min}(\mathsf A) \rvert}\) and \({\lvert 1 - \omega \lambda_{\max}(\mathsf A) \rvert}\)) lead to \[ \omega_* = \frac{2}{\lambda_{\min}(\mathsf A) + \lambda_{\max}(\mathsf A)}, \qquad \rho(\mathsf I - \omega_* \mathsf A) = \frac{\kappa_2(\mathsf A) - 1}{\kappa_2(\mathsf A) + 1} \]
Consider the linear system \[ \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 7 \\ 8 \end{pmatrix} \]
The exact solution is \((2, 3)^T\);
The eigenvalues of \(\mathsf A\) are 1 and 3;
The condition number \(\kappa_2(\mathsf A)\) is 3;
The optimal \(\omega\) is \(\frac{1}{2}\).
Contour plot of \[f(\mathbf{\boldsymbol{x}}) = \frac{1}{2} \mathbf{\boldsymbol{x}}^T\mathsf A \mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{b}}^T\mathbf{\boldsymbol{x}}.\]
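With these values, Richardson's iteration converges quickly to the exact solution; a minimal sketch (the function name is illustrative):

# Richardson's iteration x^(k+1) = x^(k) + ω (b - A x^(k))
function richardson(A, b; ω=0.5, niter=50)
    x = zero(b)
    for k in 1:niter
        x += ω * (b - A * x)
    end
    return x
end

richardson([2.0 1.0; 1.0 2.0], [7.0, 8.0]; ω=1/2)   # ≈ [2.0, 3.0], the exact solution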
Consider the splitting \[ \mathsf A = {\color{green}\mathsf M} - {\color{blue} \mathsf N} \] where \(\mathsf M\) is invertible and easy to invert. Then \[ \mathsf A \mathbf{\boldsymbol{x}}_* = \mathbf{\boldsymbol{b}} \quad \Leftrightarrow \quad {\color{green}\mathsf M} \mathbf{\boldsymbol{x}}_* = {\color{blue} \mathsf N} \mathbf{\boldsymbol{x}}_* + \mathbf{\boldsymbol{b}} \] which suggests the iterative method \[ {\color{green}\mathsf M} \mathbf{\boldsymbol{x}}^{(k+1)} = {\color{blue} \mathsf N} \mathbf{\boldsymbol{x}}^{(k)} + \mathbf{\boldsymbol{b}} \]
Standard basic iterative methods (here \(\mathsf A = \mathsf L + \mathsf D + \mathsf U\), with \(\mathsf D\) the diagonal part of \(\mathsf A\), \(\mathsf L\) its strictly lower triangular part and \(\mathsf U\) its strictly upper triangular part):
Richardson’s iteration: \[ \mathsf A = \underbrace{{\color{green}\frac{1}{\omega} \mathsf I }}_{\mathsf M} - \underbrace{\left({\color{blue} \frac{1}{\omega} \mathsf I - \mathsf A}\right)}_{\mathsf N} \]
Jacobi’s iteration: \(\mathsf A = {\color{green}\mathsf D} - ({\color{blue}-\mathsf L - \mathsf U})\).
Gauss-Seidel iteration: \(\mathsf A = ({\color{green}\mathsf D + \mathsf L}) - ({\color{blue}- \mathsf U})\).
Relaxation method: \(\mathsf A = \left({\color{green}\frac{\mathsf D}{\omega} + \mathsf L }\right) - \left({\color{blue}\frac{1 - \omega}{\omega} \mathsf D - \mathsf U} \right)\). \(~\)(Coincides with Gauss-Seidel when \(\omega = 1\))
Proposition: convergence of the splitting method
The iteration \[ {\color{green}\mathsf M} \mathbf{\boldsymbol{x}}^{(k+1)} = {\color{blue} \mathsf N} \mathbf{\boldsymbol{x}}^{(k)} + \mathbf{\boldsymbol{b}} \] converges for any initial \(\mathbf{\boldsymbol{x}}^{(0)}\) if and only if \(\rho({\color{green}\mathsf M}^{-1} {\color{blue} \mathsf N}) < 1\).
In addition, for any \(\varepsilon > 0\) there exists \(K > 0\) such that \[ \forall k \geqslant K, \qquad {\lVert {\mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}_*} \rVert} \leqslant\bigl(\rho({\color{green}\mathsf M}^{-1} {\color{blue} \mathsf N}) + \varepsilon\bigr)^k {\lVert {\mathbf{\boldsymbol{x}}^{(0)} - \mathbf{\boldsymbol{x}}_*} \rVert}. \]
Proof
Subtracting the equations \[ \begin{aligned} {\color{green}\mathsf M} \mathbf{\boldsymbol{x}}^{(k+1)} &= {\color{blue} \mathsf N} \mathbf{\boldsymbol{x}}^{(k)} + \mathbf{\boldsymbol{b}} \\ {\color{green}\mathsf M} \mathbf{\boldsymbol{x}}_* &= {\color{blue} \mathsf N} \mathbf{\boldsymbol{x}}_* + \mathbf{\boldsymbol{b}} \end{aligned} \] and rearranging gives \[ \mathbf{\boldsymbol{x}}^{(k+1)} - \mathbf{\boldsymbol{x}}_* = {\color{green}\mathsf M}^{-1} {\color{blue} \mathsf N} (\mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}_*) = \dots = ({\color{green}\mathsf M}^{-1} {\color{blue} \mathsf N})^{k+1} (\mathbf{\boldsymbol{x}}^{(0)} - \mathbf{\boldsymbol{x}}_*). \] The “only if” part of the first item is simple. The other claims follow from Gelfand’s formula: \[ \forall \mathsf B \in \mathbf C^{n \times n}, \qquad \lim_{k \to \infty} {\lVert {\mathsf B^k} \rVert}^{1/k} = \rho(\mathsf B). \]
Settings where \(\rho(\mathsf M^{-1} \mathsf N) < 1\) are identified on a case-by-case basis:
Richardson’s iteration, sufficient condition: \(\mathsf A\) symmetric positive definite;
Jacobi’s iteration, sufficient condition: \(\mathsf A\) strictly row or column diagonally dominant:
\[ \lvert a_{ii} \rvert > \sum_{j \neq i} \lvert a_{ij} \rvert \quad \forall i \qquad \text{ or } \qquad \lvert a_{jj} \rvert > \sum_{i \neq j} \lvert a_{ij} \rvert \quad \forall j. \]
Gauss-Seidel iteration, sufficient condition: \(\mathsf A\) strictly row or column diagonally dominant;
Relaxation method, sufficient condition: \(\mathsf A\) symmetric positive definite and \(\omega \in (0, 2)\);
Relaxation method, necessary condition: \(\omega \in (0, 2)\);
…
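For instance, the matrix of the earlier 2×2 example is strictly diagonally dominant, so Jacobi's iteration converges; a minimal sketch (the function name is illustrative):

using LinearAlgebra

# Jacobi's iteration: M = D (the diagonal part of A), N = M - A = -(L + U)
function jacobi(A, b; niter=100)
    M = Diagonal(A)
    N = M - A
    x = zero(b)
    for k in 1:niter
        x = M \ (N * x + b)          # M x^(k+1) = N x^(k) + b
    end
    return x
end

jacobi([2.0 1.0; 1.0 2.0], [7.0, 8.0])   # ≈ [2.0, 3.0]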
Suppose again that \(\mathsf A\) is symmetric positive definite and recall Richardson’s iteration:
\[\begin{align*} \mathbf{\boldsymbol{x}}^{(k + 1)} = \mathbf{\boldsymbol{x}}^{(k)} - \omega \nabla f(\mathbf{\boldsymbol{x}}^{(k)}), \qquad f(\mathbf{\boldsymbol{x}}) := \frac{1}{2} \mathbf{\boldsymbol{x}}^T\mathsf A \mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{b}}^T\mathbf{\boldsymbol{x}}. \end{align*}\]
Proposition
For any \(\mathbf{\boldsymbol{d}} \in \mathbf R^n\), it holds that \[ \mathop{\mathrm{arg\,min}}_{\omega \in \mathbf R} f \left(\mathbf{\boldsymbol{x}} - \omega \mathbf{\boldsymbol{d}} \right) = \frac{\mathbf{\boldsymbol{d}}^T\mathbf{\boldsymbol{r}}}{\mathbf{\boldsymbol{d}}^T\mathsf A \mathbf{\boldsymbol{d}}}, \qquad \mathbf{\boldsymbol{r}} = \mathsf A \mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{b}}. \]
We can improve upon Richardson’s iteration by letting \[ \mathbf{\boldsymbol{x}}^{(k + 1)} = \mathbf{\boldsymbol{x}}^{(k)} - \omega_{\color{red}k} \nabla f(\mathbf{\boldsymbol{x}}^{(k)}), \qquad \omega_{\color{red}k} := \frac{\mathbf{\boldsymbol{d}}_k^T\mathbf{\boldsymbol{d}}_k}{\mathbf{\boldsymbol{d}}_k^T\mathsf A \mathbf{\boldsymbol{d}}_k}, \qquad \mathbf{\boldsymbol{d}}_k := \mathsf A \mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{b}}. \]
This is the steepest descent method with optimal step.
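A minimal sketch of this method, under the assumption that \(\mathsf A\) is symmetric positive definite (the function name is illustrative):

using LinearAlgebra

# Steepest descent with optimal step: x^(k+1) = x^(k) - ω_k d_k, where d_k = A x^(k) - b
function steepest_descent(A, b; niter=100)
    x = zero(b)
    for k in 1:niter
        d = A * x - b                      # gradient of f at x^(k)
        norm(d) < 1e-12 && break           # stop once the residual is tiny
        ω = dot(d, d) / dot(d, A * d)      # optimal step along -d
        x -= ω * d
    end
    return x
end

steepest_descent([2.0 1.0; 1.0 2.0], [7.0, 8.0])   # ≈ [2.0, 3.0]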
First note that \[ f(\mathbf{\boldsymbol{x}}) = \frac{1}{2} \mathbf{\boldsymbol{x}}^T\mathsf A \mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{b}}^T\mathbf{\boldsymbol{x}} = \frac{1}{2} {\lVert {\mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{x}}_*} \rVert}_{\mathsf A}^2 + \text{constant}, \qquad {\lVert {\mathbf{\boldsymbol{y}}} \rVert}_{\mathsf A} := \sqrt{{\langle {\mathbf{\boldsymbol{y}}, \mathbf{\boldsymbol{y}}} \rangle}_{\mathsf A}}, \qquad {\langle {\mathbf{\boldsymbol{x}}, \mathbf{\boldsymbol{y}}} \rangle}_{\mathsf A} := \mathbf{\boldsymbol{x}}^T\mathsf A \mathbf{\boldsymbol{y}}. \] Given \(n\) conjugate directions \(\{\mathbf{\boldsymbol{d}}_0, \dotsc, \mathbf{\boldsymbol{d}}_{n-1}\}\) such that \({\langle {\mathbf{\boldsymbol{d}}_i, \mathbf{\boldsymbol{d}}_j} \rangle}_{\mathsf A} = \delta_{ij}\), we have \[ \mathbf{\boldsymbol{x}}_* - \mathbf{\boldsymbol{x}}^{(0)} = \sum_{i=0}^{n-1} \mathbf{\boldsymbol{d}}_i \mathbf{\boldsymbol{d}}_i^T\mathsf A (\mathbf{\boldsymbol{x}}_* - \mathbf{\boldsymbol{x}}^{(0)}) = \sum_{i=0}^{n-1} \mathbf{\boldsymbol{d}}_i \mathbf{\boldsymbol{d}}_i^T({\color{green}\mathbf{\boldsymbol{b}}} - \mathsf A \mathbf{\boldsymbol{x}}^{(0)}) \]
\(\color{green}\rightarrow\) the \(\mathsf A\)-projections of \(\mathbf{\boldsymbol{x}}_* - \mathbf{\boldsymbol{x}}^{(0)}\) can be calculated even though \(\mathbf{\boldsymbol{x}}_*\) is unknown!
Minimization property: by construction,
\(\mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}^{(0)}\) is the \(\mathsf A\)-orthogonal projection of \(\mathbf{\boldsymbol{x}}_* - \mathbf{\boldsymbol{x}}^{(0)}\) onto \(\mathop{\mathrm{Span}}\{\mathbf{\boldsymbol{d}}_0, \dotsc, \mathbf{\boldsymbol{d}}_{k-1}\}\).
Therefore, \(\mathbf{\boldsymbol{x}}^{(k)}\) minimizes \(f\) over the affine subspace \(\mathbf{\boldsymbol{x}}^{(0)} + \mathop{\mathrm{Span}}\{\mathbf{\boldsymbol{d}}_0, \dotsc, \mathbf{\boldsymbol{d}}_{k-1}\}\).
Consider again the linear system \[ \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 7 \\ 8 \end{pmatrix} \]
Conjugate directions (non-normalized) calculated by Gram-Schmidt: \[ \mathbf{\boldsymbol{d}}_0 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad \mathbf{\boldsymbol{d}}_1 = \begin{pmatrix} -1 \\ 2 \end{pmatrix}. \]
Question: how to generate conjugate directions?
\({\color{green} \rightarrow}\) the conjugate gradient method generates conjugate directions on the fly.
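A minimal sketch of the standard (textbook) conjugate gradient algorithm, starting from \(\mathbf{\boldsymbol{x}}^{(0)} = \mathbf{\boldsymbol{0}}\) (the function name is illustrative):

using LinearAlgebra

function conjugate_gradient(A, b; tol=1e-10)
    x = zero(b)
    r = b - A * x                            # initial residual
    d = copy(r)                              # first search direction
    for k in 1:length(b)                     # at most n iterations in exact arithmetic
        Ad = A * d
        α = dot(r, r) / dot(d, Ad)           # optimal step along d
        x += α * d
        r_new = r - α * Ad
        norm(r_new) < tol && break
        β = dot(r_new, r_new) / dot(r, r)    # makes the new direction A-conjugate to d
        d = r_new + β * d
        r = r_new
    end
    return x
end

conjugate_gradient([2.0 1.0; 1.0 2.0], [7.0, 8.0])   # ≈ [2.0, 3.0], in at most 2 iterations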
Theorem
Suppose that \(\mathsf A\) is symmetric positive definite. Then the iterates of the conjugate gradient method satisfy \[ \forall k \geqslant 0, \qquad {\lVert {\mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{x}}_*} \rVert}_{\mathsf A} \leqslant 2 \left( \frac{{\color{red}\sqrt{\kappa_2(\mathsf A)}} - 1}{{\color{red}\sqrt{\kappa_2(\mathsf A)}} + 1} \right)^{k} {\lVert {\mathbf{\boldsymbol{x}}^{(0)} - \mathbf{\boldsymbol{x}}_*} \rVert}_{\mathsf A}. \]
Many iterative methods used at present are based on Krylov subspaces: \[ \mathcal K_k(\mathsf A, \mathbf{\boldsymbol{b}}) = \mathop{\mathrm{Span}}\Bigl\{ \mathbf{\boldsymbol{b}}, \mathsf A \mathbf{\boldsymbol{b}}, \mathsf A^2 \mathbf{\boldsymbol{b}}, \dotsc, \mathsf A^{k-1} \mathbf{\boldsymbol{b}} \Bigr\}. \]
Full orthogonalization method (FOM): find \(\mathbf{\boldsymbol{x}}^{(k)}\) in \(\mathcal K_k\) such that the following Galerkin condition is satisfied:
\[{\langle {\mathsf A \mathbf{\boldsymbol{x}}^{(k)} - \mathbf{\boldsymbol{b}}, \mathcal K_k} \rangle} = 0.\]
The FOM iterates coincide with those of the conjugate gradient method when \(\mathsf A\) is symmetric positive definite.