Solving nonlinear equations numerically

Cours ENPC — Pratique du calcul scientifique

Introduction

Objective

Goal of this chapter

Study numerical methods to find solutions to \[ \mathbf{\boldsymbol{f}}(\mathbf{\boldsymbol{x}}) = \mathbf{\boldsymbol{0}}, \] where \(\mathbf{\boldsymbol{f}}\colon \mathbf R^n \to \mathbf R^n\) (number of unknowns = number of equations).

Example applications

  • Calculate \(\sqrt{2}\) by solving \(x^2 = 2\);

  • Solve two point boundary value problems, e.g.

    \[ u''(t) = f\bigl(t, u(t)\bigr), \qquad u(0) = u(1) = 0. \]

  • Solve nonlinear partial differential equations, e.g. \[ \nabla \cdot \bigl(\alpha(u)\nabla u \bigr) = f. \]

Remark

A linear system of the form \(\mathsf A \mathbf{\boldsymbol{x}} = \mathbf{\boldsymbol{b}}\) can be recast in this form with \[ \mathbf{\boldsymbol{f}}(\mathbf{\boldsymbol{x}}) = \mathsf A \mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{b}}. \] With this notation, Richardson’s method for linear systems reads \[ \mathbf{\boldsymbol{x}}_{k+1} = \mathbf{\boldsymbol{x}}_k - \omega \mathbf{\boldsymbol{f}}(\mathbf{\boldsymbol{x}}_k). \] Questions:

  • Is such a method convergent for general nonlinear equations?

  • How fast is the convergence?

  • What is the optimal value of \(\omega\)?
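Before addressing these questions, here is a minimal Julia sketch of Richardson’s iteration applied to \(\mathbf{\boldsymbol{f}}(\mathbf{\boldsymbol{x}}) = \mathsf A \mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{b}}\); the matrix, right-hand side and step \(\omega\) below are illustrative choices, not tuned in any way.

# Richardson's iteration x_{k+1} = x_k - ω f(x_k) for f(x) = A x - b.
# The matrix A, vector b and step ω are illustrative choices.
function richardson(A, b; ω = 0.1, niter = 1000)
    f(x) = A * x - b
    x = zero(b)
    for _ in 1:niter
        x -= ω * f(x)
    end
    return x
end

richardson([2. 1.; 1. 3.], [1., 2.]) ≈ [2. 1.; 1. 3.] \ [1., 2.]  # true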

Simplest approach: the bisection method

Setting and idea

Suppose that \(f\colon [a, b] \to \mathbf R\) is continuous and \(f(a) \leqslant 0 \leqslant f(b).\)

\({\color{green} \rightarrow}\) Then there is a root of \(f\) in \([a, b]\). Let \(x = \frac{a + b}{2}\).

  • If \(f(a)f(x) \leqslant 0\), then there is a root of \(f\) in \([a, x]\);

  • If \(f(a)f(x) > 0\), then there is a root of \(f\) in \([x, b]\).

function bisection(f, a, b, ε = 1e-10)
    @assert f(a) * f(b) ≤ 0
    while abs(b - a) > ε
        x = (a + b) / 2
        # Keep the half-interval on which f changes sign
        a, b = f(a) * f(x) ≤ 0 ? (a, x) : (x, b)
    end
    return (a + b) / 2
end

println(cbrt(2))
bisection(x -> x^3 - 2, 0, 2)
1.2599210498948732
1.259921049902914

Convergence

It holds that \[ |b_i - a_i| = \frac{|b - a|}{2^i}, \] where \([a_i, b_i]\) denotes the bracketing interval after \(i\) iterations; at most \(\lceil \log_2(|b - a|/\varepsilon) \rceil\) iterations are thus required to reach a tolerance \(\varepsilon\). Both sequences \((a_i)\) and \((b_i)\) converge to a root \(x_*\) of \(f\).

\(~\)

Comments

  • Works only in the one-dimensional setting;

  • Robust but slow…


Quantifying the convergence speed

Definition

We say that a sequence \((\mathbf{\boldsymbol{x}}_k)_{k\in \mathbf N}\) converges to \(\mathbf{\boldsymbol{x}}_*\)

  • linearly if \[ \lim_{k\to\infty} \frac{{\lVert {\mathbf{\boldsymbol{x}}_{k+1} - \mathbf{\boldsymbol{x}}_*} \rVert}}{{\lVert {\mathbf{\boldsymbol{x}}_k - \mathbf{\boldsymbol{x}}_*} \rVert}} \in (0, 1). \]

  • superlinearly if \[ \lim_{k\to\infty} \frac{{\lVert {\mathbf{\boldsymbol{x}}_{k+1} - \mathbf{\boldsymbol{x}}_*} \rVert}}{{\lVert {\mathbf{\boldsymbol{x}}_k - \mathbf{\boldsymbol{x}}_*} \rVert}} = 0. \]

  • with order \(q\) if \(\mathbf{\boldsymbol{x}}_k \to \mathbf{\boldsymbol{x}}_*\) and \[ \lim_{k\to\infty} \frac{{\lVert {\mathbf{\boldsymbol{x}}_{k+1} - \mathbf{\boldsymbol{x}}_*} \rVert}}{{\lVert {\mathbf{\boldsymbol{x}}_k - \mathbf{\boldsymbol{x}}_*} \rVert}^q} \in (0, \infty). \] If \(q = 2\), we say the sequence converges quadratically.

Examples

  • Most iterative methods for linear equations converge linearly.

  • The bisection method converges linearly, with rate \(1/2\): the bracketing interval is halved at each iteration.
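These definitions can be checked numerically by monitoring error ratios along the iterates. A minimal sketch, on an artificial error sequence chosen to be quadratically convergent:

# For convergence of order q, log e_{k+1} ≈ q log e_k for small errors e_k = ‖x_k - x_*‖,
# so the ratio of consecutive log-errors estimates q.
errors = [1e-1, 1e-2, 1e-4, 1e-8, 1e-16]  # artificial quadratically convergent errors
println(log.(errors[2:end]) ./ log.(errors[1:end-1]))  # each entry ≈ 2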

Fixed-point methods


Idea

Rewrite

\[\mathbf{\boldsymbol{f}}(\mathbf{\boldsymbol{x}}) = \mathbf{\boldsymbol{0}} \qquad \Leftrightarrow \qquad \mathbf{\boldsymbol{F}}(\mathbf{\boldsymbol{x}}) = \mathbf{\boldsymbol{x}}\]

and use the iterative scheme

\[ \mathbf{\boldsymbol{x}}_{k+1} = \mathbf{\boldsymbol{F}}(\mathbf{\boldsymbol{x}}_k) \]
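A generic driver for such schemes takes half a dozen lines of Julia; the stopping criterion below (small increment between successive iterates) is one possible choice among several:

using LinearAlgebra: norm

# Generic fixed-point iteration x_{k+1} = F(x_k),
# stopped when two successive iterates are within ε of each other.
function fixed_point(F, x; ε = 1e-12, maxiter = 10^6)
    for k in 1:maxiter
        y = F(x)
        norm(y - x) < ε && return y, k
        x = y
    end
    error("Failed to converge!")
end

fixed_point(x -> x - (x^2 - 2)/2√2, 1.)  # ≈ (√2, ...)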

Examples (more on these in the next section)

  • Chord method: for \(\alpha \neq 0\) \[ \mathbf{\boldsymbol{F}}(\mathbf{\boldsymbol{x}}) = \mathbf{\boldsymbol{x}} - \frac{\mathbf{\boldsymbol{f}}(\mathbf{\boldsymbol{x}})}{\alpha} \] \({\color{green} \rightarrow}\) In the linear setting, this is Richardson’s iteration with \(\omega = \alpha^{-1}\).

  • Newton–Raphson method: \[ \mathbf{\boldsymbol{F}}(\mathbf{\boldsymbol{x}}) = \mathbf{\boldsymbol{x}} - \mathsf J_f(\mathbf{\boldsymbol{x}})^{-1} \mathbf{\boldsymbol{f}}(\mathbf{\boldsymbol{x}}), \] where \(\mathsf J_f(\mathbf{\boldsymbol{x}})\) is the Jacobian matrix of \(\mathbf{\boldsymbol{f}}\) at \(\mathbf{\boldsymbol{x}}\).

Proposition: convergence under Lipschitz assumption

If \(\mathbf{\boldsymbol{F}}\) is Lipschitz continuous with \(L < 1\), \[ \forall (\mathbf{\boldsymbol{x}}, \mathbf{\boldsymbol{y}}) \in \mathbf R^n \times \mathbf R^n, \qquad {\lVert {\mathbf{\boldsymbol{F}}(\mathbf{\boldsymbol{x}}) - \mathbf{\boldsymbol{F}}(\mathbf{\boldsymbol{y}})} \rVert} \leqslant L {\lVert {\mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{y}}} \rVert} \] then

  • There exists a unique fixed point \(\mathbf{\boldsymbol{x}}_*\) of \(\mathbf{\boldsymbol{F}}\).

  • The sequence \((\mathbf{\boldsymbol{x}}_k)_{k\in \mathbf N}\) converges exponentially to \(\mathbf{\boldsymbol{x}}_*\): \[ \forall k \in \mathbf N, \qquad {\lVert {\mathbf{\boldsymbol{x}}_k - \mathbf{\boldsymbol{x}}_*} \rVert} \leqslant L^k {\lVert {\mathbf{\boldsymbol{x}}_0 - \mathbf{\boldsymbol{x}}_*} \rVert}. \]

Sketch of proof

  • Show that \((\mathbf{\boldsymbol{x}}_k)\) is Cauchy, using \({\lVert {\mathbf{\boldsymbol{x}}_{k+1} - \mathbf{\boldsymbol{x}}_k} \rVert} \leqslant L^k {\lVert {\mathbf{\boldsymbol{x}}_1 - \mathbf{\boldsymbol{x}}_0} \rVert}\) and a telescoping sum; it is thus convergent to some \(\mathbf{\boldsymbol{x}}_* \in \mathbf R^n\).

  • Show that \(\mathbf{\boldsymbol{F}}(\mathbf{\boldsymbol{x}}_*) = \mathbf{\boldsymbol{x}}_*\).

  • Deduce exponential convergence from \(\mathbf{\boldsymbol{x}}_{k+1} - \mathbf{\boldsymbol{x}}_* = \mathbf{\boldsymbol{F}}(\mathbf{\boldsymbol{x}}_k) - \mathbf{\boldsymbol{F}}(\mathbf{\boldsymbol{x}}_*)\).

Relaxing the global Lipschitz assumption

Proposition: local convergence under local Lipschitz condition

Assume that \(\mathbf{\boldsymbol{x}}_*\) is a fixed point of \(\mathbf{\boldsymbol{F}}\) and \[ \forall \mathbf{\boldsymbol{x}} \in B_{\delta}(\mathbf{\boldsymbol{x}}_*), \qquad {\lVert {\mathbf{\boldsymbol{F}}(\mathbf{\boldsymbol{x}}) - \mathbf{\boldsymbol{F}}(\mathbf{\boldsymbol{x}}_*)} \rVert} \leqslant L {\lVert {\mathbf{\boldsymbol{x}} - \mathbf{\boldsymbol{x}}_*} \rVert} \qquad(1)\] for \(L < 1\). If \(\mathbf{\boldsymbol{x}}_0 \in B_{\delta}(\mathbf{\boldsymbol{x}}_*)\), then \(\mathbf{\boldsymbol{x}}_k \in B_{\delta}(\mathbf{\boldsymbol{x}}_*)\) for all \(k \in \mathbf N\) and \[ {\lVert {\mathbf{\boldsymbol{x}}_k - \mathbf{\boldsymbol{x}}_*} \rVert} \leqslant L^k {\lVert {\mathbf{\boldsymbol{x}}_0 - \mathbf{\boldsymbol{x}}_*} \rVert}. \]

\(~\)

Proposition: local convergence criteria based on the Jacobian

Assume that \(\mathbf{\boldsymbol{F}}\) is differentiable at a fixed point \(\mathbf{\boldsymbol{x}}_*\).

  • There is \(\delta > 0\) such that Equation 1 is satisfied iff \({\lVert {\mathsf J_F(\mathbf{\boldsymbol{x}}_*)} \rVert} < 1\).

  • There is \(\delta > 0\) such that Equation 1 is satisfied in some induced norm iff \(\rho\bigl(\mathsf J_F(\mathbf{\boldsymbol{x}}_*)\bigr) < 1\).

Fixed-point methods: superlinear convergence

Proposition: superlinear convergence

Assume that \(\mathbf{\boldsymbol{x}}_*\) is a fixed point of \(\mathbf{\boldsymbol{F}}\) and that \(\mathsf J_F(\mathbf{\boldsymbol{x}}_*) = 0\). If \(\mathbf{\boldsymbol{x}}_k \to \mathbf{\boldsymbol{x}}_*\) as \(k \to \infty\), then \[ \lim_{k \to \infty} \frac{{\lVert {\mathbf{\boldsymbol{x}}_{k+1} - \mathbf{\boldsymbol{x}}_*} \rVert}}{{\lVert {\mathbf{\boldsymbol{x}}_k - \mathbf{\boldsymbol{x}}_*} \rVert}} = 0. \]

Remark

Under this assumption, there exists \(\delta > 0\) such that the fixed-point iteration converges to \(\mathbf{\boldsymbol{x}}_*\) for every initial guess \(\mathbf{\boldsymbol{x}}_0 \in B_{\delta}(\mathbf{\boldsymbol{x}}_*)\).

Proof

It holds that \[ \frac{{\lVert {\mathbf{\boldsymbol{x}}_{k+1} - \mathbf{\boldsymbol{x}}_*} \rVert}}{{\lVert {\mathbf{\boldsymbol{x}}_k - \mathbf{\boldsymbol{x}}_*} \rVert}} = \frac{{\lVert {\mathbf{\boldsymbol{F}}(\mathbf{\boldsymbol{x}}_{k}) - \mathbf{\boldsymbol{F}}(\mathbf{\boldsymbol{x}}_*)} \rVert}}{{\lVert {\mathbf{\boldsymbol{x}}_k - \mathbf{\boldsymbol{x}}_*} \rVert}} = \frac{{\lVert {\mathbf{\boldsymbol{F}}(\mathbf{\boldsymbol{x}}_{k}) - \mathbf{\boldsymbol{F}}(\mathbf{\boldsymbol{x}}_*) - \mathsf J_F(\mathbf{\boldsymbol{x}}_*) (\mathbf{\boldsymbol{x}}_k - \mathbf{\boldsymbol{x}}_*)} \rVert}}{{\lVert {\mathbf{\boldsymbol{x}}_k - \mathbf{\boldsymbol{x}}_*} \rVert}}. \] Since \(\mathbf{\boldsymbol{x}}_k - \mathbf{\boldsymbol{x}}_* \to \mathbf{\boldsymbol{0}}\) as \(k \to \infty\), the right-hand side converges to 0 by definition of the Jacobian matrix.

Examples of fixed-point methods

The chord method

Description of the method

The chord method is the iteration \[ \mathbf{\boldsymbol{x}}_{k+1} = \mathbf{\boldsymbol{x}}_k - \mathsf A^{-1} \mathbf{\boldsymbol{f}}(\mathbf{\boldsymbol{x}}_k), \] with \(\mathsf A \in \mathbf R^{n \times n}\) an invertible matrix parameter.

\(~\)

Remarks

  • This is a fixed-point method with \(\mathbf{\boldsymbol{F}}(\mathbf{\boldsymbol{x}}) = \mathbf{\boldsymbol{x}} - \mathsf A^{-1} \mathbf{\boldsymbol{f}}(\mathbf{\boldsymbol{x}})\).

  • Clearly \(\mathbf{\boldsymbol{F}}(\mathbf{\boldsymbol{x}}) = \mathbf{\boldsymbol{x}}\) if and only if \(\mathbf{\boldsymbol{f}}(\mathbf{\boldsymbol{x}}) = \mathbf{\boldsymbol{0}}\).

  • When \(n = 1\), the iteration reads \[ x_{k+1} = x_k - \frac{f(x_k)}{\alpha}, \] \({\color{green} \rightarrow}\) In the linear setting, this is Richardson’s iteration.

Convergence study when \(n=1\)

In this case \[ F(x) = x - \frac{f(x)}{\alpha}, \qquad F'(x) = 1 - \frac{f'(x)}{\alpha} \] We deduce:

  • The iteration converges locally, i.e. for \(x_0\) sufficiently close to the fixed point, iff \[ \left| 1 - \frac{f'(x_*)}{\alpha} \right| < 1. \] \({\color{green} \rightarrow}\alpha\) must be of the same sign as \(f'(x_*)\) and sufficiently large: \[ \left|\alpha\right|>\frac{\left|f'(x_*)\right|}{2} \]

  • The convergence is superlinear if \[ \alpha = f'(x_*) \]

Fixed-point methods: geometric interpretation with \(F\)

\[ F(x) = x - \frac{1}{\alpha}\tanh(x-1), \qquad x_* = 1, \]

\[ F'(x) = 1 - \frac{1}{\alpha\,\cosh^2(x-1)}, \qquad F'(x_*) = 1 - \frac{1}{\alpha}. \]
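The cobweb figures for this example are omitted here, but the behaviour predicted by \(F'(x_*) = 1 - 1/\alpha\) is easy to reproduce numerically; a minimal sketch, with illustrative values of \(\alpha\):

# Iterate F(x) = x - tanh(x - 1)/α from x₀ = 2 for several α.
# Since F'(x_*) = 1 - 1/α, local convergence holds iff |1 - 1/α| < 1, i.e. α > 1/2.
for α in (2.0, 1.0, 0.6, 0.4)
    x = 2.0
    for k in 1:100
        x = x - tanh(x - 1) / α
    end
    println("α = $α:  x after 100 iterations = $x")
end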


The chord method: geometric interpretation with \(f\)

Geometric interpretation

At each iteration:

  • Approximate \(f(x)\) by the straight line \[ \widehat f_k(x) = f(x_k) + \alpha (x - x_k) \]

  • Let \(x_{k+1}\) be the root of \(\widehat f_k\).
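Solving \(\widehat f_k(x_{k+1}) = 0\) indeed recovers the iteration: \[ 0 = f(x_k) + \alpha (x_{k+1} - x_k) \qquad \Leftrightarrow \qquad x_{k+1} = x_k - \frac{f(x_k)}{\alpha}. \]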


The chord method: numerical experiment

function chord(f, α, x; ε = 1e-12, maxiter = 10^7)
    # Chord iteration: x_{k+1} = x_k - f(x_k) / α
    for i in 1:maxiter
        x = x - f(x) / α
        if abs(f(x)) < ε
            return x, i
        end
    end
    return "Failed to converge!"
end

Toy problem

Let us apply the chord method to \[ f(x) = x^2 - 2. \] In this case \[ F(x) = x - \frac{x^2 - 2}{\alpha}, \qquad F'(x) = 1 - \frac{2x}{\alpha}. \] Here \(x_* = \sqrt{2}\) and \(f'(x_*) = 2\sqrt{2}\), so the local convergence condition \(\left|1 - 2\sqrt 2/\alpha\right| < 1\) holds if and only if \(\alpha > \sqrt 2\).

chord(x -> x^2 - 2, 2√2, 1)  # optimal choice α = f'(x_*)
(1.4142135623729684, 4)

\(~\)

chord(x -> x^2 - 2, √2 - .01, 1)
"Failed to converge!"

\(~\)

chord(x -> x^2 - 2, √2 + .01, 1)
(1.414213562373444, 1893)

\(~\)

chord(x -> x^2 - 2, 10, 1)
(1.4142135623728198, 85)

\(~\)

chord(x -> x^2 - 2, 100, 1)
(1.4142135623727494, 975)

The Newton-Raphson method

Motivation

  • The chord method is optimal, with superlinear convergence, when \(\mathsf A = \mathsf J_f(\mathbf{\boldsymbol{x}}_*)\);

  • However, \(\mathbf{\boldsymbol{x}}_*\) is unknown a priori…

  • Newton-Raphson method: use \[ \mathsf A = \mathsf A(k) = \mathsf J_f(\mathbf{\boldsymbol{x}}_k), \] which leads to the iteration \[ \mathbf{\boldsymbol{x}}_{k+1} = \mathbf{\boldsymbol{x}}_k - \mathsf J_f(\mathbf{\boldsymbol{x}}_k)^{-1} \mathbf{\boldsymbol{f}}(\mathbf{\boldsymbol{x}}_k) \]

One-dimensional version

In dimension one, the Newton-Raphson iteration reads \[ x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)} \]

Remarks

  • The method requires solving a linear system at each step (see the sketch after this list).

  • The iteration is well defined only if \(\mathsf J_f(\mathbf{\boldsymbol{x}}_k)\) is nonsingular.

  • The method is a fixed point iteration with \[ \mathbf{\boldsymbol{F}}(\mathbf{\boldsymbol{x}}) = \mathbf{\boldsymbol{x}} - \mathsf J_f(\mathbf{\boldsymbol{x}})^{-1} \mathbf{\boldsymbol{f}}(\mathbf{\boldsymbol{x}}). \]

  • Dimension 1: If \(f'(x_*) \neq 0\), then \[ F'(x_*) = 0. \] \({\color{green} \rightarrow}\) local superlinear convergence.

  • Dimension \(n\): If \(\mathsf J_f(\mathbf{\boldsymbol{x}}_*)\) is nonsingular, then \[ \mathsf J_F(\mathbf{\boldsymbol{x}}_*) = \mathsf 0. \] \({\color{green} \rightarrow}\) local superlinear convergence.
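In dimension \(n\), the iteration is conveniently implemented with a linear solve rather than an explicit inverse. Below is a minimal sketch, in which the \(2 \times 2\) test system (a circle intersected with a line) is an illustrative choice:

using LinearAlgebra: norm

# Newton-Raphson in dimension n: solve J_f(x_k) δ = f(x_k), then set x_{k+1} = x_k - δ.
function newton_raphson_nd(f, Jf, x; ε = 1e-12, maxiter = 100)
    for k in 1:maxiter
        x = x - Jf(x) \ f(x)  # one linear solve per iteration
        norm(f(x)) < ε && return x, k
    end
    error("Failed to converge!")
end

# Illustrative test: intersection of the circle x₁² + x₂² = 4 with the line x₁ = x₂.
f(x) = [x[1]^2 + x[2]^2 - 4, x[1] - x[2]]
Jf(x) = [2x[1] 2x[2]; 1.0 -1.0]
newton_raphson_nd(f, Jf, [1.0, 2.0])  # ≈ ([√2, √2], ...)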

The Newton-Raphson method: convergence analysis

Theorem: Quadratic convergence

Assume that \(f \in C^2(\mathbf R)\), that \(f'(x_*) \neq 0\), and that the Newton-Raphson iteration converges to \(x_*\). Then \[ \lim_{k \to \infty} \frac{{\lvert {x_{k+1} - x_*} \rvert}}{{\lvert {x_{k} - x_*} \rvert}^2} = \left| \frac{f''(x_*)}{2 f'(x_*)} \right|. \]

Proof

We note that \[ x_{k+1} - x_* = x_{k} - \frac{f(x_k)}{f'(x_k)} - x_* = \frac{1}{f'(x_k)} \underbrace{\Bigl(f'(x_k)(x_{k} - x_*) - f(x_k)\Bigr)}_{= \frac{1}{2} f''(\xi_k) (x_{k} - x_*)^2} \] for some \(\xi_k\) between \(x_k\) and \(x_*\): the underbraced identity follows from a second-order Taylor expansion of \(f\) around \(x_k\), evaluated at \(x_*\), together with \(f(x_*) = 0\). We conclude that \[ x_{k+1} - x_* = \frac{f''(\xi_k) (x_k - x_*)^2}{2f'(x_k)}. \]
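The limit is easy to observe numerically; a minimal sketch for \(f(x) = x^2 - 2\), where the predicted value is \(\bigl|f''(x_*)/2f'(x_*)\bigr| = 1/2\sqrt 2 \approx 0.3536\) (the starting point is an arbitrary illustrative choice):

# Ratios |x_{k+1} - x_*| / |x_k - x_*|² along Newton's iteration for f(x) = x² - 2.
let x = 2.0, x_star = √2
    f(x) = x^2 - 2
    df(x) = 2x
    for k in 1:4
        y = x - f(x) / df(x)
        println(abs(y - x_star) / abs(x - x_star)^2)  # → 1/(2√2) ≈ 0.3536
        x = y
    end
end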

The Newton-Raphson method: numerical experiment

function newton_raphson(f, df, x, maxiter; ε = 1e-12)
    # Newton-Raphson iteration: x_{k+1} = x_k - f(x_k) / f'(x_k)
    for i in 1:maxiter
        x -= f(x) / df(x)
        if abs(f(x)) < ε
            return x, i
        end
    end
    return "Failed to converge!"
end;

Toy problem

Let us apply the Newton-Raphson method to \[ f(x) = (x^2 - 2)(x - 3)^4. \]

  • Simple roots \(\sqrt{2}\) and \(-\sqrt{2}\).

  • Quadruple root \(3\).

    \({\color{green} \rightarrow}\) superlinear convergence does not apply there, since \(f'(3) = 0\); convergence to this root is only linear.

println(√2)
1.4142135623730951

\(~\)

# Import package for automatic differentiation
using Zygote

g(x) = (x^2 - 2) * (x - 3)^4
dg = g'
maxiter = 100
newton_raphson(g, dg, 1, maxiter)
(1.4142135623730951, 6)

\(~\)

newton_raphson(g, dg, 100, maxiter)
(3.000478179164197, 50)

\(~\)

newton_raphson(g, dg, -100, maxiter)
(-1.414213562373095, 25)

The Newton-Raphson method: geometric illustration

Geometric interpretation

At each iteration:

  • Approximate \(f(x)\) by its tangent at \(x_k\) \[ \widehat f_k(x) = f(x_k) + f'(x_k) (x - x_k) \]

  • Let \(x_{k+1}\) be the root of \(\widehat f_k\).
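Setting \(\widehat f_k(x_{k+1}) = 0\) again gives back the iteration: \[ 0 = f(x_k) + f'(x_k)(x_{k+1} - x_k) \qquad \Leftrightarrow \qquad x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}. \]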
