Implementation aspects for CBO/CBS

Lorentz Center Workshop on Purpose-driven Particle Systems

Urbain Vaes

MATHERIALS team, Inria Paris & Cermics, École des Ponts

Discretization of CBO and CBS

In practice, one has to use a version of the methods that

  • is discrete in time,

  • uses a finite number of particles \(J\).

| | CBO | CBS |
|---|---|---|
| Continuous-time and \(J = \infty\) | ✔️ | ✔️ |
| Continuous-time and finite \(J\) | ✔️ | |
| Discrete-time and \(J = \infty\) | | ✔️ |
| Discrete-time and finite \(J\) | ✔️ | |

Discretization of CBO

Consider the following version with element-wise noise: \[ \d \theta^{(j)}_t = - \Bigl(\theta^{(j)}_t- \mathcal M_{\beta}(\mu^J_t)\Bigr) \, \d t + \sqrt{2}\sigma \Bigl(\theta^{(j)}_t - \mathcal M_{\beta}(\mu^J_t)\Bigr) \odot \d W^{(j)}_t, \qquad j=1, \dots,J \] where \(\mathcal M_{\beta}(\mu^J_t)\) is given by \[ \mathcal M_{\beta}(\mu^J_t) = \frac {\int \theta \e^{-\beta f(\theta)} \mu^J_t(\d \theta)} {\int \e^{-\beta f(\theta)} \mu^J_t(\d \theta)}, \qquad \mu^J_t = \frac{1}{J} \sum_{j=1}^{J} \delta_{\theta^{(j)}_t}. \]
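
In practice, computing \(\mathcal M_{\beta}(\mu^J_t)\) requires care for large \(\beta\), since the weights \(\e^{-\beta f}\) underflow in floating point; subtracting the ensemble minimum of \(f\) before exponentiating leaves the ratio unchanged and avoids this. A minimal Julia sketch (our naming, not from the slides):

# Weighted ensemble mean ℳ_β(μᴶ); θs is a vector of particles, each a vector in ℝᵈ.
# Shifting f by its minimum over the ensemble cancels in the ratio and
# prevents underflow of exp(-β f) when β is large.
function weighted_mean(θs, f, β)
    fs = f.(θs)
    ws = exp.(-β .* (fs .- minimum(fs)))
    return sum(ws .* θs) / sum(ws)
end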

Two parameters:

  • \(\beta\) is the “inverse temperature” in the reweighting;

  • \(\sigma\) is the diffusion coefficient, which controls exploration;

  • a separate drift parameter is not needed, as it can be absorbed into a time rescaling.

Reminder: advantage of element-wise noise

For the mean field equation \[ \d \theta_t = - \Bigl(\theta_t- \mathcal M_{\beta}(\mu_t)\Bigr) \, \d t + \sqrt{2}\sigma \Bigl(\theta_t - \mathcal M_{\beta}(\mu_t)\Bigr) \odot \d W_t, \qquad \mu_t = {\rm Law}(\theta_t) \]

under parameter constraints independent of \(d\), the following hold:

  • Consensus: exponentially fast, with a rate independent of \(d\):

\[\mathcal C(\mu_t) \to 0 \qquad \text{and} \qquad \mathcal M(\mu_t) \to \theta_{\infty}\]

  • Accuracy: for \(C_L\) independent of \(d\),

\[ \begin{aligned} f(\theta_{\infty}) - f(\theta_*) &\leq -\frac{1}{\beta} \log \expect \e^{-\beta \bigl( f(\theta_0) - f(\theta_*) \bigr)} + \frac{\log 2}{\beta} \\ &\leq \frac{\color{red}{d}}{2} \frac{\log(\beta/(2\pi))}{\beta} + \frac{C_L}{\beta}. \end{aligned} \]
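
The second inequality follows by the Laplace method; as a sketch, assuming \(f(\theta) - f(\theta_*) \approx \frac{1}{2} \lvert \theta - \theta_* \rvert^2\) near \(\theta_*\) and that \(\theta_0\) has a density \(\rho_0\) positive and continuous at \(\theta_*\), \[ \expect \e^{-\beta \bigl( f(\theta_0) - f(\theta_*) \bigr)} \approx \int \e^{-\frac{\beta}{2} \lvert \theta - \theta_* \rvert^2} \rho_0(\theta) \, \d \theta \sim \rho_0(\theta_*) \left( \frac{2\pi}{\beta} \right)^{d/2}, \] so that \(- \frac{1}{\beta} \log \expect \e^{-\beta ( f(\theta_0) - f(\theta_*) )} \sim \frac{d}{2} \frac{\log(\beta/(2\pi))}{\beta} - \frac{\log \rho_0(\theta_*)}{\beta}\), which is the source of the \(d\)-dependence highlighted in red.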

Discretization methods

  1. Euler-Maruyama:

\[ \theta^{(j)}_{n+1} = \theta^{(j)}_{n} - \Bigl(\theta^{(j)}_{n} - \mathcal M_{\beta}(\mu^J_{n})\Bigr) \, \Delta + \sqrt{2 \Delta} \sigma \Bigl(\theta^{(j)}_n - \mathcal M_{\beta}(\mu^J_n)\Bigr) \odot \xi^{(j)}_n, \]

  2. Splitting to avoid overshooting:

\[ \begin{aligned} \widehat \theta^{(j)}_{n} &= \mathcal M_{\beta}(\mu^J_{n}) + \e^{-\Delta} \left(\theta^{(j)}_n - \mathcal M_{\beta}(\mu^J_{n}) \right) \\ \theta^{(j)}_{n+1} &= \widehat \theta^{(j)}_{n} + \sqrt{2 \Delta} \sigma \Bigl(\widehat \theta^{(j)}_n - \mathcal M_{\beta}(\mu^J_n)\Bigr) \odot \xi^{(j)}_n \end{aligned} \]

  3. Piecewise exact integration with frozen nonlocal term:

\[ \theta^{(j)}_{n+1} = \mathcal M_{\beta}(\mu^J_{n}) + \sum_{\ell=1}^d \mathbf e_\ell \left( \theta^{(j)}_{n} - \mathcal M_{\beta}(\mu^J_{n}) \right)_\ell \exp \left( \left( - 1 - \sigma^2 \right) \Delta + \sqrt{2 \Delta}\sigma \xi^{(j)}_{n,\ell} \right) \]

Discretizations 2) and 3) were proposed in Carrillo, Jin, Li, Zhu, 2021.
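
For concreteness, here is a minimal Julia sketch (our naming) of one step of each scheme, where Θ is a d×J matrix of particles and m denotes \(\mathcal M_{\beta}(\mu^J_n)\), computed e.g. as in the sketch above:

# One step of each discretization; Θ is d×J, m is the weighted mean (d-vector).
function em_step(Θ, m, Δ, σ)            # 1. Euler-Maruyama
    ξ = randn(size(Θ))
    return Θ .- (Θ .- m) .* Δ .+ sqrt(2Δ) .* σ .* (Θ .- m) .* ξ
end

function splitting_step(Θ, m, Δ, σ)     # 2. splitting: exact drift, then noise
    Θ̂ = m .+ exp(-Δ) .* (Θ .- m)
    ξ = randn(size(Θ))
    return Θ̂ .+ sqrt(2Δ) .* σ .* (Θ̂ .- m) .* ξ
end

function exact_step(Θ, m, Δ, σ)         # 3. exact integration, frozen ℳ_β
    ξ = randn(size(Θ))
    return m .+ (Θ .- m) .* exp.((-1 - σ^2) * Δ .+ sqrt(2Δ) .* σ .* ξ)
end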

Analysis of discrete-time methods

  • (Q1) Consensus: is there a common consensus state \(\theta_{\infty} = \theta_{\infty}(\omega,\beta,\Delta)\) such that, almost surely, \[ \forall j \in \{1, \dotsc, J\}, \qquad \lim_{n \to \infty} \theta^{(j)}_n = \theta_{\infty}? \]

  • (Q2) Accuracy: how close is \(\theta_{\infty}\) to \(\theta_*\), the global minimizer of \(f\)?

    • Dependence on \(J\)?

    • Dependence on \(\beta\)?

    • Dependence on \(\Delta\)?

  • (Q3?) Mean field: studying \(J \to \infty\) at the discrete-time level may be useful.

Analysis for Euler-Maruyama

In Ha, Jin, Kim, 2019, the case where \(W^{(1)} = \dots = W^{(J)}\) is investigated.

Theorem: consensus formation for the Euler-Maruyama scheme

If \(0 < \sigma^2 < 1\) and \(0 < \Delta < 2 - 2\sigma^2\), then

  • it holds for all \(1 \leq i,j \leq J\) that \[ \expect \left\lvert \theta_n^{(i)} - \theta_n^{(j)} \right\rvert^2 \leq \e^{- n \Delta (2 - \Delta - 2 \sigma^2)} \expect \left\lvert \theta_0^{(i)} - \theta_0^{(j)} \right\rvert^2 \]

  • it holds for all \(1 \leq i,j \leq J\) that, almost surely, \[ \left\lvert \theta_n^{(i)} - \theta_n^{(j)} \right\rvert \leq \e^{-n \Delta Y_n} \left\lvert \theta_0^{(i)} - \theta_0^{(j)} \right\rvert \] where \(Y_n\) is a random variable satisfying \[ \lim_{n \to \infty} Y_n(\omega) = \frac{1}{2} (2 - \Delta - 2 \sigma^2). \]
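
The contraction can be checked numerically. A quick sketch (ours, not from the slides), with a common noise increment shared by all particles as in the theorem, parameters satisfying \(\sigma^2 < 1\) and \(\Delta < 2 - 2\sigma^2\), and a test objective chosen purely for illustration:

using Statistics
f(θ) = sum(abs2, θ)                      # illustrative objective (assumption)
β, σ, Δ, d, J = 10.0, 0.5, 0.1, 2, 50    # σ² < 1 and Δ < 2 - 2σ²
Θ = randn(d, J)
for n in 1:500
    fs = [f(Θ[:, j]) for j in 1:J]
    w = exp.(-β .* (fs .- minimum(fs)))
    m = Θ * w ./ sum(w)                  # ℳ_β(μᴶₙ)
    ξ = randn(d)                         # same noise for every particle
    Θ .= Θ .- (Θ .- m) .* Δ .+ sqrt(2Δ) .* σ .* (Θ .- m) .* ξ
end
@show maximum(std(Θ, dims=2))            # ≈ 0: consensus has formed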

Analysis of a general discrete-time method

Later in Ha, Jin, Kim, 2021, this is generalized to

\[ \theta^{(j)}_{n+1} = \theta^{(j)}_{n} - {\color{red} \gamma} \Bigl(\theta^{(j)}_{n} - \mathcal M_{\beta}(\mu^J_{n})\Bigr) + \sum_{\ell=1}^d \mathbf e_\ell \Bigl(\theta^{(j)}_n - \mathcal M_{\beta}(\mu^J_n)\Bigr)_\ell \, {\color{red}{\eta_{n,\ell}}}, \] with \(\expect \left[{\color{red}{\eta_{n,\ell}}}\right] = 0\) and \(\expect \left[{\color{red}{\eta_{n,\ell}}}^2\right] = {\color{red}{\zeta^2}}\).

Theorem: consensus formation for generalized scheme

If \(r := (1 - \gamma)^2 + \zeta^2 < 1\), then

  • it holds for all \(1 \leq i,j \leq J\) that \[ \expect \left\lvert \theta_n^{(i)} - \theta_n^{(j)} \right\rvert^2 = \mathcal O (r^n) \]

  • there exists \(\theta_{\infty}\) such that almost surely \[ \forall j \in \{1, \dotsc, J\}, \qquad \lim_{n \to \infty} \theta^{(j)}_n = \theta_{\infty}. \]
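
The rate \(r\) can be read off from a one-step computation; as a sketch, in the common-noise case where all particles share \(\eta_n\), independent of \(\theta_n\), the \(\mathcal M_{\beta}\) terms cancel in the difference and, conditioning on \(\theta_n\), \[ \expect \left\lvert \theta^{(i)}_{n+1} - \theta^{(j)}_{n+1} \right\rvert^2 = \sum_{\ell=1}^d \expect \bigl( 1 - \gamma + \eta_{n,\ell} \bigr)^2 \Bigl( \theta^{(i)}_{n} - \theta^{(j)}_{n} \Bigr)_\ell^2 = \bigl( (1-\gamma)^2 + \zeta^2 \bigr) \left\lvert \theta^{(i)}_{n} - \theta^{(j)}_{n} \right\rvert^2 = r \left\lvert \theta^{(i)}_{n} - \theta^{(j)}_{n} \right\rvert^2. \]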

The result is further generalized to heterogeneous noises in Ko, Ha, Jin, Kim, 2022.

Analysis of a general discrete-time method (continued)

In Ha, Jin, Kim, 2021, the authors also show that, under appropriate conditions, \[ {\rm ess\,inf}_{\omega \in \Omega} f\bigl(\theta_{\infty}(\omega)\bigr) - f(\theta_*) \leq - \frac{1}{\beta} \log \expect \e^{-\beta \bigl( f(\theta_0^{(1)}) - f(\theta_*) \bigr)} + \mathcal O(\beta^{-1}). \] This implies, for well-prepared initial data, \[ {\rm ess\,inf}_{\omega \in \Omega} f\bigl(\theta_{\infty}(\omega)\bigr) - f(\theta_*) \leq \frac{d}{2} \frac{\log \beta}{\beta} + \mathcal O(\beta^{-1}), \] where \(\theta_* = {\rm arg\,min}\, f\).

  • This is an estimate for the best-case scenario.

  • No guarantee that \(\proba \bigl[ |f(\theta_{\infty}) - f(\theta_*)| \geq \varepsilon \bigr] \to 0\) as \((\beta,J) \to (\infty,\infty)\).

  • This was proved only for homogeneous noise.

Analysis of discrete-time methods for CBO

  • Summary of known results for the discrete scheme with \(J < \infty\):

    | | Consensus | Accuracy |
    |---|---|---|
    | Same noise | ✔️ | ✔️ |
    | Independent noises | ✔️ | |
  • Other open questions?

    • Influence of discretization method on performance;

    • Choice of time step:

    | \(\Delta\) | 0.01 | 0.1 | 0.2 | 0.5 | 1 |
    |---|---|---|---|---|---|
    | Success rate | 100% | 98% | 92% | 52% | 33% |
    | Average number of iterations | 1133.16 | 111.23 | 54.46 | 20.44 | 12.67 |

Discretization of CBS

Consider the following mean field sampling method to sample from \(\e^{-f}\): \[ \d \theta_t = - \Bigl(\theta_t - \mathcal M_{\beta}(\mu_t)\Bigr) \, \d t + \sqrt{2 (1+\beta) \mathcal C_{\beta}(\mu_t)} \, \d W_t, \qquad \mu_t = {\rm Law}(\theta_t) \] Discretization by freezing the nonlocal terms over each time step: \[ \theta_{n+1} = \mathcal M_{\beta}(\mu_n) + \e^{-\Delta} \Bigl(\theta_n - \mathcal M_{\beta}(\mu_n)\Bigr) + \sqrt{(1 - \e^{-2\Delta})(1+\beta) \mathcal C_{\beta}(\mu_n)} \xi_n. \]

  • Same steady states as the continuous equation; unconditionally stable (no restriction on \(\Delta\)).

  • We can let \(\alpha = \e^{-\Delta}\) to simplify the notation.
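
A minimal Julia sketch (ours) of one step of this scheme for a d×J ensemble, with the weighted mean and covariance frozen over the step; it assumes the weighted covariance is nonsingular, so that its Cholesky factor exists:

using LinearAlgebra

# One CBS step: Θ is d×J; ℳ_β(μᴶₙ) and 𝒞_β(μᴶₙ) are frozen over the step.
function cbs_step(Θ, f, β, Δ)
    J = size(Θ, 2)
    fs = [f(Θ[:, j]) for j in 1:J]
    w = exp.(-β .* (fs .- minimum(fs)))
    w ./= sum(w)
    m = Θ * w                                   # ℳ_β(μᴶₙ)
    C = (Θ .- m) * Diagonal(w) * (Θ .- m)'      # 𝒞_β(μᴶₙ)
    α = exp(-Δ)                                 # note 1 - α² = 1 - e^(-2Δ)
    S = cholesky(Symmetric((1 - α^2) * (1 + β) * C)).L
    return m .+ α .* (Θ .- m) .+ S * randn(size(Θ)...)
end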

Analysis of the discrete-time method

  • Any steady state must satisfy

\[ \color{blue}{\mu_{\infty}} = \mathcal N \bigl( \mathcal M_{\beta}(\color{blue}{\mu_{\infty}}), (1 + \beta) \mathcal C_{\beta}(\color{blue}{\mu_{\infty}})\bigr). \]

  • For sufficiently large \(\beta\), there exists a steady state \(\mathcal N(m_{\infty}, C_{\infty})\) such that

    \[ |m_{\infty}(\beta) - m_*| = \mathcal O(\beta^{-1}), \qquad |C_{\infty}(\beta) - C_*| = \mathcal O(\beta^{-1}), \] where \(\mathcal N(m_*, C_*)\) is the Laplace approximation of \(\e^{-f}\).

  • For a Gaussian initial condition and a Gaussian target \(\mathcal N(a, A)\), exponential convergence:

\[ |m_n - a| + |C_n - A| = \mathcal O(\lambda^n), \qquad \lambda = \left( \frac{1 - \alpha}{1 + \beta} + \alpha \right) \]

  • For a Gaussian initial condition and a general target \(\e^{-f}\), exponential convergence to the Gaussian approximation: \[ |m_n - m_{\infty}(\beta)| + |C_n - C_{\infty}(\beta)| = \mathcal O(\lambda^n), \qquad \lambda = \alpha + (1-\alpha^2)\frac{k}{\beta}, \] for some constant \(k\).
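
For a Gaussian target, \(\mathcal M_{\beta}\) and \(\mathcal C_{\beta}\) are available in closed form, so the recursion for \((m_n, C_n)\) induced by the update above can be iterated exactly and the rate \(\lambda\) checked numerically. A one-dimensional sketch (ours), assuming that moment recursion:

# Target 𝒩(a, A) in one dimension, i.e. f(θ) = (θ - a)² / (2A).
a, A, β, Δ = 1.0, 2.0, 5.0, 0.5
α = exp(-Δ)
λ = (1 - α) / (1 + β) + α
m, C = 0.0, 1.0                          # Gaussian initial condition 𝒩(m, C)
for n in 1:30
    Cβ = 1 / (1/C + β/A)                 # covariance of e^{-βf}⋅𝒩(m, C), normalized
    Mβ = Cβ * (m/C + β*a/A)              # its mean
    global m = Mβ + α * (m - Mβ)
    global C = α^2 * C + (1 - α^2) * (1 + β) * Cβ
    n % 10 == 0 && println("n = $n: error = ", abs(m - a) + abs(C - A), ", λ^n = ", λ^n)
end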

Julia, a good language for scientific computing?

| | Python | Matlab | Julia | C |
|---|---|---|---|---|
| Libre? | ✔️ | | ✔️ | ✔️ |
| Fast? | | | ✔️ | ✔️ |
| Easy? | ✔️ | ✔️ | ✔️ | |
| Math-friendly? | | ✔️ | ✔️ | |
| Ecosystem? | ✔️ | | ✔️ | |

  • Julia is easy to learn, read and maintain

  • Julia is fast: “As simple as Python with the speed of C++”

  • Julia is well-suited for numerical mathematics

  • Julia is libre software

A bit of history

  • Created in 2009 by J. Bezanson, S. Karpinski, V. B. Shah, and A. Edelman

  • Excerpt from the Julia Manifesto:

We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.

Adoption of Julia is rapidly increasing

Performance benchmark

Julia is a great fit for numerical mathematics

  • Elegant syntax to define functions
f(x) = cos(x) + sin(x)
f(1)
1.3817732906760363
  • The syntax for anonymous functions is equally elegant:
import Plots
Plots.plot(x -> cos(x) + sin(x))
  • Use of unicode characters:
ω = 1  # ⚠️ Even emojis work!
f(t) = sin(ω*t)
∇f(t) = ω * cos(ω*t)
😎(t) = f(t) * f(t)
  • Compiled just in time (for loops allowed!):
@elapsed sum(1/n^2 for n ∈ 1:10^8)
0.123740379

In comparison, Python takes about 10 seconds…

  • Contains most math functions out of the box:
π, Float64(ℯ), randn(), √2
(π, 2.718281828459045, -1.3736221585723176, 1.4142135623730951)
  • Concise, MATLAB-like syntax for matrices:
M = [1 2 3; 4 5 6]
2×3 Matrix{Int64}:
 1  2  3
 4  5  6
M[:, [1, 3]]
2×2 Matrix{Int64}:
 1  3
 4  6
  • Great syntax for array broadcasting:
broadcast(gcd, 6, [9, 8, 7])
gcd.(6, [9 8 7])
1×3 Matrix{Int64}:
 3  2  1
1 .+ [1; 2; 3] .+ [1 1 1; 1 1 1; 1 1 1]
3×3 Matrix{Int64}:
 3  3  3
 4  4  4
 5  5  5

Julia’s programming paradigm: multiple dispatch

Wikipedia definition: a paradigm in which a function or method can be dynamically dispatched based on the run-time type of more than one of its arguments.

  • Example:
f(x::Int) = "Int $x"
f(x::Float64) = "Float: $x"
f(x::Any) = "Generic fallback"
f(1.2)
"Float: 1.2"
  • Makes it simple to extend existing packages:
import Base.-
function -(s1::String, s2::String)
    return s1[1:length(s1)-length(s2)]
end
"julia" - "ia"
"jul"
  • Python, in comparison, uses single dispatch.

More realistic example of multiple dispatch

abstract type Shape end
struct Rock     <: Shape end
struct Paper    <: Shape end
struct Scissors <: Shape end
play(::Type{Paper}, ::Type{Rock})     = "Paper wins"
play(::Type{Paper}, ::Type{Scissors}) = "Scissors wins"
play(::Type{Rock},  ::Type{Scissors}) = "Rock wins"
play(::Type{T},     ::Type{T}) where {T<: Shape} = "Tie, try again"
play(a::Type{<:Shape}, b::Type{<:Shape}) = play(b, a) # Commutativity
play(Paper, Rock)
"Paper wins"

Adding a shape is simple:

struct Well <: Shape end
play(::Type{Well}, ::Type{Rock})     = "Well wins"
play(::Type{Well}, ::Type{Scissors}) = "Well wins"
play(::Type{Well}, ::Type{Paper})    = "Paper wins"

Rich ecosystem: Plotting capabilities

using PlotlyJS
ackley(𝐱) = 20 + exp(1) - 20 * exp(-.2 * √(.5 * (𝐱[1]^2 + 𝐱[2]^2))) - exp(.5 * sum(cos.(2π*𝐱)))  # standard 2-d Ackley function
L, N = 5, 100
𝐱, 𝐲 = L/N * (-N:N), L/N * (-N:N)
meshgrid = [[x, y] for x ∈ 𝐱, y ∈ 𝐲]
𝐳 = ackley.(meshgrid)
plot(surface(x=𝐱, y=𝐲, z=𝐳, showscale=false))