Discretization of CBO and CBS
\[
\def\e{{\mathrm e}}
\def\d{{\mathrm d}}
\def\real{{\mathbf R}}
\def\expect{{\mathbf E}}
\def\proba{{\mathbf P}}
\]
In practice, one has to use a version of the methods that is discrete in time and involves only finitely many particles:
Continuous-time and \(J = \infty\): ✔️ ✔️
Continuous-time and finite \(J\): ✔️ ❌
Discrete-time and \(J = \infty\): ❌ ✔️
Discrete-time and finite \(J\): ✔️ ❌
Discretization of CBO
Consider the following version with element-wise noise: \[
\d \theta^{(j)}_t = - \Bigl(\theta^{(j)}_t- \mathcal M_{\beta}(\mu^J_t)\Bigr) \, \d t
+ \sqrt{2}\sigma \Bigl(\theta^{(j)}_t - \mathcal M_{\beta}(\mu^J_t)\Bigr) \odot \d W^{(j)}_t,
\qquad j=1, \dots,J
\] where \(\mathcal M_{\beta}(\mu^J_t)\) is given by \[
\mathcal M_{\beta}(\mu^J_t) =
\frac
{\int \theta \e^{-\beta f(\theta)} \mu^J_t(\d \theta)}
{\int \e^{-\beta f(\theta)} \mu^J_t(\d \theta)},
\qquad \mu^J_t = \frac{1}{J} \sum_{j=1}^{J} \delta_{\theta^{(j)}_t}.
\]
Two parameters:
\(\beta\) is the “inverse temperature” in the reweighting;
\(\sigma\) is the diffusion coefficient, related to exploration.
A drift parameter is not required, since it can be absorbed by rescaling time.
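For concreteness, here is a minimal Julia sketch of how the weighted mean \(\mathcal M_{\beta}(\mu^J)\) can be computed (assumptions: the ensemble is stored as a \(d \times J\) matrix, and the helper name weighted_mean is hypothetical); it anticipates the library code shown later.

# Sketch: weighted mean M_β(μ^J) for an ensemble stored as a d × J matrix.
# Subtracting the minimum does not change the normalized weights but avoids overflow in exp.
function weighted_mean(f, ensemble, β)
    fs = f.(eachcol(ensemble))
    ws = exp.(-β .* (fs .- minimum(fs)))
    ensemble * (ws ./ sum(ws))
end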
Reminder: advantage of element-wise noise
For the mean field equation \[
\d \theta_t = - \Bigl(\theta_t- \mathcal M_{\beta}(\mu_t)\Bigr) \, \d t
+ \sqrt{2}\sigma \Bigl(\theta_t - \mathcal M_{\beta}(\mu_t)\Bigr) \odot \d W_t,
\qquad \mu_t = {\rm Law}(\theta_t)
\]
Under parameter constraints independent of \(d\):
Consensus: exponentially fast, with a rate independent of \(d\),
\[\mathcal C(\rho_t) \to 0 \qquad \text{and} \qquad \mathcal M(\rho_t) \to \theta_{\infty}\]
Accuracy: for \(C_L\) independent of \(d\),
\[
\begin{aligned}
f(\theta_{\infty}) - f(\theta_*)
&\leq - \frac{1}{\beta} \log \expect \e^{-\beta \bigl( f(\theta_0) - f(\theta_*) \bigr)} + \frac{\log 2}{\beta} \\
&\leq \frac{\color{red}{d}}{2} \frac{\log(\beta/(2\pi))}{\beta} + \frac{C_L}{\beta}.
\end{aligned}
\]
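The second inequality can be understood via Laplace’s method; a heuristic sketch, assuming \(f\) is smooth with a nondegenerate global minimum at \(\theta_*\) and that \(\theta_0\) has a density \(\rho_0\) with \(\rho_0(\theta_*) > 0\): \[
\expect \e^{-\beta \bigl( f(\theta_0) - f(\theta_*) \bigr)}
= \int \e^{-\beta \bigl( f(\theta) - f(\theta_*) \bigr)} \rho_0(\theta) \, \d \theta
\approx \Bigl( \frac{2\pi}{\beta} \Bigr)^{d/2} \frac{\rho_0(\theta_*)}{\sqrt{\det \nabla^2 f(\theta_*)}},
\] so that \[
- \frac{1}{\beta} \log \expect \e^{-\beta \bigl( f(\theta_0) - f(\theta_*) \bigr)}
\approx \frac{d}{2} \frac{\log(\beta/(2\pi))}{\beta}
+ \frac{1}{\beta} \Bigl( \tfrac12 \log \det \nabla^2 f(\theta_*) - \log \rho_0(\theta_*) \Bigr),
\] and the \(\beta\)-independent terms, together with \(\log 2\), are absorbed into \(C_L\).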
Discretization methods
Euler-Maruyama:
\[
\theta^{(j)}_{n+1} = \theta^{(j)}_{n} - \Bigl(\theta^{(j)}_{n} - \mathcal M_{\beta}(\mu^J_{n})\Bigr) \, \Delta
+ \sqrt{2 \Delta} \sigma \Bigl(\theta^{(j)}_n - \mathcal M_{\beta}(\mu^J_n)\Bigr) \odot \xi^{(j)}_n,
\] where \(\xi^{(j)}_n \sim \mathcal N(0, I_d)\) are i.i.d. standard Gaussian vectors.
Splitting to avoid overshooting:
\[
\begin{aligned}
\widehat \theta^{(j)}_{n} &= \mathcal M_{\beta}(\mu^J_{n}) + \e^{-\Delta} \left(\theta^{(j)}_n - \mathcal M_{\beta}(\mu^J_{n}) \right) \\
\theta^{(j)}_{n+1} &= \widehat \theta^{(j)}_{n} + \sqrt{2 \Delta} \sigma \Bigl(\widehat \theta^{(j)}_n - \mathcal M_{\beta}(\mu^J_n)\Bigr) \odot \xi^{(j)}_n
\end{aligned}
\]
Piecewise exact integration with frozen nonlocal term:
\[
\theta^{(j)}_{n+1} = \mathcal M_{\beta}(\mu^J_{n}) + \sum_{\ell=1}^d \mathbf e_\ell \left( \theta^{(j)}_{n} - \mathcal M_{\beta}(\mu^J_{n}) \right)_\ell \exp \left( \left( - 1 - \sigma^2 \right) \Delta + \sqrt{2 \Delta}\sigma \xi^{(j)}_{n,\ell} \right)
\]
The second and third discretizations (splitting and piecewise exact integration) were proposed in Carrillo, Jin, Li, Zhu, 2021.
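For illustration, here is a Julia sketch of one step of the splitting scheme and of the piecewise exact scheme above (a sketch, not the reference implementation of the cited paper); the ensemble is assumed to be a \(d \times J\) matrix, and weighted_mean is the hypothetical helper sketched earlier.

# Sketch: one step of the splitting scheme.
function step_splitting!(f, ensemble, β, σ, Δ)
    d, J = size(ensemble)
    m = weighted_mean(f, ensemble, β)
    ensemble .= m .+ exp(-Δ) .* (ensemble .- m)                   # exact drift substep
    ensemble .+= sqrt(2Δ) * σ .* (ensemble .- m) .* randn(d, J)   # noise substep
end

# Sketch: one step of the piecewise exact scheme (frozen M_β, componentwise exact solution).
function step_exact!(f, ensemble, β, σ, Δ)
    d, J = size(ensemble)
    m = weighted_mean(f, ensemble, β)
    ensemble .= m .+ (ensemble .- m) .* exp.((-1 - σ^2) * Δ .+ sqrt(2Δ) * σ .* randn(d, J))
end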
Analysis of discrete-time methods
(Q1) Consensus: Is there a common consensus state \(\theta_{\infty} = \theta_{\infty}(\omega,\beta,\Delta)\) such that, almost surely, \[
\forall j \in \{1, \dotsc, J\}, \qquad
\lim_{n \to \infty} \theta^{(j)}_n = \theta_{\infty}
\]
(Q2) Accuracy: How close is \(\theta_{\infty}\) to \(\theta_*\), the global minimizer of \(f\)?
(Q3?) Mean field: studying \(J \to \infty\) at the discrete-time level may be useful.
Analysis for Euler-Maruyama
In Ha, Jin, Kim, 2019, the case where \(W^{(1)} = \dots = W^{(J)}\) (common noise) is investigated.
Theorem: consensus formation for the Euler-Maruyama scheme
If \(0 < \sigma^2 < 1\) and \(0 < \Delta < 2 - 2\sigma^2\), then
it holds for all \(1 \leq i,j \leq J\) that \[
\expect \left\lvert \theta_n^{(i)} - \theta_n^{(j)} \right\rvert^2
\leq \e^{- n \Delta (2 - \Delta - 2 \sigma^2)} \expect \left\lvert \theta_0^{(i)} - \theta_0^{(j)} \right\rvert^2
\]
it holds for all \(1 \leq i,j \leq J\) that, almost surely, \[
\left\lvert \theta_n^{(i)} - \theta_n^{(j)} \right\rvert \leq \e^{-n \Delta Y_n} \left\lvert \theta_0^{(i)} - \theta_0^{(j)} \right\rvert
\] where \(Y_n\) is a random variable satisfying \[
\lim_{n \to \infty} Y_n(\omega) = \frac{1}{2} (2 - \Delta - 2 \sigma^2).
\]
Analysis of a general discrete-time method
Later, in Ha, Jin, Kim, 2021, this is generalized to
\[
\theta^{(j)}_{n+1} = \theta^{(j)}_{n} - {\color{red} \gamma} \Bigl(\theta^{(j)}_{n} - \mathcal M_{\beta}(\mu^J_{n})\Bigr)
+ \sum_{\ell=1}^d \mathbf e_\ell \Bigl(\theta^{(j)}_n - \mathcal M_{\beta}(\mu^J_n)\Bigr)_\ell \, {\color{red}{\eta_{n,\ell}}},
\] with \(\expect \left[{\color{red}{\eta_{n,\ell}}}\right] = 0\) and \(\expect \left[{\color{red}{\eta_{n,\ell}}}^2\right] = {\color{red}{\zeta^2}}\).
Theorem: consensus formation for generalized scheme
If \(r := (1 - \gamma)^2 + \zeta^2 < 1\), then
it holds for all \(1 \leq i,j \leq J\) that \[
\expect \left\lvert \theta_n^{(i)} - \theta_n^{(j)} \right\rvert^2
= \mathcal O (r^n)
\]
there exists \(\theta_{\infty}\) such that almost surely \[
\forall j \in \{1, \dotsc, J\}, \qquad
\lim_{n \to \infty} \theta^{(j)}_n = \theta_{\infty}.
\]
The result is further generalized to heterogeneous noises in Ko, Ha, Jin, Kim, 2022.
Analysis of a general discrete-time method
In Ha, Jin, Kim, 2021, the authors also show that, under appropriate conditions, \[
{\rm ess\,inf}_{\omega \in \Omega} f\bigl(\theta_{\infty}(\omega)\bigr) - f(\theta_*)
\leq - \frac{1}{\beta} \log \expect \e^{-\beta \bigl( f(\theta_0^{(1)}) - f(\theta_*) \bigr)} + \mathcal O(\beta^{-1})
\] This implies, for well-prepared initial data, \[
{\rm ess\,inf}_{\omega \in \Omega} f\bigl(\theta_{\infty}(\omega)\bigr) - f(\theta_*) \leq \frac{d}{2} \frac{\log \beta}{\beta} + \mathcal O(\beta^{-1})
\] where \(\theta_* = {\rm arg\,min}\, f\).
This is an estimate for the best-case scenario.
No guarantee that \(\proba \bigl[ |f(\theta_{\infty}) - f(\theta_*)| \geq \varepsilon \bigr] \to 0\) as \((\beta,J) \to (\infty,\infty)\).
This was proved only for homogeneous noise.
Analysis of discrete-time methods for CBO
Discretization of CBS
Consider the following mean field sampling method to sample from \(\e^{-f}\): \[
\d \theta_t = - \Bigl(\theta_t - \mathcal M_{\beta}(\mu_t)\Bigr) \, \d t
+ \sqrt{2 (1+\beta) \mathcal C_{\beta}(\mu_t)} \, \d W_t,
\qquad \mu_t = {\rm Law}(\theta_t)
\] Discretization by freezing the nonlocal terms over each time step: \[
\theta_{n+1} = \mathcal M_{\beta}(\mu_n) + \e^{-\Delta} \Bigl(\theta_n - \mathcal M_{\beta}(\mu_n)\Bigr)
+ \sqrt{(1 - \e^{-2\Delta})(1+\beta) \mathcal C_{\beta}(\mu_n)} \xi_n.
\] where \(\xi_n \sim \mathcal N(0, I_d)\).
Same steady states as the continuous equation, and unconditionally stable.
In what follows, we write \(\alpha = \e^{-\Delta}\) to simplify the notation.
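A minimal Julia sketch of this CBS step (assumptions: the ensemble is a \(d \times J\) matrix, each particle receives an independent Gaussian, and the matrix square root acts on the symmetric weighted covariance; the names are hypothetical, not from an existing library):

using LinearAlgebra

# Sketch: one CBS step with the nonlocal terms frozen over the time step.
function step_cbs!(f, ensemble, β, Δ)
    d, J = size(ensemble)
    fs = f.(eachcol(ensemble))
    ws = exp.(-β .* (fs .- minimum(fs)))
    ws ./= sum(ws)
    m = ensemble * ws                 # weighted mean M_β(μ_n)
    diff = ensemble .- m
    C = (diff .* ws') * diff'         # weighted covariance C_β(μ_n)
    α = exp(-Δ)
    noise = sqrt((1 - α^2) * (1 + β)) * sqrt(Symmetric(C)) * randn(d, J)
    ensemble .= m .+ α .* diff .+ noise
end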
Analysis of the discrete-time method
Any steady state must satisfy
\[
\color{blue}{\mu_{\infty}} = \mathcal N \bigl( \mathcal M_{\beta}(\color{blue}{\mu_{\infty}}), (1 + \beta) \mathcal C_{\beta}(\color{blue}{\mu_{\infty}})\bigr).
\]
For sufficiently large \(\beta\), there exists a steady state \(\mathcal N(m_{\infty}, C_{\infty})\) such that
\[
|m_{\infty}(\beta) - m_*| = \mathcal O(\beta^{-1}), \qquad |C_{\infty}(\beta) - C_*| = \mathcal O(\beta^{-1}),
\] where \(\mathcal N(m_*, C_*)\) is the Laplace approximation of \(\e^{-f}\).
For a Gaussian initial condition and a Gaussian target \(\mathcal N(a, A)\), exponential convergence:
\[
|m_n - a| + |C_n - A|
= \mathcal O(\lambda^n), \qquad \lambda = \left( \frac{1 - \alpha}{1 + \beta} + \alpha \right)
\]
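The limit \((a, A)\) can be checked directly from the fixed-point condition above; a sketch, for the Gaussian target \(f(\theta) = \frac12 (\theta - a)^\top A^{-1} (\theta - a)\) and \(\mu = \mathcal N(m, C)\): the reweighted measure \(\e^{-\beta f} \mu\) is again Gaussian, with \[
\mathcal C_{\beta}(\mu) = \bigl( C^{-1} + \beta A^{-1} \bigr)^{-1},
\qquad
\mathcal M_{\beta}(\mu) = \mathcal C_{\beta}(\mu) \bigl( C^{-1} m + \beta A^{-1} a \bigr),
\] so the covariance condition \(C = (1+\beta) \, \mathcal C_{\beta}(\mu)\) forces \(C = A\), and then \(\mathcal M_{\beta}(\mu) = \frac{m + \beta a}{1 + \beta} = m\) forces \(m = a\): the Gaussian target \(\mathcal N(a, A)\) is a steady state for every \(\beta\).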
For a Gaussian initial condition and a general target \(\e^{-f}\), exponential convergence to the Gaussian approximation: \[
|m_n - m_{\infty}(\beta)| + |C_n - C_{\infty}(\beta)|
= \mathcal O(\lambda^n), \qquad
\lambda = \alpha + (1-\alpha^2)\frac{k}{\beta}.
\]
Towards a library for consensus-based methods
Provide a unified interface to the different methods;
In the documentation, present algorithmic descriptions of the variants;
Incorporate benchmarks to facilitate comparisons;
Enhance the reproducibility of research.
Software versioning and sharing code
Use git for software versioning.
Use github (or gitlab, …) to collaborate.
Add a licence (MIT, GPL, …) to make the code legally libre.
Several codes are already on github.
The Journal of Open Source Software seems a good venue for this type of work.
Ingredients of a CBO code
Model code: the objective function to minimize
Solver code: where CBO enters
(optional) Visualization code: good to monitor, understand and illustrate
Main code: calls the other parts
Example model code: the Ackley function
module Ackley

dim, shift = 2, 1

function f(𝐱)
    𝐳 = 𝐱 .- shift
    20 + exp(1) - 20*exp(-0.2*sqrt(1/dim * 𝐳'𝐳)) - exp(1/dim * sum(cos.(2π*𝐳)))
end

end
Solver code: a possible skeleton
module Cbo

struct Config
    β::Float64  # Inverse temperature
    σ::Float64  # Diffusion coefficient
    Δ::Float64  # Time step
end

function step!(f, ensemble, config::Config)
    # Main solver code comes here
end

end
We want the model and solver code to be decoupled.
Modules in julia help organize code into separate units.
In julia, the ! suffix signals that a function modifies its arguments.
Implementing the step! function
function step!(f, ensemble, config::Config)
    # Define a few variables for convenience
    d, J = size(ensemble)
    β, σ, Δ = config.β, config.σ, config.Δ
Implementing the step! function
function step!(f, ensemble, config::Config)
    # Define a few variables for convenience
    d, J = size(ensemble)
    β, σ, Δ = config.β, config.σ, config.Δ
    # Calculate the weighted mean (shift by the minimum to avoid overflow in exp)
    fensemble = f.(eachcol(ensemble))
    fensemble = fensemble .- minimum(fensemble)
    weights = exp.(-β * fensemble)
    weights = weights / sum(weights)
    mean = ensemble * weights
Implementing the step! function
function step!(f, ensemble, config::Config)
    # Define a few variables for convenience
    d, J = size(ensemble)
    β, σ, Δ = config.β, config.σ, config.Δ
    # Calculate the weighted mean (shift by the minimum to avoid overflow in exp)
    fensemble = f.(eachcol(ensemble))
    fensemble = fensemble .- minimum(fensemble)
    weights = exp.(-β * fensemble)
    weights = weights / sum(weights)
    mean = ensemble * weights
    # Update the positions with Euler-Maruyama
    diff_mean = ensemble .- mean
    ensemble .+= -Δ * diff_mean + σ*sqrt(2Δ) * diff_mean .* randn(d, J)
end
It is not crucial to optimize this code if the bottleneck is the evaluation of f!
Main code: putting things together
# Dimension and number of particles
d, J = 2, 100
# Make the code a bit generic
Model, Solver = Ackley, Cbo
# Initial ensemble
ensemble = 3 * randn(d, J)
# Solver configuration
β, σ, Δ = 100, 0.3, 0.1
config = Solver.Config(β, σ, Δ)
# Perform 40 iterations
for i ∈ 1:40
    Solver.step!(Model.f, ensemble, config)
end
In practice, use a sensible stopping criterion, and take the mean of the final ensemble as a point estimator.
Adding some visualization code
using Plots
anim = @animate for i ∈ 1:40
    Cbo.step!(Ackley.f, ensemble, config)
    scatter(ensemble[1, :], ensemble[2, :], label=:none)
end
Plots.gif(anim, "cbo.gif", fps=4)
With a bit more work…
function plot_ackley(ensemble)
    L = 3; grid = LinRange(-L, L, 100)
    heatmap(grid, grid, (x, y) -> Ackley.f([x, y]), size=(900, 900), colormap=:pastel, cbar=:none)
    scatter!([0], [0], color="black", markershape=:xcross,
             markersize=10, label=:none, xlims=(-3, 3), ylims=(-3, 3))
    scatter!(ensemble[1, :], ensemble[2, :], label=:none)
end

anim = @animate for i ∈ 1:40
    Cbo.step!(Ackley.f, ensemble, config)
    plot_ackley(ensemble)
end
Plots.gif(anim, "ackley.gif", fps=4)
Improvement 1: evaluate f in parallel
In many applications, the computational bottleneck is the evaluation of \(f\).
# Calculate weighted mean
fensemble = f.(eachcol(ensemble))
For computationally expensive applications, make this loop parallel:
using Base.Threads
# Start Julia with several threads (e.g. julia --threads=4) for this to have an effect
fensemble = zeros(J)
@threads for j ∈ 1:J
    fensemble[j] = f(ensemble[:, j])
end
Improvement 2: organize in separate files
Files for models (Ackley, Rastrigin, …)
Files for solvers (CBO, PSO, …)
Test files
├── model_ackley.jl
├── model_rastrigin.jl
├── run_test_1.jl
├── run_test_2.jl
├── solver_cbo.jl
└── solver_cbs.jl
In test files, choose the model and the solver:
include("model_ackley.jl")
include("solver_cbo.jl")
Improvement 3: move iterations to the solver code
using LinearAlgebra, Statistics, DelimitedFiles

function minimize(f, ensemble, config::Config; tol=1e-8, maxiter=1000, savedir=:none)
    ensemble = copy(ensemble)
    for i ∈ 1:maxiter
        if norm(cov(ensemble, dims=2)) <= tol
            @goto converged
        end
        step!(f, ensemble, config)
        if savedir != :none
            writedlm("$savedir/$i.txt", ensemble)
        end
    end
    error("Did not converge!")
    @label converged
    argmin = mean(ensemble, dims=2)[:]
    return argmin, f(argmin)
end
In run_test_1.jl, we call this function:
x, m = Cbo.minimize(Ackley.f, ensemble, config; savedir="test_1")
Improvement 4: add steepest descent at the end
In some cases, this can be achieved easily using automatic differentiation
using Zygote

function minimize(f, ensemble, config::Config; tol=1e-8, maxiter=1000, savedir=:none)
    # ... (previous code omitted)
    γ = 0.001
    while norm(f'(argmin)) >= tol
        argmin -= γ * f'(argmin)
    end
    return argmin, f(argmin)
end
Works with almost any Julia function
function my_sqrt(x)
    t = (1 + x) / 2
    while abs(t*t - x) > 1e-10
        t = (t + x/t) / 2
    end
    return t
end

my_sqrt'(1.0)
Improvement 5: decoupling the visualization
Often, the solver and visualization codes run on different computers;
Often, we want to plot results before the end of the run.
(Figure: example set-up.)
Improvement 5: decoupling the visualization
Decoupling can be achieved by saving to files:
In the test code, instruct the solver to save its results to files;
Read the results from these files in the visualization code.
using Plots, DelimitedFiles
savedir = "test_1"
anim = @animate for i in 1:typemax(Int)   # iterate until no more saved files ("1:∞")
    filename = "$savedir/$i.txt"
    if !isfile(filename)
        break
    end
    ensemble = readdlm(filename)
    scatter(ensemble[1, :], ensemble[2, :], label=:none,
            xlims=(-3, 3), ylims=(-3, 3))
end
gif(anim, "ackley_cbo.gif", fps=4)