Lorentz Center Workshop on Purpose-driven Particle Systems
MATHERIALS team, Inria Paris & Cermics, École des Ponts
\[ \def\e{{\mathrm e}} \def\d{{\mathrm d}} \def\real{{\mathbf R}} \def\expect{{\mathbf E}} \def\proba{{\mathbf P}} \]
In practice, one has to use a version of the methods that is discrete in time and uses a finite number of particles \(J\).
  | CBO | CBS |
---|---|---|
Continuous-time and \(J = \infty\) | ✔️ | ✔️ |
Continuous-time and finite \(J\) | ✔️ | ❌ |
Discrete-time and \(J = \infty\) | ❌ | ✔️ |
Discrete-time and finite \(J\) | ✔️ | ❌ |
Consider the following version with element-wise noise: \[ \d \theta^{(j)}_t = - \Bigl(\theta^{(j)}_t- \mathcal M_{\beta}(\mu^J_t)\Bigr) \, \d t + \sqrt{2}\sigma \Bigl(\theta^{(j)}_t - \mathcal M_{\beta}(\mu^J_t)\Bigr) \odot \d W^{(j)}_t, \qquad j=1, \dots,J \] where \(\mathcal M_{\beta}(\mu^J_t)\) is given by \[ \mathcal M_{\beta}(\mu^J_t) = \frac {\int \theta \e^{-\beta f(\theta)} \mu^J_t(\d \theta)} {\int \e^{-\beta f(\theta)} \mu^J_t(\d \theta)}, \qquad \mu^J_t = \frac{1}{J} \sum_{j=1}^{J} \delta_{\theta^{(j)}_t}. \]
\(\beta\) is the “inverse temperature” in the reweighting.
\(\sigma\) is the diffusion coefficient, related to exploration;
No drift parameter is needed: it can be normalized to 1 by rescaling time.
More info in Carrillo, Jin, Li, Zhu 2021
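As a concrete illustration, here is a minimal Julia sketch (function and variable names are ours, not from the references) of how the weighted mean \(\mathcal M_{\beta}(\mu^J)\) can be computed from an ensemble stored as a \(d \times J\) matrix; the weights are shifted by \(\min_j f(\theta^{(j)})\) to avoid underflow when \(\beta\) is large.

```julia
# Sketch: weighted mean M_β(μ^J) of an ensemble stored as a d × J matrix
# whose columns are the particles θ^(j).  Names are illustrative only.
function weighted_mean(f, thetas, β)
    fs = [f(θ) for θ in eachcol(thetas)]
    w = exp.(-β .* (fs .- minimum(fs)))   # shifted weights, avoids underflow
    return thetas * (w ./ sum(w))         # d-dimensional vector M_β(μ^J)
end
```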
For the mean field equation \[ \d \theta_t = - \Bigl(\theta_t- \mathcal M_{\beta}(\mu_t)\Bigr) \, \d t + \sqrt{2}\sigma \Bigl(\theta_t - \mathcal M_{\beta}(\mu_t)\Bigr) \odot \d W_t, \qquad \mu_t = {\rm Law}(\theta_t) \]
Under parameter constraints independent of \(d\), the mean \(\mathcal M(\mu_t)\) and covariance \(\mathcal C(\mu_t)\) satisfy
\[\mathcal C(\mu_t) \to 0 \qquad \text{and} \qquad \mathcal M(\mu_t) \to \theta_{\infty}\]
\[ \begin{aligned} f(\theta_{\infty}) - f(\theta_*) &\leq - \frac{1}{\beta} \log \expect \e^{-\beta \bigl( f(\theta_0) - f(\theta_*) \bigr)} + \frac{\log 2}{\beta} \\ &\leq \frac{\color{red}{d}}{2} \frac{\log(\beta/(2\pi))}{\beta} + \frac{C_L}{\beta}. \end{aligned} \]
1) Euler–Maruyama scheme: \[ \theta^{(j)}_{n+1} = \theta^{(j)}_{n} - \Bigl(\theta^{(j)}_{n} - \mathcal M_{\beta}(\mu^J_{n})\Bigr) \, \Delta + \sqrt{2 \Delta} \sigma \Bigl(\theta^{(j)}_n - \mathcal M_{\beta}(\mu^J_n)\Bigr) \odot \xi^{(j)}_n, \qquad \xi^{(j)}_n \sim \mathcal N(0, I_d); \]
2) Splitting scheme, where the drift is integrated exactly over the time step with the weighted mean frozen: \[ \begin{aligned} \widehat \theta^{(j)}_{n} &= \mathcal M_{\beta}(\mu^J_{n}) + \e^{-\Delta} \left(\theta^{(j)}_n - \mathcal M_{\beta}(\mu^J_{n}) \right) \\ \theta^{(j)}_{n+1} &= \widehat \theta^{(j)}_{n} + \sqrt{2 \Delta} \sigma \Bigl(\widehat \theta^{(j)}_n - \mathcal M_{\beta}(\mu^J_n)\Bigr) \odot \xi^{(j)}_n; \end{aligned} \]
3) Scheme based on the exact coordinate-wise solution of the dynamics with the weighted mean frozen: \[ \theta^{(j)}_{n+1} = \mathcal M_{\beta}(\mu^J_{n}) + \sum_{\ell=1}^d \mathbf e_\ell \left( \theta^{(j)}_{n} - \mathcal M_{\beta}(\mu^J_{n}) \right)_\ell \exp \left( \left( - 1 - \sigma^2 \right) \Delta + \sqrt{2 \Delta}\sigma \xi^{(j)}_{n,\ell} \right). \]
Discretizations 2) and 3) were proposed in Carrillo, Jin, Li, Zhu 2021.
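For illustration, a minimal Julia sketch of one step of discretization 1) (Euler–Maruyama), reusing the `weighted_mean` helper above; again, all names are ours.

```julia
# One Euler–Maruyama step with element-wise (anisotropic) noise.
# `thetas` is a d × J matrix of particles, modified in place.
function cbo_step!(f, thetas, β, σ, Δ)
    M = weighted_mean(f, thetas, β)    # M_β(μ^J_n), a d-vector
    ξ = randn(size(thetas))            # standard Gaussian noise, one column per particle
    @. thetas += -(thetas - M) * Δ + sqrt(2Δ) * σ * (thetas - M) * ξ
    return thetas
end
```

Iterating this map and monitoring the spread of the ensemble gives a practical stopping criterion: once the particles have (nearly) reached consensus, \(\mathcal M_{\beta}(\mu^J_n)\) can be returned as the candidate minimizer.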
(Q1) Consensus: is there a common consensus state \(\theta_{\infty} = \theta_{\infty}(\omega,\beta,\Delta)\) such that, almost surely, \[ \forall j \in \{1, \dotsc, J\}, \qquad \lim_{n \to \infty} \theta^{(j)}_n = \theta_{\infty} \, ? \]
(Q2) Accuracy: how close is \(\theta_{\infty}\) to \(\theta_*\), the global minimizer of \(f\)?
Dependence on \(J\)?
Dependence on \(\beta\)?
Dependence on \(\Delta\)?
(Q3?) Mean field: studying \(J \to \infty\) at the discrete-time level may be useful.
In Ha, Jin, Kim, 2019, the case where \(W^{(1)} = \dots = W^{(J)}\) is investigated.
Theorem: consensus formation for the Euler-Maruyama scheme
If \(0 < \sigma^2 < 1\) and \(0 < \Delta < 2 - 2\sigma^2\), then
it holds for all \(1 \leq i,j \leq J\) that \[ \expect \left\lvert \theta_n^{(i)} - \theta_n^{(j)} \right\rvert^2 \leq \e^{- n \Delta (2 - \Delta - 2 \sigma^2)} \expect \left\lvert \theta_0^{(i)} - \theta_0^{(j)} \right\rvert^2 \]
it holds for all \(1 \leq i,j \leq J\) that, almost surely, \[ \left\lvert \theta_n^{(i)} - \theta_n^{(j)} \right\rvert \leq \e^{-n \Delta Y_n} \left\lvert \theta_0^{(i)} - \theta_0^{(j)} \right\rvert \] where \(Y_n\) is a random variable satisfying \[ \lim_{n \to \infty} Y_n(\omega) = \frac{1}{2} (2 - \Delta - 2 \sigma^2). \]
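A quick numerical sanity check of this decay can be written along the following lines (a sketch in the common-noise setting \(W^{(1)} = \dots = W^{(J)}\) described above, reusing `weighted_mean`; names are ours):

```julia
using Statistics

# Mean squared pairwise distance over all ordered pairs (including i = j).
pairwise_msd(thetas) =
    mean(sum(abs2, c1 - c2) for c1 in eachcol(thetas), c2 in eachcol(thetas))

# Run the Euler–Maruyama scheme with one noise vector shared by all particles
# and record the pairwise spread, to compare with exp(-nΔ(2 - Δ - 2σ²)).
function consensus_decay(f, thetas, β, σ, Δ, nsteps)
    msd = [pairwise_msd(thetas)]
    for n in 1:nsteps
        M = weighted_mean(f, thetas, β)
        ξ = randn(size(thetas, 1))   # common noise: same ξ for every particle
        @. thetas += -(thetas - M) * Δ + sqrt(2Δ) * σ * (thetas - M) * ξ
        push!(msd, pairwise_msd(thetas))
    end
    return msd
end
```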
Later in Ha, Jin, Kim, 2021, this is generalized to
\[ \theta^{(j)}_{n+1} = \theta^{(j)}_{n} - {\color{red} \gamma} \Bigl(\theta^{(j)}_{n} - \mathcal M_{\beta}(\mu^J_{n})\Bigr) + \sum_{\ell=1}^d \mathbf e_\ell \Bigl(\theta^{(j)}_n - \mathcal M_{\beta}(\mu^J_n)\Bigr)_\ell \, {\color{red}{\eta_{n,\ell}}}, \] with \(\expect \left[{\color{red}{\eta_{n,\ell}}}\right] = 0\) and \(\expect \left[{\color{red}{\eta_{n,\ell}}}^2\right] = {\color{red}{\zeta^2}}\).
Theorem: consensus formation for generalized scheme
If \(r := (1 - \gamma)^2 + \zeta^2 < 1\), then
it holds for all \(1 \leq i,j \leq J\) that \[ \expect \left\lvert \theta_n^{(i)} - \theta_n^{(j)} \right\rvert^2 = \mathcal O (r^n) \]
there exists \(\theta_{\infty}\) such that almost surely \[ \forall j \in \{1, \dotsc, J\}, \qquad \lim_{n \to \infty} \theta^{(j)}_n = \theta_{\infty}. \]
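Note that the Euler–Maruyama scheme above corresponds to \(\gamma = \Delta\) and \(\eta_{n,\ell} = \sqrt{2\Delta}\,\sigma\,\xi_{n,\ell}\) (common noise), hence \(\zeta^2 = 2\Delta\sigma^2\) and \[ r = (1 - \Delta)^2 + 2\Delta\sigma^2 = 1 - \Delta(2 - \Delta - 2\sigma^2), \] so the condition \(r < 1\) is exactly the condition \(\Delta < 2 - 2\sigma^2\) of the previous theorem, with a consistent rate.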
The result is further generalized to heterogeneous noises in Ko, Ha, Jin, Kim, 2022
In Ha, Jin, Kim, 2021, the authors also show that, under appropriate conditions, \[ {\rm ess\,inf}_{\omega \in \Omega} f\bigl(\theta_{\infty}(\omega)\bigr) - f(\theta_*) \leq - \frac{1}{\beta} \log \expect \e^{-\beta \bigl( f(\theta_0^{(1)}) - f(\theta_*) \bigr)} + \mathcal O(\beta^{-1}) \] This implies, for well-prepared initial data, \[ {\rm ess\,inf}_{\omega \in \Omega} f\bigl(\theta_{\infty}(\omega)\bigr) - f(\theta_*) \leq \frac{d}{2} \frac{\log \beta}{\beta} + \mathcal O(\beta^{-1}) \] where \(\theta_* = {\rm arg\,min} f\).
This is an estimate for the best-case scenario.
No guarantee that \(\proba \bigl[ |f(\theta_{\infty}) - f(\theta_*)| \geq \varepsilon \bigr] \to 0\) as \((\beta,J) \to (\infty,\infty)\).
The accuracy estimate was proved only in the case where all particles are driven by the same noise.
Summary of known results for the discrete scheme with \(J < \infty\):
  | Consensus | Accuracy |
---|---|---|
Same noise | ✔️ | ✔️ |
Independent noises | ✔️ | ❌ |
Other open questions?
Influence of discretization method on performance;
Choice of time step:
\(\Delta\) | 0.01 | 0.1 | 0.2 | 0.5 | 1 |
---|---|---|---|---|---|
Success rate | 100% | 98% | 92% | 52% | 33% |
Average number of iterations | 1133.16 | 111.23 | 54.46 | 20.44 | 12.67 |
Consider the following mean field sampling method to sample from \(\e^{-f}\): \[ \d \theta_t = - \Bigl(\theta_t - \mathcal M_{\beta}(\mu_t)\Bigr) \, \d t + \sqrt{2 (1+\beta) \mathcal C_{\beta}(\mu_t)} \, \d W_t, \qquad \mu_t = {\rm Law}(\theta_t) \] Discretization by freezing the nonlocal terms over each time step: \[ \theta_{n+1} = \mathcal M_{\beta}(\mu_n) + \e^{-\Delta} \Bigl(\theta_n - \mathcal M_{\beta}(\mu_n)\Bigr) + \sqrt{(1 - \e^{-2\Delta})(1+\beta) \mathcal C_{\beta}(\mu_n)} \xi_n. \]
Same steady states as the continuous equation, unconditionally stable
We can let \(\alpha = \e^{-\Delta}\) to simplify the notation.
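A minimal Julia sketch of one step of this discretization (with the nonlocal terms frozen over the step), reusing `weighted_mean` from above and introducing an analogous `weighted_cov` helper; all names are ours, and \(J > d\) is assumed so that the weighted covariance is (generically) positive definite.

```julia
using LinearAlgebra

# Weighted covariance C_β(μ^J) of a d × J ensemble.
function weighted_cov(f, thetas, β)
    fs = [f(θ) for θ in eachcol(thetas)]
    w = exp.(-β .* (fs .- minimum(fs)))
    w ./= sum(w)
    M = thetas * w
    X = (thetas .- M) .* sqrt.(w')     # centered columns scaled by √weights
    return X * X'                      # d × d matrix C_β(μ^J)
end

# One step of the discretized sampling scheme, with α = e^{-Δ}.
function cbs_step!(f, thetas, β, Δ)
    M = weighted_mean(f, thetas, β)
    C = weighted_cov(f, thetas, β)
    α = exp(-Δ)
    L = cholesky(Symmetric((1 - α^2) * (1 + β) * C)).L   # square root of the noise covariance
    return thetas .= M .+ α .* (thetas .- M) .+ L * randn(size(thetas))
end
```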
\[ \color{blue}{\mu_{\infty}} = \mathcal N \bigl( \mathcal M_{\beta}(\color{blue}{\mu_{\infty}}), (1 + \beta) \mathcal C_{\beta}(\color{blue}{\mu_{\infty}})\bigr). \]
For sufficiently large \(\beta\), there exists a steady state \(\mathcal N(m_{\infty}, C_{\infty})\) such that
\[ |m_{\infty}(\beta) - m_*| = \mathcal O(\beta^{-1}), \qquad |C_{\infty}(\beta) - C_*| = \mathcal O(\beta^{-1}), \] where \(\mathcal N(m_*, C_*)\) is the Laplace approximation of \(\e^{-f}\).
For a Gaussian initial condition and a Gaussian target \(\mathcal N(a, A)\), convergence is exponential:
\[ |m_n - a| + |C_n - A| = \mathcal O(\lambda^n), \qquad \lambda = \left( \frac{1 - \alpha}{1 + \beta} + \alpha \right) \]
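Note that \(\lambda\) is a convex combination of \(1\) and \(1/(1+\beta)\), with weights \(\alpha = \e^{-\Delta}\) and \(1-\alpha\), so \(\lambda < 1\) for any \(\Delta > 0\) and \(\beta > 0\); for instance, \(\Delta = 1\) and \(\beta = 10\) give \(\lambda \approx 0.37 + 0.63/11 \approx 0.43\).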
  | Python | Matlab | Julia | C |
---|---|---|---|---|
Libre? | ✔️ | ❌ | ✔️ | ✔️ |
Fast? | ❔ | ❔ | ✔️ | ✔️ |
Easy? | ✔️ | ✔️ | ✔️ | ❌ |
Math-friendly? | ❔ | ✔️ | ✔️ | ❌ |
Ecosystem? | ✔️ | ❔ | ❔ | ✔️ |
Julia is easy to learn, read and maintain.
Julia is fast: “As simple as Python with the speed of C++”.
Julia is well-suited for numerical mathematics.
Julia is libre software.
Created in 2009 by J. Bezanson, S. Karpinski, V. B. Shah, and A. Edelman
Excerpt from the Julia Manifesto:
We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.
Vectorization is not needed for performance (plain `for` loops allowed!). In comparison, Python takes about 10 seconds…
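For instance (a hypothetical benchmark chosen here for illustration, not necessarily the one behind the 10-second figure), a plain loop summing \(1/k^2\) over \(10^8\) terms runs quickly once compiled:

```julia
# Hypothetical loop benchmark: partial sum of the Basel series Σ 1/k².
# Plain `for` loops compile to native code; no vectorization is needed.
function inv_square_sum(n)
    s = 0.0
    for k in 1:n
        s += 1.0 / k^2
    end
    return s
end

@time inv_square_sum(10^8)   # ≈ π²/6, typically well under a second after compilation
```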
Wikipedia definition: a paradigm in which a function or method can be dynamically dispatched based on the run-time type of more than one of its arguments.
```julia
import Base: -

# Toy example: "subtracting" s2 removes the last length(s2) characters of s1.
function -(s1::String, s2::String)
    return s1[1:length(s1)-length(s2)]
end
```

```julia
julia> "julia" - "ia"
"jul"
```
```julia
abstract type Shape end
struct Rock <: Shape end
struct Paper <: Shape end
struct Scissors <: Shape end

play(::Type{Paper}, ::Type{Rock}) = "Paper wins"
play(::Type{Paper}, ::Type{Scissors}) = "Scissors wins"
play(::Type{Rock}, ::Type{Scissors}) = "Rock wins"
play(::Type{T}, ::Type{T}) where {T<:Shape} = "Tie, try again"
play(a::Type{<:Shape}, b::Type{<:Shape}) = play(b, a) # Commutativity
```

```julia
julia> play(Paper, Rock)
"Paper wins"
```
Adding a shape is simple:
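For instance (a hypothetical extension, not from the original slides), adding a `Well` shape only requires defining its new match-ups; the tie and commutativity methods apply automatically:

```julia
struct Well <: Shape end

# Only the genuinely new match-ups need a method.
play(::Type{Well}, ::Type{Rock}) = "Well wins"
play(::Type{Well}, ::Type{Scissors}) = "Well wins"
play(::Type{Paper}, ::Type{Well}) = "Paper wins"

play(Rock, Well)   # "Well wins", dispatched through the commutativity fallback
```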
More details here