particle MCMC: Nonlinear Function
Created: April 06, 2020
Modified: April 06, 2020


This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
  • Basic notes from https://www.stats.ox.ac.uk/~doucet/andrieu_doucet_holenstein_PMCMC.pdf
  • Setup: we have parameters $\theta$ and a time-series model $p_\theta(x_0)\, p_\theta(y_0 \mid x_0) \prod_{t=1}^T p_\theta(x_t \mid x_{t-1})\, p_\theta(y_t \mid x_t)$. We want to sample from the posterior $p(\theta \mid y)$.
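As a concrete instance of this setup, here is a minimal sketch using the classic nonlinear benchmark model often used to illustrate particle filtering (presumably the "nonlinear function" in the title). The parameterization (here $\theta$ is the observation noise scale) and the function names are my own illustration:

```python
import numpy as np

# Toy instance (my own illustration): the classic nonlinear benchmark model,
# with theta as the observation noise standard deviation.
def simulate(theta, T, rng):
    x = rng.normal()
    xs, ys = [], []
    for t in range(1, T + 1):
        x = 0.5 * x + 25 * x / (1 + x**2) + 8 * np.cos(1.2 * t) \
            + rng.normal(0, np.sqrt(10))
        xs.append(x)
        ys.append(x**2 / 20 + rng.normal(0, theta))
    return np.array(xs), np.array(ys)

def pf_loglik(theta, ys, N, rng):
    """Bootstrap particle filter; returns log p_hat_theta(y_{1:T})."""
    x = rng.normal(size=N)
    ll = 0.0
    for t, y in enumerate(ys, start=1):
        # propagate every particle from the transition prior (bootstrap proposal)
        x = 0.5 * x + 25 * x / (1 + x**2) + 8 * np.cos(1.2 * t) \
            + rng.normal(0, np.sqrt(10), size=N)
        # weight by the observation density p_theta(y_t | x_t)
        logw = -0.5 * np.log(2 * np.pi * theta**2) \
               - (y - x**2 / 20)**2 / (2 * theta**2)
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean())       # log p_hat(y_t | y_{1:t-1})
        x = x[rng.choice(N, size=N, p=w / w.sum())]  # multinomial resampling
    return ll
```

The estimate $\log \hat{p}_\theta(y)$ returned by `pf_loglik` is what the PMCMC algorithms plug into their MH ratios.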
  • The first thing we can do is 'Particle Independent MH' (PIMH). This samples from $p(x \mid y)$; it does not infer $\theta$. The algorithm is:
    • propose a trajectory from the particle filter, which (approximately) targets $p_\theta(x \mid y)$
    • then do an MH accept/reject step to ensure we're targeting the right distribution (the particle filter does target the posterior, but in practice its samples may be biased, e.g., if it has a bad proposal). For this MH step we use the 'likelihood' estimate from the particle filter.
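A sketch of the PIMH loop built from those two steps. `run_pf(rng)` is an assumed interface (my own, not from the paper) that performs one SMC sweep and returns a sampled trajectory together with the log of its likelihood estimate:

```python
import numpy as np

# Sketch of Particle Independent MH. run_pf(rng) is assumed to perform one SMC
# sweep and return (sampled trajectory, log of the likelihood estimate).
def pimh(run_pf, n_iters, rng):
    x, ll = run_pf(rng)
    samples = []
    for _ in range(n_iters):
        x_prop, ll_prop = run_pf(rng)        # independent proposal from the PF
        # accept/reject using the *estimated* likelihoods; the exact-approximate
        # argument says the chain still targets p(x | y)
        if np.log(rng.uniform()) < ll_prop - ll:
            x, ll = x_prop, ll_prop
        samples.append(x)
    return samples
```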
  • I think Particle Independent MH exists mostly as a simple case to structure the proofs---the first step is to show that PIMH works even though the likelihood is only an estimate. It doesn't solve the parameter-estimation problem, so we don't care.
  • The next algorithm is Particle Marginal MH (PMMH). Here we just do ordinary MH on $\theta$, with whatever method we want, but instead of the true marginal likelihood $p_\theta(y)$ we use the estimate $\hat{p}_\theta(y)$ from the particle filter. This seems pretty great. If we could even partially differentiate the PF likelihood, we could presumably use HMC (again, we're allowed whatever MH proposal we want; we don't need the gradients to be unbiased).
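A sketch of PMMH with a Gaussian random-walk proposal on $\theta$ (the proposal choice, the `pf_loglik(theta, rng)` interface, and the prior are my own illustration):

```python
import numpy as np

# Sketch of PMMH: random-walk MH on theta, with the particle filter's
# log-likelihood estimate standing in for log p_theta(y). pf_loglik(theta, rng)
# and log_prior(theta) are assumed interfaces, not from the paper.
def pmmh(pf_loglik, log_prior, theta0, step, n_iters, rng):
    theta, ll = theta0, pf_loglik(theta0, rng)
    chain = []
    for _ in range(n_iters):
        theta_prop = theta + step * rng.normal()
        ll_prop = pf_loglik(theta_prop, rng)
        log_alpha = (ll_prop + log_prior(theta_prop)) - (ll + log_prior(theta))
        if np.log(rng.uniform()) < log_alpha:
            theta, ll = theta_prop, ll_prop   # keep the stored estimate
        chain.append(theta)
    return np.array(chain)
```

Note that the stored estimate for the current $\theta$ is reused across iterations rather than recomputed; refreshing it each step would break the exact-approximate argument.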
  • Finally we have particle Gibbs. The idea here is that sometimes it's easier to propose $q(\theta' \mid x, y, \theta)$, conditioning on the imputed latent state, than it would be to propose $q(\theta' \mid \theta, y)$ marginally, as we have to in PMMH. So we'd like to do Gibbs sampling: we iterate between imputing $\theta$ and $x$.
    • We can't just run SMC as the Gibbs move for $p(x \mid y, \theta)$ because, as discussed above, although it targets the posterior it doesn't actually sample from the posterior for any finite number of particles. For example, imagine we have a single particle, $N_x = 1$, and our proposal at the first step is the prior. Then the location of that particle at the first step is distributed according to the prior, not the posterior: once we condition on all the data, we might downweight the particle, but we're still stuck with it; nothing will change the prior value we originally sampled.
    • It seems like we could run Metropolis-within-Gibbs, following the PIMH outline above. But I guess we'd rather not have to worry about rejection.
    • Instead, we can run Gibbs sampling on an augmented space. We add an auxiliary variable $\tau$ that picks out the index of a specific particle. Then we construct Gibbs moves for $p(x^{(\tau)}_{1:T} \mid x^{\neg\tau}_{1:T}, y, \theta)$ and $p(x^{\neg\tau}_{1:T} \mid x^{(\tau)}_{1:T}, y, \theta)$, along with the usual $q(\theta' \mid x, y, \theta)$. The first move just selects one of the $N_x$ trajectories we sampled, with probability proportional to its final normalized weight. The second move is 'conditional SMC', in which the trajectory of the $\tau$'th particle is held fixed at its current value, and every other particle is updated in the usual way.
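A sketch of one conditional-SMC sweep, with the reference trajectory pinned at particle index 0. The `init`/`propagate`/`logg` interface and plain multinomial ancestor resampling are my own simplifications:

```python
import numpy as np

# Sketch of one conditional-SMC sweep. The reference trajectory x_ref (shape
# (T,)) is pinned at particle index 0. init(N, rng), propagate(x_prev, t, rng),
# and logg(y, x) are assumed model interfaces, not from the paper.
def csmc(x_ref, ys, N, init, propagate, logg, rng):
    T = len(ys)
    traj = np.empty((N, T))
    x = init(N, rng)
    x[0] = x_ref[0]                  # pin the reference at its first state
    for t in range(T):
        if t > 0:
            # resample ancestors for particles 1..N-1; particle 0 keeps its path
            anc = rng.choice(N, size=N - 1, p=w)
            traj[1:, :t] = traj[anc, :t]
            x = propagate(traj[:, t - 1], t, rng)
            x[0] = x_ref[t]          # overwrite with the reference state
        logw = logg(ys[t], x)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        traj[:, t] = x
    return traj, w                   # all trajectories + final normalized weights
```

A particle-Gibbs iteration would then alternate: run `csmc`, draw a new index $\tau$ with probability proportional to `w`, take `traj[tau]` as the new reference trajectory, and update $\theta$ given that trajectory.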
  • Naively, a PMCMC algorithm wastes a lot of samples of $x$: it runs $N_x$ particles for each MCMC sample. It turns out it's valid to use all the particles to compute expectations with respect to $x$.
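For example, a Rao-Blackwellized posterior-mean estimate that uses every weighted particle from every sweep (the `sweeps` structure and shapes are my own illustration):

```python
import numpy as np

# Sketch: average over *all* weighted particles from each sweep instead of
# keeping one trajectory per MCMC iteration. sweeps is a list of (particles, w)
# pairs: particles has shape (N, T), w the final normalized weights (shape (N,)).
def rao_blackwell_mean(sweeps):
    # per-sweep weighted mean trajectory, then average across MCMC iterations
    return np.mean([w @ parts for parts, w in sweeps], axis=0)
```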
  • Tentative conclusion: for SIR purposes, PMMH seems like the easiest to implement and probably the most sensible.