mirror descent implementations
Created: September 07, 2020
Modified: September 07, 2020

mirror descent implementations

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
  • What pieces of mirror descent can we automate?
  • Given a mirror function $\Phi$, we can compute the mirror map $\nabla \Phi$ and the Bregman divergence $D_\Phi(x, y)$.
  • Can we compute the convex conjugate $\Phi^*(y) = \sup_x \langle y, x \rangle - \Phi(x)$? Of course you could optimize, but I don't think we could get an exact version.
  • Can we do Bregman projection $\text{argmin}_{x\in X} D_\Phi(x, y)$? Again, of course we can optimize, but we're not going to get this in closed form for arbitrary $X$ and $\Phi$.
  • How specifically does exponential-family mirror descent work?
    • Hypothetically we start with a mean param $\mu_t$ and dual $\lambda_t$. We compute $g_t = \nabla_\mu L(\lambda(\mu))$ and then apply the dual update $\lambda_{t+1} = \lambda_t + \alpha g_t$. We then convert this back to
      $$\begin{aligned}\mu_{t+1} &= \text{argmin}_{\mu\in \mathcal{M}} D_{A^*}(\mu, \nabla A(\lambda_{t+1}))\\ &= \text{argmin}_{\mu\in \mathcal{M}} A^*(\mu) - \langle \mu, \nabla A^*(\nabla A(\lambda_{t+1}))\rangle\\ &= \text{argmin}_{\mu\in \mathcal{M}} A^*(\mu) - \langle \mu, \lambda_{t+1} \rangle,\end{aligned}$$
      where the second line drops the terms of the Bregman divergence that are constant in $\mu$. This minimizes the Bregman divergence to the desired natural parameter $\lambda_{t+1}$ within the space of realizable marginals.
    • It seems like this assumes that our natural param $\lambda_{t+1}$ was 'valid'? But in general we can get a natural param that doesn't normalize. Is that okay? It seems like the update above would still be well defined.
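The first automatable piece above, getting $D_\Phi$ from $\Phi$ alone, can be sketched with a numerical gradient standing in for an autodiff mirror map. The negative-entropy example (where $D_\Phi$ becomes KL divergence on the simplex) is my own illustration:

```python
import numpy as np

def numerical_grad(phi, x, eps=1e-6):
    # Central-difference approximation of the mirror map grad(Phi);
    # an autodiff framework would give this exactly.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (phi(x + e) - phi(x - e)) / (2 * eps)
    return g

def bregman(phi, x, y):
    # D_Phi(x, y) = Phi(x) - Phi(y) - <grad(Phi)(y), x - y>
    return phi(x) - phi(y) - numerical_grad(phi, y) @ (x - y)

# Negative entropy Phi(x) = sum_i x_i log x_i makes D_Phi the KL
# divergence for points on the probability simplex.
neg_entropy = lambda x: float(np.sum(x * np.log(x)))
x = np.array([0.2, 0.8])
y = np.array([0.5, 0.5])
kl = float(np.sum(x * np.log(x / y)))
assert abs(bregman(neg_entropy, x, y) - kl) < 1e-5
```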
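The "of course you could optimize" route for the conjugate can be made concrete with plain gradient ascent on $\langle y, x \rangle - \Phi(x)$. This is only a numerical stand-in, not an exact conjugate; the quadratic sanity check (where $\Phi$ is self-conjugate) is my choice:

```python
import numpy as np

def conjugate_approx(phi, grad_phi, y, x0, lr=0.1, steps=500):
    # Approximate Phi*(y) = sup_x <y, x> - Phi(x) by gradient ascent.
    x = x0.astype(float).copy()
    for _ in range(steps):
        x += lr * (y - grad_phi(x))  # gradient of <y, x> - Phi(x)
    return y @ x - phi(x)

# Sanity check: Phi(x) = 0.5 ||x||^2 is self-conjugate, so
# Phi*(y) = 0.5 ||y||^2 and the supremum is attained at x = y.
phi = lambda x: 0.5 * float(x @ x)
grad_phi = lambda x: x
y = np.array([1.0, -2.0])
val = conjugate_approx(phi, grad_phi, y, np.zeros(2))
assert abs(val - 0.5 * float(y @ y)) < 1e-6
```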
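While the Bregman projection has no closed form for arbitrary $X$ and $\Phi$, special cases do: with negative entropy and $X$ the probability simplex, projecting a positive vector is just renormalization. A spot-check of that claim (the specific numbers are arbitrary):

```python
import numpy as np

def d_phi(x, y):
    # Bregman divergence of Phi(x) = sum_i x_i log x_i:
    # D(x, y) = sum_i x_i log(x_i / y_i) - sum_i x_i + sum_i y_i
    return float(np.sum(x * np.log(x / y)) - x.sum() + y.sum())

y = np.array([0.3, 0.6, 0.3])   # positive but not normalized
proj = y / y.sum()              # claimed closed-form projection onto the simplex

# Verify optimality against random points on the simplex.
rng = np.random.default_rng(0)
for _ in range(100):
    cand = rng.dirichlet(np.ones(3))
    assert d_phi(proj, y) <= d_phi(cand, y) + 1e-9
```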
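The exponential-family loop above can be sketched for a categorical model, where $A(\lambda) = \log \sum_i e^{\lambda_i}$ so that $\mu_t = \nabla A(\lambda_t) = \mathrm{softmax}(\lambda_t)$ and the conversion back to a realizable mean parameter is handled by the link itself. The quadratic loss and the descent sign ($\lambda_{t+1} = \lambda_t - \alpha g_t$, since we minimize here) are my assumptions:

```python
import numpy as np

def softmax(lam):
    e = np.exp(lam - lam.max())
    return e / e.sum()

# Hypothetical objective: L(mu) = 0.5 ||mu - target||^2 over the simplex,
# minimized at mu = target.
target = np.array([0.7, 0.2, 0.1])
lam = np.zeros(3)                  # natural parameter lambda_0
alpha = 0.5
for _ in range(2000):
    mu = softmax(lam)              # mean param mu_t = grad A(lambda_t)
    g = mu - target                # g_t = grad_mu L(mu_t)
    lam = lam - alpha * g          # dual update in natural-parameter space
assert np.allclose(softmax(lam), target, atol=1e-3)
```

Note that every natural parameter is 'valid' in this unconstrained example: any $\lambda$ normalizes through the log-partition $A$, which is one answer to the question above for this family.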