Modified: July 04, 2022
differentiable environments
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
Maybe a stupid idea, but I wonder if the idea behind differentiable physics simulators (like Brax) can be extended more broadly to rich differentiable environments that include other agents taking their own actions. These agents would themselves be differentiable (they're just deep nets), so in principle you can ask for derivatives of future quantities in the world with respect to your current action, differentiating through the other agents, and so immediately take into account the long-term strategic effects of your actions.
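To make the "differentiate through the other agents" part concrete, here is a minimal toy sketch in JAX. Everything in it is an assumption made up for illustration: the scalar state, the `step` dynamics, the one-neuron `other_agent` policy, and the quadratic reward are all hypothetical, and none of it uses Brax. It only shows that if the other agent's policy and the dynamics are both differentiable, `jax.grad` can return the derivative of a multi-step future return with respect to my first action, including the part that flows through the other agent's reactions.

```python
import jax
import jax.numpy as jnp


def other_agent(params, state):
    # Hypothetical opponent policy: a one-neuron differentiable "net"
    # mapping the current state to an action in (-1, 1).
    return jnp.tanh(params["w"] * state + params["b"])


def step(state, my_action, other_action):
    # Hypothetical smooth dynamics: both agents push a scalar state.
    return state + 0.1 * my_action - 0.05 * other_action


def future_return(my_first_action, other_params, state0, horizon=10):
    # I act once, then do nothing, so the gradient isolates the effect
    # of that single action on everything that follows.
    state = step(state0, my_first_action, other_agent(other_params, state0))
    total = -state ** 2  # toy reward: keep the state near zero
    for _ in range(horizon - 1):
        # The other agent keeps reacting to the state my action influenced.
        state = step(state, 0.0, other_agent(other_params, state))
        total = total - state ** 2
    return total


other_params = {"w": jnp.array(1.5), "b": jnp.array(-0.2)}
state0 = jnp.array(1.0)

# Derivative of the whole future return with respect to my first action,
# backpropagated through the dynamics and the other agent's policy.
grad_fn = jax.grad(future_return)
g = grad_fn(jnp.array(0.3), other_params, state0)
print("d(return)/d(my first action):", g)
```

In a real multi-agent setting the other agents would be large networks, possibly stochastic and possibly learning themselves, so whether gradients through long rollouts stay informative is an open question this toy does not address.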
Maybe this would let us train RL foundation models much more efficiently by giving us a setting in which credit assignment is much easier than in the real world. We humans can't backpropagate through our environments - how much better would it be if we could?