intrinsic motivation
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

objectives:
- maximize entropy of the state visitation distribution
- requires empirical estimates of entropy, e.g. a nonparametric k-nearest-neighbor particle estimate (see the sketch after this list)
- maximize mutual information between a random goal (or skill) vector z and the states the agent visits
- MI = H[s] - H[s | z], so you can view this as the previous maxent objective plus an extra term to minimize the conditional entropy H[s | z]: we want to explore lots of states overall, but each individual goal should lead to a narrow state distribution (see the second sketch after this list)
- papers: variational intrinsic control (2016), ..., active pretraining with successor features (2021)
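A minimal sketch of the first objective in practice: the particle-based (k-nearest-neighbor) entropy estimate that APT-style methods turn into an intrinsic reward. Each state in a batch is rewarded by the log distance to its k-th nearest neighbor, so sparsely visited regions score higher. The function name, batch shapes, and constants here are illustrative, not taken from any particular paper's code.

```python
import numpy as np

def knn_entropy_reward(states, k=3):
    """Particle-based entropy intrinsic reward.

    Rewards each state by the log distance to its k-th nearest neighbor
    within the batch: states in sparsely visited regions score higher,
    so maximizing the summed reward pushes the visitation distribution
    toward higher entropy.
    """
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)            # exclude self-distances
    kth = np.sort(dists, axis=1)[:, k - 1]     # distance to k-th nearest neighbor
    return np.log(kth + 1e-6)                  # one intrinsic reward per state

# usage: reward a batch of (encoded) states collected by the policy
batch = np.random.randn(256, 8)                # stand-in for encoded states
intrinsic_rewards = knn_entropy_reward(batch)
```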
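And a sketch of how the MI objective usually gets turned into a reward (the DIAYN-style variational lower bound): a learned discriminator q(z | s) tries to infer the skill from the state, and the per-step reward log q(z | s) - log p(z) is high when the skill is identifiable from where the agent ended up. The discriminator logits below are a placeholder for whatever classifier you'd actually train, and the uniform prior is an assumption.

```python
import numpy as np

def skill_discovery_reward(discriminator_logits, z, n_skills):
    """Variational lower-bound reward for I(s; z).

    discriminator_logits: unnormalized scores over skills for the current
    state, from a classifier trained to predict which skill produced it.
    z: index of the skill the agent is currently executing.
    Reward = log q(z | s) - log p(z), with a uniform prior p(z) = 1/n_skills.
    """
    log_q = discriminator_logits - np.logaddexp.reduce(discriminator_logits)
    return log_q[z] - np.log(1.0 / n_skills)

# usage: reward the agent for reaching a state its current skill explains well
logits = np.array([2.0, 0.1, -1.0, 0.5])       # discriminator output for this state
r = skill_discovery_reward(logits, z=0, n_skills=4)
```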
How to make intrinsic motivation safe? Suppose we're trying to maximize entropy of the state distribution. A few issues:
- The apparent entropy depends on the state representation: e.g., you can maximize the estimated entropy just by learning a representation that focuses on uncontrollable noise (see the toy example after this list). So on its own this doesn't seem like a great objective for representation learning.
- this incentivizes the agent to do all kinds of crazy shit, like a sociopathic kid who vivisects animals just to see what happens.
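A toy illustration of the first issue, reusing the k-NN entropy estimate from above (the shapes and scales are made up): if the representation keeps dimensions the agent doesn't control, such as observation noise, the estimated entropy comes out high even when the agent's actual behavior is nearly constant, so the objective can be "satisfied" by the representation rather than by exploration.

```python
import numpy as np

def knn_entropy(x, k=3):
    """Mean log k-NN distance: a crude batch estimate of state entropy."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.log(np.sort(d, axis=1)[:, k - 1] + 1e-6).mean()

rng = np.random.default_rng(0)
controlled = 0.01 * rng.standard_normal((512, 1))   # the agent barely moves
noise = rng.standard_normal((512, 4))               # uncontrollable noise, e.g. a noisy TV

print(knn_entropy(controlled))                      # low: behavior is not diverse
print(knn_entropy(noise))                           # high, with zero actual exploration
print(knn_entropy(np.hstack([controlled, noise])))  # noise dominates the estimate
```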
Is there work on this? Seems like an important area for AI safety.