Created: February 20, 2022
Modified: February 21, 2022
rl goals
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
Implement MuZero or something similar.
What are the 'state of the art' RL algorithms?
What is known and not known about value alignment?