Created: July 23, 2021
Modified: April 23, 2022
reinforcement learning
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
Note: see reinforcement learning notation for a guide to the notation I'm attempting to use throughout my RL notes.
Three paradigmatic approaches to reinforcement learning:
- model a policy directly and use it to make decisions (policy gradient, etc.)
- model values, and make decisions by choosing the action with the highest expected value (Q-learning, etc.; see the sketch after this list)
- model dynamics, and make decisions by planning ('model-based RL')
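To make the value-based approach concrete, here's a minimal sketch of a tabular Q-learning update in JAX. The problem size, learning rate, and discount factor are placeholder assumptions for a toy gridworld, not anything from a specific environment.

```python
import jax
import jax.numpy as jnp

n_states, n_actions = 16, 4   # assumed toy problem size
alpha, gamma = 0.1, 0.99      # assumed learning rate and discount factor

q_table = jnp.zeros((n_states, n_actions))

@jax.jit
def q_update(q, s, a, r, s_next):
    # TD target: reward plus discounted value of the best next action.
    target = r + gamma * jnp.max(q[s_next])
    # Move Q(s, a) toward the target by a step of size alpha.
    return q.at[s, a].set(q[s, a] + alpha * (target - q[s, a]))

def greedy_action(q, s):
    # Acting: pick the action with the highest estimated value.
    return jnp.argmax(q[s])
```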
RL is important. Transformers already learn incredibly well from self-supervised objectives, but ultimately we need machines that can think for themselves and pursue goals. So we need to think about how to combine ML with the idea of agency.
things to build: implement some simple RL algorithms in JAX with Brax physics models (a rough policy-gradient sketch appears below).
things to build: implement MuZero in a game setting.
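As a starting point for the first item, here's a rough REINFORCE-style policy-gradient sketch in pure JAX. The `toy_reset` / `toy_step` functions, parameter shapes, and horizon are all hypothetical stand-ins; a real Brax environment would replace them, and its actual API isn't shown here.

```python
import jax
import jax.numpy as jnp

obs_dim, n_actions, horizon = 8, 4, 100   # placeholder sizes

def toy_reset(key):
    # Hypothetical stand-in environment: random observations.
    obs = jax.random.normal(key, (obs_dim,))
    return obs, key

def toy_step(env_state, action):
    # Reward +1 for action 0, else 0; state is just a PRNG key.
    key = jax.random.fold_in(env_state, 0)
    obs = jax.random.normal(key, (obs_dim,))
    reward = jnp.where(action == 0, 1.0, 0.0)
    return obs, key, reward

def policy_logits(params, obs):
    # A linear policy keeps the sketch short; a real agent would use an MLP.
    return obs @ params["w"] + params["b"]

def episode_loss(params, key, env_reset, env_step):
    # REINFORCE surrogate loss: -(sum_t log pi(a_t | s_t)) * episode return.
    obs, env_state = env_reset(key)
    log_probs, rewards = [], []
    for _ in range(horizon):
        key, sub = jax.random.split(key)
        logits = policy_logits(params, obs)
        action = jax.random.categorical(sub, logits)
        log_probs.append(jax.nn.log_softmax(logits)[action])
        obs, env_state, reward = env_step(env_state, action)
        rewards.append(reward)
    ret = jax.lax.stop_gradient(jnp.sum(jnp.asarray(rewards)))
    return -jnp.sum(jnp.asarray(log_probs)) * ret

# Gradient of the episode loss w.r.t. the policy parameters.
episode_grad = jax.grad(episode_loss)

params = {"w": jnp.zeros((obs_dim, n_actions)), "b": jnp.zeros(n_actions)}
grads = episode_grad(params, jax.random.PRNGKey(0), toy_reset, toy_step)
```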
Relevant: