Created: July 23, 2021
Modified: April 23, 2022
reinforcement learning
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
Note: see reinforcement learning notation for a guide to the notation I'm attempting to use throughout my RL notes.
Three paradigmatic approaches to reinforcement learning:
- model a policy directly and use it to make decisions (policy gradient, etc.)
- model values, and make decisions by choosing the action with the highest expected value (Q-learning, etc.; see the sketch after this list)
- model dynamics, and make decisions by planning ('model-based RL')
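To make the value-based approach concrete, here's a minimal sketch of a tabular Q-learning update in JAX. The problem size, learning rate, and discount factor are placeholder assumptions for a toy gridworld, not anything from a specific environment.

```python
import jax
import jax.numpy as jnp

n_states, n_actions = 16, 4   # assumed toy problem size
alpha, gamma = 0.1, 0.99      # assumed learning rate and discount factor

q_table = jnp.zeros((n_states, n_actions))

@jax.jit
def q_update(q, s, a, r, s_next):
    # TD target: reward plus discounted value of the best next action.
    target = r + gamma * jnp.max(q[s_next])
    # Move Q(s, a) toward the target by a step of size alpha.
    return q.at[s, a].set(q[s, a] + alpha * (target - q[s, a]))

def greedy_action(q, s):
    # Acting: pick the action with the highest estimated value.
    return jnp.argmax(q[s])
```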
RL is important. Transformers already learn incredibly well from self-supervised objectives, but ultimately we need machines that can think for themselves and pursue goals. So we need to think about how to combine ML with the idea of agency.
things to build: implement some simple RL algorithms in JAX with Brax physics models (a rough policy-gradient sketch appears below).
things to build: implement MuZero in a game setting.
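As a starting point for the first item, here's a rough REINFORCE-style policy-gradient sketch in pure JAX. The `toy_reset` / `toy_step` functions, parameter shapes, and horizon are all hypothetical stand-ins; a real Brax environment would replace them, and its actual API isn't shown here.

```python
import jax
import jax.numpy as jnp

obs_dim, n_actions, horizon = 8, 4, 100   # placeholder sizes

def toy_reset(key):
    # Hypothetical stand-in environment: random observations.
    obs = jax.random.normal(key, (obs_dim,))
    return obs, key

def toy_step(env_state, action):
    # Reward +1 for action 0, else 0; state is just a PRNG key.
    key = jax.random.fold_in(env_state, 0)
    obs = jax.random.normal(key, (obs_dim,))
    reward = jnp.where(action == 0, 1.0, 0.0)
    return obs, key, reward

def policy_logits(params, obs):
    # A linear policy keeps the sketch short; a real agent would use an MLP.
    return obs @ params["w"] + params["b"]

def episode_loss(params, key, env_reset, env_step):
    # REINFORCE surrogate loss: -(sum_t log pi(a_t | s_t)) * episode return.
    obs, env_state = env_reset(key)
    log_probs, rewards = [], []
    for _ in range(horizon):
        key, sub = jax.random.split(key)
        logits = policy_logits(params, obs)
        action = jax.random.categorical(sub, logits)
        log_probs.append(jax.nn.log_softmax(logits)[action])
        obs, env_state, reward = env_step(env_state, action)
        rewards.append(reward)
    ret = jax.lax.stop_gradient(jnp.sum(jnp.asarray(rewards)))
    return -jnp.sum(jnp.asarray(log_probs)) * ret

# Gradient of the episode loss w.r.t. the policy parameters.
episode_grad = jax.grad(episode_loss)

params = {"w": jnp.zeros((obs_dim, n_actions)), "b": jnp.zeros(n_actions)}
grads = episode_grad(params, jax.random.PRNGKey(0), toy_reset, toy_step)
```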
Relevant: