reinforcement learning: Nonlinear Function
Created: July 23, 2021
Modified: April 23, 2022

reinforcement learning

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

Note: see reinforcement learning notation for a guide to the notation I'm attempting to use throughout my RL notes.

Three paradigmatic approaches to reinforcement learning:

  • model a policy directly and use it to make decisions (policy gradient, etc.)
  • model values, and make decisions by choosing the action with the highest expected value (Q-learning, etc.)
  • model dynamics, and make decisions by planning ('model-based RL')
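
To make the contrast concrete, here is a rough sketch of how each approach turns what it has learned into an action. Everything in it is made up for illustration: the two-action setup, the table-based policy_logits and q_values, and the deterministic toy_model are assumptions, not part of any particular algorithm or library. Real methods would learn these objects from experience; the sketch only shows the decision step.

```python
import math
import random

ACTIONS = [0, 1]  # hypothetical toy action set


def softmax(logits):
    """Turn a list of preferences into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Approach 1: model a policy directly.
# The policy here is a table of per-state action preferences (logits);
# we act by sampling from the resulting distribution.
def act_from_policy(policy_logits, state, rng):
    probs = softmax(policy_logits[state])
    return rng.choices(ACTIONS, weights=probs, k=1)[0]


# Approach 2: model values.
# q_values[state][action] estimates expected return; we act greedily.
def act_from_values(q_values, state):
    return max(ACTIONS, key=lambda a: q_values[state][a])


# Approach 3: model dynamics and plan.
# model(state, action) -> (next_state, reward). We exhaustively search
# all action sequences up to `horizon` steps and take the best first action.
def act_from_model(model, state, horizon):
    def action_return(s, a, depth):
        next_state, reward = model(s, a)
        return reward + best_return(next_state, depth - 1)

    def best_return(s, depth):
        if depth == 0:
            return 0.0
        return max(action_return(s, a, depth) for a in ACTIONS)

    return max(ACTIONS, key=lambda a: action_return(state, a, horizon))


# Hypothetical dynamics: states 0, 1, 2 in a chain. Action 1 moves right,
# action 0 stays put, and arriving at state 2 pays a reward of 1.
def toy_model(state, action):
    next_state = min(state + action, 2)
    reward = 1.0 if (next_state == 2 and state != 2) else 0.0
    return next_state, reward


if __name__ == "__main__":
    rng = random.Random(0)
    policy_logits = {0: [0.0, 2.0]}
    q_values = {0: [0.1, 0.9]}
    print("policy:", act_from_policy(policy_logits, 0, rng))
    print("values:", act_from_values(q_values, 0))
    print("planning:", act_from_model(toy_model, 0, horizon=2))
```

Note that only the policy version is stochastic by construction. The value-based and planning versions here are both greedy argmaxes: one over a learned table, the other over returns computed by unrolling the model.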

RL is important. Transformers can already learn incredibly well from self-supervised objectives, but ultimately we need machines that can think for themselves and pursue goals. So we need to think about how to combine ML with the idea of agency.