reward: Nonlinear Function
Created: March 31, 2023
Modified: April 06, 2023

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

stray thoughts about reward functions (probably related to the agent abstraction and the intentional stance)

  • one can make a distinction between how a system behaves and how its behaviors are reinforced (externally or internally). for a stationary system the latter is null, and we'd talk about using tools like inverse reinforcement learning (IRL) to extract a 'reward function' implicit in its behavior. but for a nonstationary system it may be more natural to think of the implicit reward as the thing driving reinforcement, independent of the current behavior. of course, the nature of reinforcement is that you'd expect the behavior to trend toward the thing being reinforced, though not necessarily converge to it (see the toy sketch after this list).
    • some work labels this distinction as utility != reward: reward is the thing the system is optimized for (at training time / in an outer loop), while utility is the thing the system is optimizing (at deployment time).
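
a toy sketch of the first bullet (mine, not from any cited work): a two-armed gradient bandit, where `true_reward` plays the role of the thing driving reinforcement (it only appears in the update rule) and the softmax policy is the behavior. the behavior trends toward the reinforced arm but at any finite step isn't identical to it, which is exactly the gap an IRL-style read-off of a frozen policy would be working with. all names and numbers here are hypothetical.

```python
import numpy as np

# toy 2-armed gradient bandit (in the style of Sutton & Barto, sec. 2.8).
# `true_reward` is "the thing driving reinforcement"; softmax(H) is the
# current behavior. all values here are made up for illustration.
rng = np.random.default_rng(0)
true_reward = np.array([0.2, 0.8])  # arm 1 pays off more often

H = np.zeros(2)   # action preferences; behavior = softmax(H)
baseline = 0.0    # running average reward, used as a baseline
lr = 0.1

def softmax(h):
    e = np.exp(h - h.max())
    return e / e.sum()

for t in range(1, 501):
    p = softmax(H)
    a = rng.choice(2, p=p)
    r = float(rng.random() < true_reward[a])  # Bernoulli payoff
    baseline += (r - baseline) / t
    # reinforcement step: preferences get pushed toward whatever
    # `true_reward` favors, so behavior trends there without ever
    # equaling argmax(true_reward) at any finite step
    H += lr * (r - baseline) * (np.eye(2)[a] - p)

print(softmax(H).round(3))  # behavior has drifted toward arm 1
```

the stationary/nonstationary point, in this frame: if you freeze H and can only observe actions, you're stuck inferring some reward consistent with the revealed preferences (the IRL problem); if you can see the update rule, the reward is sitting right there in it, regardless of where the behavior currently is.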