This may be a central point of confusion: how do we define AI systems that have preferences about the real world, so that their goals and…
A very incomplete and maybe nonsensical intuition I want to explore. Classically, people talk about very simple reward functions like…
The standard Markov decision process formalism includes a reward function; the total (discounted) reward across a trajectory is its…
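As a rough illustration of the total discounted reward across a trajectory, here is a minimal sketch. It assumes a finite trajectory, time steps indexed from 0, and a discount factor called `gamma`; the function name `discounted_return` and these conventions are illustrative choices, not taken from the text above.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Return sum over t of gamma**t * rewards[t], with t starting at 0.

    Hypothetical helper for illustration; `gamma` and the indexing
    convention are assumptions, not part of the original text.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# A trajectory that pays out only at its final step.
sparse_trajectory = [0.0, 0.0, 4.0]
sparse_value = discounted_return(sparse_trajectory, gamma=0.5)
```

Under these assumptions, a trajectory whose only nonzero reward arrives at the last step is worth that reward scaled by `gamma` once for every earlier step.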
Many objects can be generated by a sequence of actions. For example: generating language by adding one word at a time; generating a molecule…
Suppose we have a Markov decision process in which we get reward only at the very end of a long trajectory. Until that point, we have no…