The standard Markov decision process formalism includes a reward function; the total (discounted) reward across a trajectory is its…
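As a minimal sketch of the total (discounted) reward mentioned above, the snippet below sums a trajectory's per-step rewards, weighting the reward at step t by gamma**t. The function name `discounted_total_reward`, the parameter `gamma`, and its default value are illustrative assumptions, not taken from the text.

```python
from typing import Sequence


def discounted_total_reward(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Return sum over t of gamma**t * rewards[t] for one trajectory.

    `gamma` is an assumed discount factor in [0, 1]; gamma = 1 gives the
    plain (undiscounted) total.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Work backwards through the trajectory so each step needs only one
    # multiplication by gamma (Horner's rule).
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

For example, `discounted_total_reward([1.0, 1.0, 1.0], gamma=0.5)` weights the three rewards by 1, 0.5, and 0.25.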
Different experimental conditions may give rise to different outcomes. For example, let the variable indicate whether a person is…