We say that is a fixed point of an update rule if . Update rules can often (though not necessarily) be seen as defining an…
For any reward function and policy , consider the entropy-regularized reward Taking as our objective the (expected, discounted…