Created: March 22, 2022
Modified: March 22, 2022

generalized policy iteration

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

Sutton and Barto use this as a general term for any scheme that interleaves policy evaluation steps with policy improvement steps, regardless of how coarse or fine-grained the interleaving is. Almost all RL algorithms fit this description.
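A minimal tabular sketch of the pattern, assuming a small finite MDP encoded as a hypothetical dict `P[s][a] = [(prob, next_state, reward), ...]`. The illustrative `sweeps` parameter controls how much evaluation happens between improvement steps: one sweep per improvement roughly resembles value iteration, while many sweeps approach classic policy iteration. None of these names come from Sutton and Barto; they are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP encoding: P[s][a] is a list of (prob, next_state, reward).
Transitions = Dict[int, Dict[int, List[Tuple[float, int, float]]]]


def q_value(P: Transitions, V: Dict[int, float], s: int, a: int, gamma: float) -> float:
    """One-step lookahead value of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def evaluate(P: Transitions, V: Dict[int, float], policy: Dict[int, int],
             gamma: float, sweeps: int) -> float:
    """Policy evaluation step: `sweeps` in-place Bellman expectation backups.

    Returns the largest value change seen in the final sweep.
    """
    delta = 0.0
    for _ in range(sweeps):
        delta = 0.0
        for s in P:
            new_v = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
    return delta


def improve(P: Transitions, V: Dict[int, float], policy: Dict[int, int],
            gamma: float) -> bool:
    """Policy improvement step: act greedily w.r.t. V. Returns True if unchanged."""
    stable = True
    for s in P:
        best = max(P[s], key=lambda a: q_value(P, V, s, a, gamma))
        # Switch only on a strict improvement so ties do not cause oscillation.
        if q_value(P, V, s, best, gamma) > q_value(P, V, s, policy[s], gamma) + 1e-12:
            policy[s] = best
            stable = False
    return stable


def gpi(P: Transitions, gamma: float = 0.9, sweeps: int = 1,
        tol: float = 1e-8, max_iters: int = 10_000):
    """Alternate (possibly truncated) evaluation with greedy improvement."""
    V = {s: 0.0 for s in P}
    policy = {s: next(iter(P[s])) for s in P}  # arbitrary initial policy
    for _ in range(max_iters):
        delta = evaluate(P, V, policy, gamma, sweeps)
        stable = improve(P, V, policy, gamma)
        if stable and delta < tol:
            break
    return policy, V


# Toy two-state MDP: in state 0, action 1 moves to state 1 for reward 1;
# in state 1, action 0 stays put for reward 2 each step.
EXAMPLE_MDP: Transitions = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
}

policy, V = gpi(EXAMPLE_MDP, gamma=0.9, sweeps=1)
print(policy)  # {0: 1, 1: 0}
```

Changing `sweeps` changes only the balance between the two processes, not the end result: for this toy MDP both `sweeps=1` and `sweeps=50` should settle on the same policy, with values near 19 and 20 for states 0 and 1.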


Links to this note

policy gradient
