Created: March 22, 2022
Modified: March 22, 2022
generalized policy iteration
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

Sutton and Barto use this as a general term for any form of interleaving policy evaluation steps with policy improvement steps, regardless of how coarse or fine each step is. Almost all RL algorithms fit this description.
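To make the interleaving concrete, here is a minimal tabular sketch. It is not taken from Sutton and Barto: the function name `generalized_policy_iteration`, the array layouts `P[a, s, s2]` and `R[a, s]`, the `eval_sweeps` knob, and the three-state toy MDP are all assumptions made for illustration. The only point is that the loop alternates some amount of policy evaluation with a greedy policy improvement step.

```python
import numpy as np


def generalized_policy_iteration(P, R, gamma=0.9, eval_sweeps=1,
                                 tol=1e-8, max_iters=10_000):
    """Tabular GPI sketch (hypothetical helper, not a library API).

    P[a, s, s2]: probability of moving from s to s2 under action a (assumed layout).
    R[a, s]:     expected immediate reward for action a in state s (assumed layout).
    eval_sweeps: number of evaluation backups done between improvement steps.
    """
    n_actions, n_states, _ = P.shape
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)  # arbitrary starting policy
    V = np.zeros(n_states)

    for _ in range(max_iters):
        # Policy evaluation (possibly partial): Bellman expectation backups
        # for the current policy only.
        V_old = V
        for _ in range(eval_sweeps):
            V = R[policy, states] + gamma * P[policy, states] @ V

        # Policy improvement: act greedily with respect to the current estimate.
        Q = R + gamma * P @ V  # shape (n_actions, n_states)
        new_policy = Q.argmax(axis=0)

        # Stop once neither the policy nor the value estimate is changing.
        if np.array_equal(new_policy, policy) and np.max(np.abs(V - V_old)) < tol:
            break
        policy = new_policy

    return policy, V


# Toy deterministic chain (made up for this example): states 0, 1, 2.
# Taking RIGHT from state 1 or 2 lands in state 2 and pays reward 1.
LEFT, RIGHT = 0, 1
P = np.zeros((2, 3, 3))
R = np.zeros((2, 3))
for s in range(3):
    P[LEFT, s, max(s - 1, 0)] = 1.0
    P[RIGHT, s, min(s + 1, 2)] = 1.0
R[RIGHT, 1] = 1.0
R[RIGHT, 2] = 1.0

policy, V = generalized_policy_iteration(P, R, eval_sweeps=1)
```

In this sketch, `eval_sweeps=1` makes each iteration amount to a value-iteration backup, while a large `eval_sweeps` approximates classical policy iteration, where evaluation runs nearly to convergence before each improvement. Both are instances of the same interleaving, differing only in granularity.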