Created: April 11, 2022
Modified: April 11, 2022

rl diagnostics

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

Things that might be useful to log in a reinforcement learning algorithm:

Return of each trajectory (summarize as mean/std/min/max).
Value estimate of the initial state (should converge to the average return).
Trajectory lengths.
Temporal difference error at each step (summarize as mean/std/min/max; the mean should converge to zero).
For policy gradient methods:
1. Entropy of the policy's distribution over actions.
2. KL divergence between the policy at one epoch and the policy at the next.
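
A rough sketch of how the quantities above might be computed, using only the Python standard library. Everything here is an assumption made for illustration: the names (`summarize`, `rollout_diagnostics`, etc.) are made up, each trajectory is assumed to be a dict with per-step `rewards` and value estimates `values`, episodes are assumed to end in a terminal state (so the value after the last step is 0), returns are taken to be discounted, and the policy is assumed to be over a discrete action space.

```python
import math
from statistics import fmean, pstdev


def summarize(values):
    """Mean/std/min/max of a collection of scalars."""
    values = list(values)
    return {
        "mean": fmean(values),
        "std": pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
    }


def discounted_return(rewards, gamma=0.99):
    """Discounted sum of rewards for one trajectory."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def td_errors(rewards, values, gamma=0.99):
    """One-step TD errors r_t + gamma * V(s_{t+1}) - V(s_t).

    Assumes the episode terminated, so the value after the last step is 0.
    """
    next_values = list(values[1:]) + [0.0]
    return [r + gamma * nv - v for r, v, nv in zip(rewards, values, next_values)]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def kl_divergence(old_probs, new_probs, eps=1e-12):
    """KL(old || new) for two discrete action distributions at the same state."""
    return sum(
        p * math.log(p / max(q, eps))
        for p, q in zip(old_probs, new_probs)
        if p > 0
    )


def rollout_diagnostics(trajectories, gamma=0.99):
    """Summaries of return, initial value, length, and TD error over a batch."""
    returns = [discounted_return(t["rewards"], gamma) for t in trajectories]
    all_td = [
        d
        for t in trajectories
        for d in td_errors(t["rewards"], t["values"], gamma)
    ]
    return {
        "return": summarize(returns),
        "initial_value": summarize(t["values"][0] for t in trajectories),
        "length": summarize(len(t["rewards"]) for t in trajectories),
        "td_error": summarize(all_td),
    }
```

In practice the entropy and KL would be averaged over the states visited in a batch, with the "old" probabilities saved before the policy update and the "new" ones recomputed afterwards on the same states.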
