Created: March 29, 2022
Modified: March 29, 2022
eligibility trace
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

A few ways to think about eligibility traces:
- an explicit accounting of credit assignment
- a sufficient statistic for the history of the current trajectory
- a variant of momentum for reinforcement learning, where we update the value function using an exponentially decaying sum of recent gradients.
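Concretely, the eligibility trace $z_t$ at time $t$ has the same shape as the weight vector $w_t$, and the online TD($\lambda$) algorithm updates them as follows:

$$z_t = \gamma \lambda z_{t-1} + \nabla_w \hat{v}(S_t, w_t)$$

$$w_{t+1} = w_t + \alpha \delta_t z_t$$

where $\delta_t = R_{t+1} + \gamma \hat{v}(S_{t+1}, w_t) - \hat{v}(S_t, w_t)$ is the temporal difference error in the estimated state value.

In the tabular case, where $w$ is just the table of state values $V(s)$, the update simplifies to $z_t(s) = \gamma \lambda z_{t-1}(s) + \mathbb{1}[S_t = s]$ and $V(s) \leftarrow V(s) + \alpha \delta_t z_t(s)$ respectively for all states $s$.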
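The tabular update can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes accumulating traces, and the function name `td_lambda_episode`, the `(state, reward, next_state)` transition format, and the parameter names `alpha` (step size), `gamma` (discount), and `lam` (trace decay) are all hypothetical choices made for this example.

```python
def td_lambda_episode(values, episode, alpha=0.1, gamma=1.0, lam=0.9):
    """One episode of tabular TD(lambda) with accumulating traces (sketch).

    values  -- list of state-value estimates, updated in place
    episode -- list of (state, reward, next_state) transitions;
               next_state is None when the episode terminates (assumed format)
    """
    trace = [0.0] * len(values)  # one eligibility per state, reset each episode
    for state, reward, next_state in episode:
        next_value = 0.0 if next_state is None else values[next_state]
        delta = reward + gamma * next_value - values[state]  # TD error
        # Decay every eligibility, then bump the state just visited. In the
        # tabular case the value gradient is a one-hot vector at `state`.
        trace = [gamma * lam * z for z in trace]
        trace[state] += 1.0
        # A single TD error updates every state in proportion to its trace.
        for s in range(len(values)):
            values[s] += alpha * delta * trace[s]
    return values, trace


# A three-state chain 0 -> 1 -> 2 -> terminal, with reward 1 on the last step.
values, trace = td_lambda_episode(
    [0.0, 0.0, 0.0],
    [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)],
    alpha=0.5, gamma=1.0, lam=0.5,
)
print(values)  # [0.125, 0.25, 0.5]
print(trace)   # [0.25, 0.5, 1.0]
```

In this example only the final transition has a nonzero TD error, yet all three states receive credit, each scaled by how recently it was visited.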
We see that the eligibility trace vector is (by definition) an exponentially decaying sum of value function gradients, which lets a single update adjust the value estimates of all recently observed states.
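Unrolling the trace recursion makes the "exponentially decaying sum" explicit. The sketch below assumes the accumulating-trace form $z_t = \gamma \lambda z_{t-1} + \nabla_w \hat{v}(S_t, w_t)$ with the trace initialized to zero ($z_{-1} = 0$), discount $\gamma$, and trace decay $\lambda$. The notation follows the standard textbook presentation and is an assumption here.

$$z_t = \sum_{k=0}^{t} (\gamma \lambda)^{t-k} \, \nabla_w \hat{v}(S_k, w_k)$$

A gradient observed $n$ steps ago is weighted by $(\gamma \lambda)^n$, so its influence on the current update fades geometrically. Setting $\lambda = 0$ keeps only the most recent gradient, which recovers one-step TD.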