A general issue with [temporal difference] learning methods, which 'update a guess towards a guess', is that they can end up 'chasing…
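
To make 'update a guess towards a guess' concrete, here is a minimal sketch of one tabular TD(0) step. It is an illustration under assumptions, not a method taken from the text: the function name `td0_update`, the two-state table, and the values of `alpha` and `gamma` are all hypothetical.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9, terminal=False):
    """One tabular TD(0) step (illustrative; names and defaults are assumptions)."""
    # The target is built from the current estimate of the next state's value,
    # so it is itself a guess rather than an observed return.
    bootstrap = 0.0 if terminal else values[next_state]
    target = reward + gamma * bootstrap
    # Move the current estimate a fraction alpha of the way towards that target.
    values[state] += alpha * (target - values[state])
    return target


# Hypothetical two-state value table.
values = {"A": 0.0, "B": 1.0}

# First update of A: the target depends on the current guess for B.
first_target = td0_update(values, "A", reward=0.0, next_state="B")

# If the guess for B is later revised, the target for A shifts with it.
values["B"] = 2.0
second_target = td0_update(values, "A", reward=0.0, next_state="B")
```

In this sketch the same transition produces two different targets, because the target is computed from an estimate that is itself being revised during learning.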