Self-Imitation Learning (SIL)
This paper proposes Self-Imitation Learning (SIL), a simple off-policy
actor-critic algorithm that learns to reproduce the agent's past good
decisions. This algorithm is designed to verify our hypothesis that exploiting
past good experiences can indirectly drive deep exploration. Our empirical
results show that SIL significantly improves advantage actor-critic (A2C) on
several hard exploration Atari games and is competitive with state-of-the-art
count-based exploration methods. We also show that SIL improves proximal policy
optimization (PPO) on MuJoCo tasks.
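
The abstract does not spell out the objective, so the following is only a minimal sketch of how "reproducing past good decisions" could be turned into a loss, assuming the clipped-advantage form (R - V(s))_+ in which a stored transition contributes only when its observed return R exceeds the current value estimate V(s). The function name `sil_losses` and its arguments are hypothetical, chosen for illustration; a real implementation would sample the batch from a replay buffer and use an autodiff framework, treating the clipped advantage as a constant in the policy term.

```python
import math


def sil_losses(log_probs, values, returns):
    """Average self-imitation policy and value losses over a batch.

    log_probs: log pi(a|s) of the stored actions under the current policy
    values:    current value estimates V(s) for the stored states
    returns:   discounted returns R observed for the stored transitions
    (All three names are illustrative assumptions, not the paper's API.)
    """
    policy_loss = 0.0
    value_loss = 0.0
    for logp, v, r in zip(log_probs, values, returns):
        # Clipped advantage: zero unless the past return beat the
        # current value estimate, so only "good" decisions are imitated.
        adv = max(r - v, 0.0)
        policy_loss += -logp * adv    # raise log-probability of good actions
        value_loss += 0.5 * adv ** 2  # pull V(s) up toward the better return
    n = len(returns)
    return policy_loss / n, value_loss / n


# Two stored transitions: the first outperformed its value estimate,
# the second did not and therefore contributes nothing.
policy_loss, value_loss = sil_losses(
    log_probs=[math.log(0.5), math.log(0.25)],
    values=[1.0, 2.0],
    returns=[3.0, 1.0],
)
```

Because the advantage is clipped at zero, transitions that turned out worse than expected are ignored rather than penalized, which is what makes the update imitate only past good experiences.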