Balancing Exploration and Exploitation in Self-imitation Learning