A normative account of confirmation bias during reinforcement learning