Gravar-mail: Dopamine reward prediction errors reflect hidden state inference across time