Q- and A-learning Methods for Estimating Optimal Dynamic Treatment Regimes