Spike-Based Reinforcement Learning in Continuous State and Action Space: When Policy Gradient Methods Fail