Send record by email: Statistical inference for natural language processing algorithms when predicting type 2 diabetes using electronic health record notes