Email Record: UMLS-based data augmentation for natural language processing of clinical research literature