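One Clinician Is All You Need-Cardiac Magnetic Resonance Imaging Measurement Extraction: Deep Learning Algorithm Development
%A Singh,Pulkit
%A Haimovich,Julian
%A Reeder,Christopher
%A Khurshid,Shaan
%A Lau,Emily
%A Cunningham,Jonathan W
%A Philippakis,Anthony
%A Anderson,Christopher D
%A Ho,Jennifer E
%A Lubitz,Steven A
%A Batra,Puneet
%+ Data Sciences Platform, Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, MA, 02142, United States, 1 617 714 7000, gpbatra@gmail.com
%K natural language processing
%K transformers
%K machine learning
%K cardiac MRI
%K clinical outcomes
%K deep learning
%D 2022
%7 16.9.2022
%9 Background: Cardiac magnetic resonance imaging (CMR) is a powerful diagnostic modality that provides detailed quantitative assessment of cardiac anatomy and function. Clinical reports are typically stored as unstructured text in electronic health record systems, and automated extraction of CMR measurements from these reports would facilitate their use in research. Existing machine learning approaches either rely on large quantities of expert annotation or require engineered rules that are time-consuming to develop and specific to the setting in which they were developed. Objective: We hypothesized that pretrained transformer-based language models could enable label-efficient numerical extraction from clinical text without heuristics or large quantities of expert annotation. Here, we fine-tuned pretrained transformer-based language models on a small quantity of CMR annotations to extract 21 CMR measurements. We assessed the effect of clinical pretraining on reducing labeling needs and explored alternative representations of numerical inputs to improve performance. Methods: Our study sample comprised 99,252 patients who received longitudinal cardiology care in a multi-institutional health care system. There were 12,720 available CMR reports from 9280 patients. We adapted PRAnCER (Platform Enabling Rapid Annotation for Clinical Entity Recognition), an annotation tool for clinical text, to collect annotations from a study clinician on 370 reports. We experimented with 5 different representations of numerical quantities and several model weight initializations. We evaluated extraction performance using macroaveraged F1-scores across the measurements of interest. We applied the best-performing model to extract measurements from the remaining CMR reports in the study sample and evaluated established associations between selected extracted measures and clinical outcomes to demonstrate validity. Results: All combinations of weight initializations and numerical representations obtained excellent performance on the gold-standard test set, suggesting that transformer models fine-tuned on a small set of annotations can effectively extract numerical quantities. Our results further indicate that custom numerical representations did not appear to have a significant impact on extraction performance.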
The best-performing model achieved a macroaveraged F1-score of 0.957 across the evaluated CMR measurements (range 0.92 for the lowest-performing measure of left atrial anterior-posterior dimension to 1.0 for the highest-performing measures of left ventricular end systolic volume index and left ventricular end systolic diameter). Application of the best-performing model to the study cohort yielded 136,407 measurements from all available reports in the study sample. We observed expected associations between extracted left ventricular mass index, left ventricular ejection fraction, and right ventricular ejection fraction with clinical outcomes like atrial fibrillation, heart failure, and mortality. Conclusions: This study demonstrated that a domain-agnostic pretrained transformer model is able to effectively extract quantitative clinical measurements from diagnostic reports with a relatively small number of gold-standard annotations. The proposed workflow may serve as a roadmap for other quantitative entity extraction.
%M 35960155
%R 10.2196/38178
%U https://medinform.www.mybigtv.com/2022/9/e38178
%U https://doi.org/10.2196/38178
%U http://www.ncbi.nlm.nih.gov/pubmed/35960155