TY - JOUR
AU - Chen, Pei-Fu
AU - He, Tai-Liang
AU - Lin, Sheng-Che
AU - Chu, Yuan-Chia
AU - Kuo, Chen-Tsung
AU - Lai, Feipei
AU - Wang, Ssu-Ming
AU - Zhu, Wan-Xuan
AU - Chen, Kuan-Chih
AU - Kuo, Lu-Cheng
AU - Hung, Fang-Ming
AU - Lin, Yu-Cheng
AU - Tsai, I-Chang
AU - Chiu, Chi-Hao
AU - Chang, Shu-Chih
AU - Yang, Chi-Yu
PY - 2022
DA - 2022/11/10
TI - Training a Deep Contextualized Language Model for International Classification of Diseases, 10th Revision Classification via Federated Learning: Model Development and Validation Study
JO - JMIR Med Inform
SP - e41342
VL - 10
IS - 11
KW - federated learning
KW - International Classification of Diseases
KW - machine learning
KW - natural language processing
KW - multilabel text classification
AB - Background: The automatic coding of clinical text documents by using the International Classification of Diseases, 10th Revision (ICD-10) can be performed for statistical analyses and reimbursements. With the development of natural language processing models, new transformer architectures with attention mechanisms have outperformed previous models. Although multicenter training may increase a model's performance and external validity, the privacy of clinical documents should be protected. We used federated learning to train a model with multicenter data, without sharing the data themselves. Objective: This study aims to train a classification model via federated learning for ICD-10 multilabel classification. Methods: Text data from discharge notes in electronic medical records were collected from three medical centers: Far Eastern Memorial Hospital, National Taiwan University Hospital, and Taipei Veterans General Hospital. After comparing the performance of different variants of bidirectional encoder representations from transformers (BERT), PubMedBERT was chosen for the word embeddings. With regard to preprocessing, nonalphanumeric characters were retained because the model's performance decreased after their removal. To explain the model's outputs, we added a label attention mechanism to the model architecture. The model was trained with data from each of the three hospitals separately and via federated learning. The models trained via federated learning and the models trained with local data were compared on a testing set composed of data from the three hospitals. The micro F1 score was used to evaluate model performance across all three centers. Results: The F1 scores of PubMedBERT, RoBERTa (Robustly Optimized BERT Pretraining Approach), ClinicalBERT, and BioBERT (BERT for Biomedical Text Mining) were 0.735, 0.692, 0.711, and 0.721, respectively. The F1 score of the model that retained nonalphanumeric characters was 0.8120, whereas the F1 score after removing these characters was 0.7875, a decrease of 0.0245 (3.11%). The F1 scores on the testing set were 0.6142, 0.4472, 0.5353, and 0.2522 for the federated learning, Far Eastern Memorial Hospital, National Taiwan University Hospital, and Taipei Veterans General Hospital models, respectively. The explainable predictions were displayed with highlighted input words via the label attention architecture. Conclusions: Federated learning was used to train the ICD-10 classification model on multicenter clinical text while protecting data privacy. The federated model's performance was better than that of the locally trained models.
SN - 2291-9694
UR - https://medinform.jmir.org/2022/11/e41342
UR - https://doi.org/10.2196/41342
UR - http://www.ncbi.nlm.nih.gov/pubmed/36355417
DO - 10.2196/41342
ID - info:doi/10.2196/41342
ER -
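As a rough illustration of the two mechanisms the abstract leans on, the sketch below shows a single FedAvg-style aggregation round (size-weighted averaging of per-site model parameters, so raw records never leave a hospital) and the micro F1 metric for multilabel ICD-10 code sets. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`federated_average`, `micro_f1`), the per-site record counts, the toy model shapes, and the example ICD-10 codes are all hypothetical.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """One FedAvg aggregation round: size-weighted mean of each
    parameter tensor across clients (here, hospitals)."""
    total = sum(client_sizes)
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(len(client_weights[0]))
    ]

def micro_f1(true_codes, pred_codes):
    """Micro-averaged F1 for multilabel outputs; each element is the
    set of ICD-10 codes assigned to one discharge note."""
    tp = sum(len(t & p) for t, p in zip(true_codes, pred_codes))
    fp = sum(len(p - t) for t, p in zip(true_codes, pred_codes))
    fn = sum(len(t - p) for t, p in zip(true_codes, pred_codes))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy round: three simulated sites, each holding a two-tensor "model".
rng = np.random.default_rng(0)
sites = [[rng.normal(size=(4, 4)), rng.normal(size=4)] for _ in range(3)]
sizes = [500, 300, 200]  # hypothetical per-site record counts
global_model = federated_average(sites, sizes)

print(micro_f1([{"A41.9", "J18.9"}, {"I10"}],
               [{"A41.9"}, {"I10", "E11.9"}]))  # 0.666...
```

In a full training loop, each site would fine-tune the shared BERT-based classifier on its own notes for a few steps, ship only the updated parameters to the coordinator, and receive the averaged model back; the micro F1 above is then computed on a held-out multicenter test set.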