TY - JOUR
AU - Fernandes, Marta
AU - Sun, Haoqi
AU - Jain, Aayushee
AU - Alabsi, Haitham S
AU - Brenner, Laura N
AU - Ye, Elissa
AU - Ge, Wendong
AU - Collens, Sarah I
AU - Leone, Michael J
AU - Das, Sudeshna
AU - Robbins, Gregory K
AU - Mukerji, Shibani S
AU - Westover, M Brandon
PY - 2021
DA - 2021/2/10
TI - Classification of the Disposition of Patients Hospitalized with COVID-19: Reading Discharge Summaries Using Natural Language Processing
JO - JMIR Med Inform
SP - e25457
VL - 9
IS - 2
KW - ICU
KW - coronavirus
KW - electronic health record
KW - unstructured text
KW - natural language processing
KW - BoW
KW - LASSO
KW - feature selection
KW - machine learning
KW - intensive care unit
KW - covid
KW - EHR
AB - Background: Medical notes are a rich source of patient data; however, the unstructured nature of the text has largely prevented these data from being used in large retrospective analyses. Converting clinical text into structured data can enable large-scale research with electronic health record (EHR) data. Natural language processing (NLP) can be used for text information retrieval, reducing the need for labor-intensive chart review. Here, we apply NLP to a large-scale analysis of the medical records of patients hospitalized with COVID-19 at 2 large hospitals. Objective: Our study aimed to build an NLP pipeline that classifies the discharge disposition (home, inpatient rehabilitation, skilled nursing inpatient facility [SNIF], and death) of patients hospitalized with COVID-19 based on discharge summary notes. Methods: Text mining and feature engineering techniques were applied to the unstructured text of hospital discharge summaries. The study included patients with COVID-19 who were discharged between March 10, 2020, and June 30, 2020, from 2 hospitals in the Boston, Massachusetts area (Massachusetts General Hospital and Brigham and Women's Hospital). The data were divided into a training set (70%) and a hold-out test set (30%). Discharge summaries were represented as bags-of-words consisting of single words (unigrams), bigrams, and trigrams. The number of features was reduced during training by excluding n-grams that occurred in fewer than 10% of discharge summaries, and further reduced using least absolute shrinkage and selection operator (LASSO) regularization while training a multiclass logistic regression model. Model performance was evaluated using the hold-out test set. Results: The study cohort included 1737 adult patients (median age 61 [SD 18] years; 55% men; 45% White and 16% Black; 14% nonsurvivors and 61% discharged home). The model selected 179 from a vocabulary of 1056 engineered features, consisting of combinations of unigrams, bigrams, and trigrams. The top features contributing most to the classification by the model (for each outcome) were the following: “appointments specialty,” “home health,” and “home care” (home); “intubate” and “ARDS” (inpatient rehabilitation); “service” (SNIF); “brief assessment” and “covid” (death). The model achieved a micro-average area under the receiver operating characteristic curve value of 0.98 (95% CI 0.97-0.98) and average precision of 0.81 (95% CI 0.75-0.84) in the testing set for prediction of discharge disposition. Conclusions: A supervised learning–based NLP approach is able to classify the discharge disposition of patients hospitalized with COVID-19. This approach has the potential to accelerate and increase the scale of research on patients’ discharge disposition that is possible with EHR data.
SN - 2291-9694
UR - https://medinform.www.mybigtv.com/2021/2/e25457
UR - https://doi.org/10.2196/25457
UR - http://www.ncbi.nlm.nih.gov/pubmed/33449908
DO - 10.2196/25457
ID - info:doi/10.2196/25457
ER -