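%I JMIR Publications
%T Identifying COVID-19 Outbreaks From Contact-Tracing Interview Forms for Public Health Departments: Development of a Natural Language Processing Pipeline
%A Caskey, John
%A McConnell, Iain L
%A Oguss, Madeline
%A Dligach, Dmitriy
%A Kulikoff, Rachel
%A Grogan, Brittany
%A Gibson, Crystal
%A Wimmer, Elizabeth
%A DeSalvo, Traci E
%A Nyakoe-Nyasani, Edwin E
%A Churpek, Matthew M
%A Afshar, Majid
%+ University of Wisconsin-Madison, 1685 Highland Avenue, 5158 Medical Foundation Centennial Building, WI, 53705, United States, 1 3125459462, majid.afshar@wisc.edu
%K natural language processing
%K public health informatics
%K named entity recognition
%K contact tracing
%K COVID-19
%K outbreak
%K neural language model
%K disease surveillance
%K digital health
%K electronic surveillance
%K public health
%K digital surveillance tool
%D 2022
%7 8.3.2022
%9 Original Paper
%J JMIR Public Health Surveill
%G English
%X Background: In Wisconsin, COVID-19 case interview forms contain free-text fields that must be mined to identify potential outbreaks for targeted policy making. We developed an automated pipeline that ingests the free text into a pretrained neural language model to identify businesses and facilities as outbreaks. Objective: We aimed to examine the precision and recall of our natural language processing pipeline against existing outbreaks and potential new clusters. Methods: Data on cases of COVID-19 were extracted from the Wisconsin Electronic Disease Surveillance System (WEDSS) for Dane County between July 1, 2020, and June 30, 2021. Features from the case interview forms were fed into a Bidirectional Encoder Representations from Transformers (BERT) model that was fine-tuned for named entity recognition (NER). We also developed a novel location-mapping tool to provide addresses for the relevant named entities. Precision and recall were measured against manually verified outbreaks and valid addresses in WEDSS. Results: There were 46,798 cases of COVID-19, with 4,183,273 total BERT tokens and 15,051 unique tokens. The recall and precision of the NER tool were 0.67 (95% CI 0.66-0.68) and 0.55 (95% CI 0.54-0.57), respectively. For the location-mapping tool, the recall and precision were 0.93 (95% CI 0.92-0.95) and 0.93 (95% CI 0.92-0.95), respectively. Across monthly intervals, the NER tool identified more potential clusters than were verified in WEDSS. Conclusions: We developed a novel pipeline of tools that identified existing outbreaks and novel clusters with associated addresses. Our pipeline ingests data from a statewide database and may be deployed to assist local health departments with targeted interventions.
%M 35144241
%R 10.2196/36119
%U https://publichealth.www.mybigtv.com/2022/3/e36119
%U https://doi.org/10.2196/36119
%U http://www.ncbi.nlm.nih.gov/pubmed/35144241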