TY  - JOUR
AU  - Li, Shicheng
AU  - Deng, Lizong
AU  - Zhang, Xu
AU  - Chen, Luming
AU  - Yang, Tao
AU  - Qi, Yifan
AU  - Jiang, Taijiao
PY  - 2022
DA  - 2022/6/3
TI  - Deep Phenotyping of Chinese Electronic Health Records by Recognizing Linguistic Patterns of Phenotypic Narratives With a Sequence Motif Discovery Tool: Algorithm Development and Validation
JO  - J Med Internet Res
SP  - e37213
VL  - 24
IS  - 6
KW  - deep phenotyping
KW  - Chinese EHRs
KW  - linguistic pattern
KW  - motif discovery
KW  - pattern recognition
AB  - Background: Phenotype information in electronic health records (EHRs) is mainly recorded as unstructured free text, which cannot be used directly for clinical research. EHR-based deep-phenotyping methods can structure the phenotype information in EHRs with high fidelity and have become a focus of medical informatics research. However, developing a deep-phenotyping method for non-English EHRs (ie, Chinese EHRs) is challenging. Although numerous EHR resources exist in China, fine-grained annotation data suitable for developing deep-phenotyping methods are limited. Developing a deep-phenotyping method for Chinese EHRs in such a low-resource scenario is challenging. Objective: This study aimed to develop a deep-phenotyping method with good generalization ability for Chinese EHRs based on limited fine-grained annotation data. Methods: The core of the methodology was to identify linguistic patterns of phenotype descriptions in Chinese EHRs with a sequence motif discovery tool and to perform deep phenotyping of Chinese EHRs by recognizing those linguistic patterns in free text. Specifically, 1000 Chinese EHRs were manually annotated based on a fine-grained information model, PhenoSSU (Semantic Structured Unit of Phenotypes). The annotated data set was randomly divided into a training set (n=700, 70%) and a test set (n=300, 30%). The process for mining linguistic patterns was divided into three steps. First, free text in the training set was encoded as single-letter sequences (P: phenotype, A: attribute). Second, a biological sequence analysis tool, MEME (Multiple Expectation Maximums for Motif Elicitation), was used to identify motifs in the single-letter sequences. Finally, the identified motifs were reduced to a series of regular expressions representing linguistic patterns of PhenoSSU instances in Chinese EHRs. Based on the discovered linguistic patterns, we developed a deep-phenotyping method for Chinese EHRs, including a deep learning–based method for named entity recognition and a pattern recognition–based method for attribute prediction. Results: In total, 51 sequence motifs with statistical significance were mined from 700 Chinese EHRs in the training set and were combined into six regular expressions. It was found that these six regular expressions could be learned from a mean of 134 (SD 9.7) annotated EHRs in the training set. The deep-phenotyping algorithm for Chinese EHRs could recognize PhenoSSU instances with an overall accuracy of 0.844 on the test set. For the subtask of entity recognition, the algorithm achieved an F1 score of 0.898 with the Bidirectional Encoder Representations from Transformers–bidirectional long short-term memory and conditional random field model; for the subtask of attribute prediction, the algorithm achieved a weighted accuracy of 0.940 with the linguistic pattern–based method. Conclusions: We developed a simple but effective strategy to perform deep phenotyping of Chinese EHRs with limited fine-grained annotation data. Our work will promote the second use of Chinese EHRs and give inspiration to other non–English-speaking countries.
SN  - 1438-8871
UR  - //www.mybigtv.com/2022/6/e37213
UR  - https://doi.org/10.2196/37213
UR  - http://www.ncbi.nlm.nih.gov/pubmed/35657661
DO  - 10.2196/37213
ID  - info:doi/10.2196/37213
ER  -