The widely known terminology gap between health professionals and health consumers hinders effective information seeking for consumers. Objective: The aim of this study was to better understand consumers' usage of medical concepts by evaluating the coverage of concepts and semantic types of the Unified Medical Language System (UMLS) on diabetes-related postings in 2 types of social media: blogs and social question and answer (Q&A). Methods: We collected 2 types of social media data: (1) a total of 3711 blogs tagged with "diabetes" on Tumblr posted between February and October 2015; and (2) a total of 58,422 questions and associated answers posted between 2009 and 2014 in the diabetes category of Yahoo! Answers. We analyzed the datasets using a widely adopted biomedical text processing framework, Apache cTAKES, and its extension YTEX. First, we applied the named entity recognition (NER) method implemented in YTEX to identify UMLS concepts in the datasets. We then analyzed the coverage and the popularity of concepts in the UMLS source vocabularies across the 2 datasets (ie, blogs and social Q&A). Further, we conducted a concept-level comparative coverage analysis between SNOMED Clinical Terms (SNOMED CT) and the Open-Access Collaborative Consumer Health Vocabulary (OAC CHV), the top 2 UMLS source vocabularies with the most coverage on our datasets. We also analyzed the UMLS semantic types that were frequently observed in our datasets. Results: We identified 2415 UMLS concepts from blog postings, 6452 UMLS concepts from social Q&A questions, and 10,378 UMLS concepts from the answers. The medical concepts identified in the blogs can be covered by 56 source vocabularies in the UMLS, while those in questions and answers can be covered by 58 source vocabularies. SNOMED CT was the dominant vocabulary in terms of coverage across all the datasets, ranging from 84.9% to 95.9%. It was followed by OAC CHV (between 73.5% and 80.0%) and Metathesaurus Names (MTH) (between 55.7% and 73.5%). All of the social media datasets shared frequent semantic types such as "Amino Acid, Peptide, or Protein," "Body Part, Organ, or Organ Component," and "Disease or Syndrome." Conclusions: Although the 3 social media datasets vary greatly in size, they exhibited similar conceptual coverage among UMLS source vocabularies, and the identified concepts showed similar semantic type distributions. As such, concepts that are both frequently used by consumers and also found in professional vocabularies such as SNOMED CT can be suggested to OAC CHV to improve its coverage.
SN - 2291-9694
UR - https://medinform.www.mybigtv.com/2016/4/e41/
UR - https://doi.org/10.2196/medinform.5748
UR - http://www.ncbi.nlm.nih.gov/pubmed/27884812
DO - 10.2196/medinform.5748
ID - info:doi/10.2196/medinform.5748
ER -