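%0 Journal Article
%@ 1438-8871
%I JMIR Publications
%V 19
%N 10
%P e361
%T Discovering Cohorts of Pregnant Women From Social Media for Safety Surveillance and Analysis
%A Sarker, Abeed
%A Chandrashekar, Pramod
%A Magge, Arjun
%A Cai, Haitao
%A Klein, Ari
%A Gonzalez, Graciela
%+ Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA, 19104, United States, 1 6024746203, abeed@pennmedicine.upenn.edu
%K natural language processing
%K machine learning
%K text mining
%K social media
%K pregnancy
%K cohort studies
%K data analysis
%D 2017
%7 30.10.2017
%9 Original Paper
%J J Med Internet Res
%G English
%X Background: Pregnancy exposure registries are the primary sources of information about the safety of maternal medication use during pregnancy. Such registries enroll pregnant women on a voluntary basis early in pregnancy and follow them until the end of pregnancy or longer to systematically collect information on specific pregnancy outcomes. Although the registry model has distinct advantages over other study designs, it faces numerous challenges and limitations, such as low enrollment rates, high cost, and selection bias. Objective: The primary objective of this study was to systematically assess whether social media (Twitter) can be used to discover cohorts of pregnant women, and to develop and deploy a natural language processing and machine learning pipeline for the automatic collection of cohort information. In addition, we sought to make a preliminary determination of the types of longitudinal information that can be mined from the collected cohort data. Methods: Our discovery of pregnant women relies on detecting pregnancy-indicating tweets (PITs), which are statements posted by pregnant women about their pregnancies. We used a set of 14 patterns to first detect potential PITs. We manually annotated a sample of 14,156 of the retrieved user posts to distinguish real PITs from false positives, and trained a supervised classification system to detect real PITs. We optimized the classification system via cross-validation, with features and settings targeted toward optimizing precision for the positive class. For users identified via automatic classification as posting real PITs, our pipeline collected all their available past and future posts, from which other information (eg, medication usage and fetal outcomes) may be mined. Results: Our rule-based PIT detection approach retrieved over 200,000 posts over a period of 18 months. Manual annotation agreement for three annotators was very high at kappa (κ)=.79. On a blind test set, the implemented classifier obtained an overall F1 score of 0.84 (0.88 for the pregnancy class and 0.68 for the nonpregnancy class). Precision for the pregnancy class was 0.93, and recall was 0.84. Feature analysis showed that the combination of dense and sparse vectors for classification achieved optimal performance. Employing the trained classifier resulted in the identification of 71,954 users from the collected posts. Over 250 million posts were retrieved for these users, which provided a multitude of longitudinal information about them. Conclusions: Social media sources such as Twitter can be used to identify large cohorts of pregnant women and to gather longitudinal information via automated processing of their postings. Considering the many drawbacks and limitations of pregnancy registries, social media mining may provide beneficial complementary information. Although the cohort sizes identified over social media are large, future research will have to assess the completeness of the information available through them.
%M 29084707
%R 10.2196/jmir.8164
%U https://doi.org/10.2196/jmir.8164
%U http://www.ncbi.nlm.nih.gov/pubmed/29084707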