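@Article{info:doi/10.2196/28749,
author="Shakeri Hossein Abad, Zahra and Butler, Gregory P and Thompson, Wendy and Lee, Joon",
title="Crowdsourcing for Machine Learning in Public Health Surveillance: Lessons Learned From Amazon Mechanical Turk",
journal="J Med Internet Res",
year="2022",
month="Jan",
day="18",
volume="24",
number="1",
pages="e28749",
keywords="crowdsourcing; machine learning; digital public health surveillance; public health database",
abstract="Background: Crowdsourcing services, such as Amazon Mechanical Turk (AMT), allow researchers to use the collective intelligence of a wide range of web users for labor-intensive tasks. Because the large volume of data and the quick turnaround time make it difficult to manually verify the quality of the collected results, many questions remain to be explored regarding the reliability of these resources for developing digital public health systems. Objective: This study aims to explore and evaluate the application of crowdsourcing, and of AMT in particular, for developing digital public health surveillance systems. Methods: We collected 296,166 crowd-generated labels for 98,722 tweets, labeled by 610 AMT workers, to develop machine learning (ML) models for detecting behaviors related to physical activity, sedentary behavior, and sleep quality among Twitter users. To infer the ground truth labels and explore the quality of these labels, we studied 4 statistical consensus methods that are agnostic of task features and focus only on worker labeling behavior. Moreover, to model the meta-information associated with each labeling task and leverage the potential of context-sensitive data in the truth inference process, we developed 7 ML models, including traditional classifiers (offline and active), a deep learning-based classification model, and a hybrid convolutional neural network model. Results: Although most crowdsourcing-based studies in public health have often equated majority vote with quality, the results of our study using a truth set of 9000 manually labeled tweets showed that consensus-based inference models mask underlying uncertainty in data and overlook the importance of task meta-information. Our evaluations across 3 physical activity, sedentary behavior, and sleep quality data sets showed that truth inference is a context-sensitive process, and none of the methods studied in this paper were consistently superior to others in predicting the truth label. We also found that the performance of the ML models trained on crowd-labeled data was sensitive to the quality of these labels, and poor-quality labels led to incorrect assessment of these models. Finally, we have provided a set of practical recommendations to improve the quality and reliability of crowdsourced data. Conclusions: Our findings indicate the importance of the quality of crowd-generated labels in developing ML models designed for decision-making purposes, such as public health surveillance decisions. A combination of inference models outlined and analyzed in this study could be used to quantitatively measure and improve the quality of crowd-generated labels for training ML models.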
", issn="1438-8871", doi="10.2196/28749", url="//www.mybigtv.com/2022/1/e28749", url="https://doi.org/10.2196/28749", url="http://www.ncbi.nlm.nih.gov/pubmed/35040794" }