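TY  - JOUR
AU  - Kolluri, Nikhil
AU  - Liu, Yunong
AU  - Murthy, Dhiraj
PY  - 2022
DA  - 2022/8/25
TI  - COVID-19 Misinformation Detection: Machine-Learned Solutions to the Infodemic
JO  - JMIR Infodemiology
SP  - e38756
VL  - 2
IS  - 2
KW  - COVID-19
KW  - misinformation
KW  - machine learning
KW  - fact-checking
KW  - infodemiology
KW  - infodemic management
KW  - model performance
KW  - model accuracy
KW  - content analysis
AB  - Background: The volume of COVID-19–related misinformation has long exceeded the resources available to fact checkers to effectively mitigate its ill effects. Automated and web-based approaches can provide effective deterrents to online misinformation. Machine learning–based methods have achieved robust performance on text classification tasks, including potentially low-quality-news credibility assessment. Despite the progress of initial, rapid interventions, the enormity of COVID-19–related misinformation continues to overwhelm fact checkers. Therefore, improvement in automated and machine-learned methods for an infodemic response is urgently needed. Objective: The aim of this study was to achieve improvement in automated and machine-learned methods for an infodemic response. Methods: We evaluated three strategies for training a machine-learning model to determine the highest model performance: (1) COVID-19–related fact-checked data only, (2) general fact-checked data only, and (3) combined COVID-19 and general fact-checked data. We created two COVID-19–related misinformation data sets from fact-checked “false” content combined with programmatically retrieved “true” content. The first set contained approximately 7000 entries from July to August 2020, and the second contained approximately 31,000 entries from January 2020 to June 2022. We crowdsourced 31,441 votes to human label the first data set. Results: The models achieved an accuracy of 96.55% and 94.56% on the first and second external validation data set, respectively. Our best-performing model was developed using COVID-19–specific content. We were able to successfully develop combined models that outperformed human votes of misinformation. Specifically, when we blended our model predictions with human votes, the highest accuracy we achieved on the first external validation data set was 99.1%. When we considered outputs where the machine-learning model agreed with human votes, we achieved accuracies up to 98.59% on the first validation data set. This outperformed human votes alone with an accuracy of only 73%. Conclusions: External validation accuracies of 96.55% and 94.56% are evidence that machine learning can produce superior results for the difficult task of classifying the veracity of COVID-19 content. Pretrained language models performed best when fine-tuned on a topic-specific data set, while other models achieved their best accuracy when fine-tuned on a combination of topic-specific and general-topic data sets.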
Crucially, our study found that blended models, trained/fine-tuned on general-topic content with crowdsourced data, improved our models’ accuracies up to 99.7%. The successful use of crowdsourced data can increase the accuracy of models in situations when expert-labeled data are scarce. The 98.59% accuracy on a “high-confidence” subsection comprised of machine-learned and human labels suggests that crowdsourced votes can optimize machine-learned labels to improve accuracy above human-only levels. These results support the utility of supervised machine learning to deter and combat future health-related disinformation.
SN  - 2564-1891
UR  - https://infodemiology.www.mybigtv.com/2022/2/e38756
UR  - https://doi.org/10.2196/38756
DO  - 10.2196/38756
ID  - info:doi/10.2196/38756
ER  - 