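%T Anomaly Detection Algorithm for Real-World Data and Evidence in Clinical Research: Implementation, Evaluation, and Validation Study
%A Churová,Vendula
%A Vyškovský,Roman
%A Maršálová,Kateřina
%A Kudláček,David
%A Schwarz,Daniel
%+ Institute of Biostatistics and Analyses, Ltd, Poštovská 3, Brno, Czech Republic, 420 604996753, schwarz@biostatistika.cz
%K clinical research data
%K real-world evidence
%K registry database
%K data quality
%K EDC system
%K anomaly detection
%D 2021
%7 7.5.2021
%9 Original Paper
%J JMIR Med Inform
%G English
%X Background: Statistical analysis has become an integral part of evidence-based medicine, and it relies heavily on data quality, which is of critical importance in modern clinical research. Input data are at risk not only of being falsified or fabricated but also of being mishandled by investigators. Objective: The urgent need to ensure the highest possible data quality has led to the implementation of various auditing strategies designed to monitor clinical trials and detect the errors of different origin that frequently occur in the field. The objective of this study was to describe a machine learning-based algorithm for detecting anomalous patterns in data that arise from carelessness, systematic error, or the intentional entry of fabricated values. Methods: An electronic data capture (EDC) system used for data management in clinical registries is presented, including its architecture and data structure. This EDC system features a machine learning-based algorithm designed to detect anomalous patterns in quantitative data. The detection algorithm combines clustering with a series of 7 distance metrics that serve to determine the strength of an anomaly. Thresholds and combinations of the metrics were used in the detection process, and the detection performance was evaluated and validated in experiments involving simulated anomalous data and real-world data. Results: Five different clinical registries related to neuroscience were presented, all of them running in the given EDC system. Two of the registries were selected for the evaluation experiments and also served to validate the detection performance on an independent data set. The best-performing combination of distance metrics was Canberra, Manhattan, and Mahalanobis, whereas the Cosine and Chebyshev metrics were excluded from further analysis because they performed worst when used as single distance metric-based classifiers. Conclusions: The experimental results demonstrate that the algorithm is universal in nature, and as such may be implemented in other EDC systems, and is capable of anomalous data detection with a sensitivity exceeding 85%.
%M 33851576
%R 10.2196/27172
%U https://medinform.www.mybigtv.com/2021/5/e27172
%U https://doi.org/10.2196/27172
%U http://www.ncbi.nlm.nih.gov/pubmed/33851576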