%0 Journal Article
%@ 2291-9694
%I JMIR Publications
%V 10
%N 11
%P e36711
%T Linking Biomedical Data Warehouse Records With the National Mortality Database in France: Large-scale Matching Algorithm
%A Guardiolle,Vianney
%A Bazoge,Adrien
%A Morin,Emmanuel
%A Daille,Béatrice
%A Toublant,Delphine
%A Bouzillé,Guillaume
%A Merel,Youenn
%A Pierre-Jean,Morgane
%A Filiot,Alexandre
%A Cuggia,Marc
%A Wargny,Matthieu
%A Lamer,Antoine
%A Gourraud,Pierre-Antoine
%+ Univ. Lille, CHU Lille, ULR 2694, METRICS: Évaluation des Technologies de santé et des Pratiques médicales, F-59000, 1 place de Verdun, Lille, 59000, France, 33 320626969, antoine.lamer@univ-lille.fr
%K data warehousing
%K clinical data warehouse
%K medical informatics applications
%K medical record linkage
%K French National Mortality Database
%K data reuse
%K open data, R
%K clinical informatics
%D 2022
%7 1.11.2022
%9 Original Paper
%J JMIR Med Inform
%G English
%X Background: Vital status after discharge is often missing or uncertain in a biomedical data warehouse (BDW), yet it is central to the value of a BDW in medical research. The French National Mortality Database (FNMD) provides open, nominative records of every death. Matching large-scale BDW records with the FNMD poses several challenges: the absence of a unique common identifier between the 2 databases, names that change over a lifetime, clerical errors, and the exponential growth of the number of comparisons to compute. Objective: We aimed to develop a new algorithm for matching BDW records to the FNMD and to evaluate its performance. Methods: We developed a deterministic algorithm based on advanced data cleaning, knowledge of the naming system, and the Damerau-Levenshtein distance (DLD). The algorithm's performance was assessed independently using BDW data from 3 university hospitals: Lille, Nantes, and Rennes. Specificity was evaluated with patients alive on January 1, 2016 (ie, patients with at least 1 hospital encounter both before and after this date). Sensitivity was evaluated with patients recorded as deceased between January 1, 2001, and December 31, 2020. The DLD-based algorithm was compared to a direct matching algorithm with minimal data cleaning as a reference. Results: All centers combined, sensitivity was 11% higher for the DLD-based algorithm (93.3%, 95% CI 92.8-93.9) than for the direct algorithm (82.7%, 95% CI 81.8-83.6; P<.001). Sensitivity was superior for men at 2 centers (Nantes: 87%, 95% CI 85.1-89 vs 83.6%, 95% CI 81.4-85.8; P=.006; Rennes: 98.6%, 95% CI 98.1-99.2 vs 96%, 95% CI 94.9-97.1; P<.001) and for patients born in France at all centers (Nantes: 85.8%, 95% CI 84.3-87.3 vs 74.9%, 95% CI 72.8-77.0; P<.001). The DLD-based algorithm revealed significant differences in sensitivity among centers (Nantes, 85.3% vs Lille and Rennes, 97.3%, P<.001). Specificity was >98% in all subgroups. Our algorithm matched tens of millions of death records from BDWs, with parallel computing capabilities and low RAM requirements. We used the Inseehop open-source R script for this measurement. Conclusions: Overall, sensitivity/recall was 11% higher using the DLD-based algorithm than using the direct algorithm. This shows the importance of advanced data cleaning and knowledge of a naming system through DLD use. Statistically significant differences in sensitivity between groups were found and must be considered when performing an analysis to avoid differential biases. Our algorithm, originally conceived for linking a BDW with the FNMD, can be used to match any large-scale databases. While matching operations using names are considered sensitive computational operations, the Inseehop package released here is easy to run on premises, thereby facilitating compliance with local cybersecurity frameworks. The use of an advanced deterministic matching algorithm such as the DLD-based algorithm is an insightful example of combining open-source external data to improve the usage value of BDWs.
%M 36318244
%R 10.2196/36711
%U https://medinform.www.mybigtv.com/2022/11/e36711
%U https://doi.org/10.2196/36711
%U http://www.ncbi.nlm.nih.gov/pubmed/36318244