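%0 Journal Article
%@ 1438-8871
%I JMIR Publications
%V 22
%N 8
%P e18387
%T Personal Health Information Inference Using Machine Learning on RNA Expression Data from Patients With Cancer: Algorithm Validation Study
%A Kweon, Solbi
%A Lee, Jeong Hoon
%A Lee, Younghee
%A Park, Yu Rang
%+ Department of Biomedical Systems Informatics, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea, 82 2 2228 ext 2493, yurangpark@yuhs.ac
%K cancer
%K privacy concerns
%K personal information
%K prediction
%K RNA sequencing
%K machine learning
%D 2020
%7 10.8.2020
%9 Original Paper
%J J Med Internet Res
%G English
%X Background: As the need for sharing genomic data grows, privacy issues and concerns have been raised, such as the ethical questions surrounding data sharing and the disclosure of personal information. Objective: The main purpose of this study was to verify whether genomic data alone are sufficient to predict a patient's personal information. Methods: RNA expression data and matched personal information were collected from 9538 patients in The Cancer Genome Atlas program. Five personal information variables (age, gender, race, cancer type, and cancer stage) were recorded for each patient. Four different machine learning algorithms (support vector machine, decision tree, random forest, and artificial neural network) were used to determine whether a patient's personal information could be accurately predicted from RNA expression data. Performance of the prediction models was measured by accuracy and area under the receiver operating characteristic curve. We selected five cancer types with large sample sizes (breast carcinoma, kidney renal clear cell carcinoma, head and neck squamous cell carcinoma, low-grade glioma, and lung adenocarcinoma) to verify whether predictive accuracy differed between them. We also validated the efficacy of our four machine learning models on normal samples from 593 cancer patients. Results: In most samples, personal information with high genetic relevance, such as gender and cancer type, could be predicted from RNA expression data alone. The prediction accuracies of the best models for gender and cancer type were 0.93-0.99 and 0.78-0.94, respectively. Other aspects of personal information, such as age, race, and cancer stage, were difficult to predict from RNA expression data, with accuracies ranging from 0.0026-0.29, 0.76-0.96, and 0.45-0.79, respectively. Among the tested machine learning methods, the highest predictive accuracy was obtained using the support vector machine algorithm (mean accuracy 0.77), while the lowest accuracy was obtained using the random forest method (mean accuracy 0.65). Gender and race were predicted more accurately than other variables in the samples. On average, the accuracy of cancer stage prediction ranged between 0.71-0.67, while the age prediction accuracy ranged between 0.18-0.23 for the five cancer types. Conclusions: We attempted to predict patient information using RNA expression data. We found that some identifiers could be predicted, but most others could not. This study showed that the personal information available from RNA expression data is limited and that this information cannot be used to identify specific patients.
%M 32773372
%R 10.2196/18387
%U //www.mybigtv.com/2020/8/e18387
%U https://doi.org/10.2196/18387
%U http://www.ncbi.nlm.nih.gov/pubmed/32773372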