%0 Journal Article
%@ 2291-9694
%I JMIR Publications
%V 7
%N 4
%P e15601
%T Interpretability and Class Imbalance in Prediction Models for Pain Volatility in Manage My Pain App Users: Analysis Using Feature Selection and Majority Voting Methods
%A Rahman,Quazi Abidur
%A Janmohamed,Tahir
%A Clarke,Hance
%A Ritvo,Paul
%A Heffernan,Jane
%A Katz,Joel
%+ Department of Computer Science, Lakehead University, 955 Oliver Road, Thunder Bay, ON, Canada, 1 (807) 346 7789, quazi.rahman@lakeheadu.ca
%K chronic pain
%K pain volatility
%K data mining
%K cluster analysis
%K machine learning
%K prediction model
%K Manage My Pain
%K pain app
%D 2019
%7 20.11.2019
%9 Original Paper
%J JMIR Med Inform
%G English
%X Background: Pain volatility is an important factor in chronic pain experience and adaptation. Previously, we used machine learning methods to define and predict pain volatility levels of Manage My Pain app users. Reducing the number of features is important for improving the interpretability of such prediction models. Prediction results also need to be consolidated from multiple random subsamples to address the class imbalance issue. Objective: This study aimed to: (1) improve the interpretability of previously developed pain volatility models by identifying the most important features that distinguish high-volatility users from low-volatility users; and (2) consolidate prediction results from models built on multiple random subsamples while addressing the class imbalance issue. Methods: A total of 132 features were extracted from the first month of app use to develop machine learning-based models for predicting pain volatility at the sixth month of app use. Three feature selection methods were applied to identify features that were significantly better predictors than other members of the large feature set used for developing the prediction models: (1) Gini impurity criterion; (2) information gain criterion; and (3) Boruta. We then combined the three groups of important features determined by these algorithms to produce the final list of important features. Three machine learning methods were then employed to conduct prediction experiments using the selected important features: (1) logistic regression with ridge estimators; (2) logistic regression with least absolute shrinkage and selection operator; and (3) random forests. Multiple random under-sampling of the majority class was conducted to address class imbalance in the dataset. Subsequently, a majority voting approach was employed to consolidate prediction results from these multiple subsamples. The total number of users included in this study was 879, with a total of 391,255 pain records. Results: A threshold of 1.6 was established using clustering methods to differentiate between 2 classes: low volatility (n=694) and high volatility (n=185). The overall prediction accuracy is approximately 70% for both random forests and logistic regression models when using 132 features. Overall, 9 important features were identified using 3 feature selection methods. Of these 9 features, 2 are from the app use category and the other 7 are related to pain statistics. After consolidating models that were developed using random subsamples by majority voting, logistic regression models performed equally well using 132 or 9 features. Random forests performed better than logistic regression methods in predicting the high volatility class. The consolidated accuracy of random forests does not drop significantly (601/879; 68.4% vs 618/879; 70.3%) when only 9 important features are included in the prediction model. Conclusions: We employed feature selection methods to identify important features in predicting future pain volatility. To address class imbalance, we consolidated models that were developed using multiple random subsamples by majority voting. Reducing the number of features did not result in a significant decrease in the consolidated prediction accuracy.
%M 31746764
%R 10.2196/15601
%U http://medinform.www.mybigtv.com/2019/4/e15601/
%U https://doi.org/10.2196/15601
%U http://www.ncbi.nlm.nih.gov/pubmed/31746764
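The class-imbalance procedure described in the abstract (repeated random under-sampling of the majority class, then majority voting across the resulting models) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the default of 5 subsamples, and the nearest-centroid learner are all hypothetical stand-ins. The study itself used logistic regression and random forests, and the abstract does not state how many subsamples were drawn.

```python
import random
from collections import Counter


def nearest_centroid_fit(rows, labels):
    """Stand-in learner (an assumption, not the paper's model): one mean vector per class."""
    sums, counts = {}, Counter(labels)
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
    centroids = {c: [v / counts[c] for v in acc] for c, acc in sums.items()}

    def predict(row):
        # Assign the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )

    return predict


def undersample_once(rows, labels, rng):
    """Keep every minority-class row plus an equally sized random draw from the majority class.

    Assumes exactly two classes, as in the low/high volatility setting.
    """
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    minority_idx = [i for i, c in enumerate(labels) if c == minority]
    majority_idx = [i for i, c in enumerate(labels) if c != minority]
    chosen = minority_idx + rng.sample(majority_idx, len(minority_idx))
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


def majority_vote_predict(train_rows, train_labels, test_rows, n_subsamples=5, seed=0):
    """Train one model per balanced subsample, then take the most common prediction per test row."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_subsamples):
        sub_rows, sub_labels = undersample_once(train_rows, train_labels, rng)
        models.append(nearest_centroid_fit(sub_rows, sub_labels))
    # An odd n_subsamples avoids ties between the two classes.
    return [Counter(m(row) for m in models).most_common(1)[0][0] for row in test_rows]
```

Each subsample is balanced, so no single model is dominated by the low-volatility majority, and voting across subsamples lets every majority-class record contribute to at least some of the models.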