TY  - JOUR
AU  - Rahman, Quazi Abidur
AU  - Janmohamed, Tahir
AU  - Clarke, Hance
AU  - Ritvo, Paul
AU  - Heffernan, Jane
AU  - Katz, Joel
PY  - 2019
DA  - 2019/11/20
TI  - Interpretability and Class Imbalance in Prediction Models for Pain Volatility in Manage My Pain App Users:
JO  - JMIR Med Inform
SP  - e15601
VL  - 7
IS  - 4
KW  - chronic pain
KW  - pain volatility
KW  - data mining
KW  - cluster analysis
KW  - machine learning
KW  - prediction model
KW  - Manage My Pain
KW  - pain app
AB  - Background: Pain volatility is an important factor in chronic pain experience and adaptation. Previously, we used machine learning methods to define and predict pain volatility levels for users of the Manage My Pain app. Reducing the number of features is important for improving the interpretability of such prediction models. Prediction results also need to be consolidated from multiple random subsamples to address class imbalance. Objective: This study aimed to: (1) improve the interpretability of previously developed pain volatility models by identifying the most important features that distinguish high-volatility users from low-volatility users; and (2) consolidate prediction results from models derived from multiple random subsamples while addressing class imbalance. Methods: A total of 132 features were extracted from the first month of app use to develop machine learning-based models for predicting pain volatility in the sixth month of app use. Three feature selection methods were applied to identify features that were significantly better predictors than other members of the large feature set used for developing the prediction models: (1) the Gini impurity criterion; (2) the information gain criterion; and (3) Boruta. We then combined the three groups of important features determined by these algorithms to produce the final list of important features. Three machine learning methods were then employed to conduct prediction experiments using the selected important features: (1) logistic regression with ridge estimators; (2) logistic regression with least absolute shrinkage and selection operator; and (3) random forests. Multiple random under-sampling of the majority class was conducted to address class imbalance in the dataset. Subsequently, a majority voting approach was employed to consolidate prediction results from these multiple subsamples. The study included 879 users with a total of 391,255 pain records. Results: A threshold of 1.6 was established using clustering methods to differentiate between 2 classes: low volatility (n=694) and high volatility (n=185). The overall prediction accuracy was approximately 70% for both random forests and logistic regression models when using 132 features. Overall, 9 important features were identified using 3 feature selection methods. Of these 9 features, 2 are from the app use category and the other 7 are related to pain statistics. After consolidating models that were developed using random subsamples by majority voting, logistic regression models performed equally well using 132 or 9 features. Random forests performed better than logistic regression methods in predicting the high volatility class. The consolidated accuracy of random forests did not drop significantly (601/879; 68.4% vs 618/879; 70.3%) when only 9 important features were included in the prediction model. Conclusions: We employed feature selection methods to identify important features in predicting future pain volatility. To address class imbalance, we consolidated models that were developed using multiple random subsamples by majority voting. Reducing the number of features did not result in a significant decrease in the consolidated prediction accuracy.
SN  - 2291-9694
UR  - http://medinform.www.mybigtv.com/2019/4/e15601/
UR  - https://doi.org/10.2196/15601
UR  - http://www.ncbi.nlm.nih.gov/pubmed/31746764
DO  - 10.2196/15601
ID  - info:doi/10.2196/15601
ER  - 