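%0 Journal Article
%@ 2368-7959
%I JMIR Publications Inc.
%T Identifying Chinese Microblog Users With High Suicide Probability Using Internet-Based Profile and Linguistic Features: Classification Model
%A Guan, Li
%A Hao, Bibo
%A Cheng, Qijin
%A Yip, Paul SF
%A Zhu, Tingshao
%+ Key Laboratory of Behavioral Science, Chinese Academy of Sciences, Room 821, Hexie Building, 16 Lincui Road, Chaoyang District, Beijing, 100101, China, 86 15010965509, tszhu@psych.ac.cn
%K suicide probability
%K microblog
%K Chinese
%K classification model
%D 2015
%7 12.05.2015
%9 Original Paper
%J JMIR Mental Health
%G English
%X Background: Traditional offline assessment of suicide probability is time-consuming, and it is difficult to persuade at-risk individuals to participate. Identifying individuals with high suicide probability through online social media offers advantages in efficiency and in its potential to reach hidden individuals, yet little research has focused on this specific area. Objective: This study aimed to apply two classification models, Simple Logistic Regression (SLR) and Random Forest (RF), to examine the feasibility and effectiveness of identifying Chinese microblog users with high suicide probability from profile and linguistic features extracted from Internet-based data. Methods: A total of 900 Chinese microblog users completed an Internet survey. Participants who scored more than 1 SD above the sample mean on the total Suicide Probability Scale (SPS) score, and those who scored more than 1 SD above the mean on each of the four subscales, were labeled as high-risk individuals, respectively. Profile and linguistic features were fed into two machine learning algorithms (SLR and RF) to train models that aim to identify high-risk individuals in overall suicide probability and in its four dimensions. Models were trained and then tested by 5-fold cross-validation, in which both the training set and the test set were generated from the whole sample under a stratified random sampling rule. Three classic performance metrics (precision, recall, and F1 measure) and a specially defined metric, "Screening Efficiency," were used to evaluate model effectiveness. Results: Classification performance was generally matched between SLR and RF. With the best-performing classification models, we were able to retrieve over 70% of the labeled high-risk individuals in overall suicide probability as well as in the four dimensions. Screening Efficiency of most models varied from 1/4 to 1/2. Precision of the models was generally below 30%. Conclusions: Individuals in China with high suicide probability are recognizable by profile and text-based information from microblogs. Although there is still much room to improve the performance of classification models, this study may shed light on preliminary screening of at-risk individuals via machine learning algorithms, which can work side by side with expert scrutiny to increase efficiency in large-scale surveillance of suicide probability from online social media.
%M 26543921
%R 10.2196/mental.4227
%U http://mental.www.mybigtv.com/2015/2/e17/
%U https://doi.org/10.2196/mental.4227
%U http://www.ncbi.nlm.nih.gov/pubmed/26543921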