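TY  - JOUR
AU  - Himmel, Wolfgang
AU  - Reincke, Ulrich
AU  - Michelmann, Hans Wilhelm
PY  - 2009
DA  - 2009/7/22
TI  - Text Mining and Natural Language Processing Approaches for Automatic Categorization of Lay Requests to Web-Based Expert Forums
JO  - J Med Internet Res
SP  - e25
VL  - 11
IS  - 3
KW  - text mining
KW  - qualitative research
KW  - natural language processing
KW  - consumer health informatics
KW  - Internet
KW  - remote consultation
KW  - infertility
AB  - Background: Healthy and sick people increasingly use electronic media to obtain medical information and advice. For example, Internet users may send requests to Web-based expert forums, or so-called “ask the doctor” services. Objective: To automatically classify lay requests to an Internet medical expert forum using a combination of different text-mining strategies. Methods: We first manually classified a sample of 988 requests directed to an involuntary childlessness forum on the German website “Rund ums Baby” (“All About Babies”) into one or more of 38 categories belonging to two dimensions (“subject matter” and “expectations”). After creating start and synonym lists, we calculated the average Cramér’s V statistic for the association of each word with each category. We also used principal component analysis and singular value decomposition as further text-mining strategies. With these measures, we trained regression models and, on the basis of the best regression models, determined for each request the probability of belonging to each of the 38 categories, using a cutoff of 50%. Recall and precision were calculated on a test sample as quality measures for the automatic classification. Results: According to the manual classification of the 988 documents, 102 (10%) fell into the category “in vitro fertilization (IVF),” 81 (8%) into “ovulation,” 79 (8%) into “cycle,” and 57 (6%) into “semen analysis.” These were the four most frequent categories in the subject matter dimension (consisting of 32 categories). The expectation dimension comprised six categories; we classified 533 documents (54%) as “general information” and 351 (36%) as a wish for “treatment recommendations.” The generation of indicator variables based on the chi-square analysis and Cramér’s V proved to be the best approach for automatic classification in about half of the categories. In combination with the two other approaches, 100% precision and 100% recall were achieved in 18 (47%) of the 38 categories in the test sample. For 35 (92%) categories, precision and recall were better than 80%. For some categories, the input variables (ie, “words”) also included variables from other categories, most often with a negative sign. For example, the absence of words predictive of “menstruation” was a strong indicator for the category “pregnancy test.” Conclusions: Our approach suggests a way of automatically classifying and analyzing unstructured information in Internet expert forums. The technique can perform a preliminary categorization of new requests and help Internet medical experts to better handle the mass of information and to give professional feedback.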
SN  - 1438-8871
UR  - //www.mybigtv.com/2009/3/e25/
UR  - https://doi.org/10.2196/jmir.1123
UR  - http://www.ncbi.nlm.nih.gov/pubmed/19632978
DO  - 10.2196/jmir.1123
ID  - info:doi/10.2196/jmir.1123
ER  - 