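%0 Journal Article
%@ 1438-8871
%I Gunther Eysenbach
%V 11
%N 3
%P e25
%T Text Mining and Natural Language Processing Approaches for Automatic Categorization of Lay Requests to Web-Based Expert Forums
%A Himmel,Wolfgang
%A Reincke,Ulrich
%A Michelmann,Hans Wilhelm
%+ Department of General Practice/Family Medicine, Göttingen, Humboldtallee 38, 37070 Göttingen, Germany, +49 0 551 39 22648, whimmel@gwdg.de
%K text mining
%K qualitative research
%K natural language processing
%K consumer health information
%K Internet
%K remote consultation
%K infertility
%D 2009
%7 22.7.2009
%9 Original Paper
%J J Med Internet Res
%G English
%X Background: Both healthy and sick people increasingly use electronic media to obtain medical information and advice. For example, Internet users may send requests to Web-based expert forums, or so-called “ask the doctor” services. Objective: To automatically classify lay requests to an Internet medical expert forum using a combination of different text-mining strategies. Methods: We first manually classified a sample of 988 requests directed to an involuntary childlessness forum on the German website “Rund ums Baby” (“Everything about Babies”) into one or more of 38 categories belonging to two dimensions (“subject matter” and “expectations”). After creating start and synonym lists, we calculated the average Cramer’s V statistic for the association of each word with each category. We also used principal component analysis and singular value decomposition as further text-mining strategies. With these measures we trained regression models and, on the basis of the best regression models, determined for any request the probability of belonging to each of the 38 different categories, with a cutoff of 50%. Recall and precision of a test sample were calculated as a measure of quality for the automatic classification. Results: According to the manual classification of 988 documents, 102 (10%) documents fell into the category “in vitro fertilization (IVF),” 81 (8%) into the category “ovulation,” 79 (8%) into “cycle,” and 57 (6%) into “semen analysis.” These were the four most frequent categories in the subject matter dimension (consisting of 32 categories). The expectation dimension comprised six categories; we classified 533 documents (54%) as “general information” and 351 (36%) as a wish for “treatment recommendations.” The generation of indicator variables based on the chi-square analysis and Cramer’s V proved to be the best approach for automatic classification in about half of the categories. In combination with the two other approaches, 100% precision and 100% recall were realized in 18 (47%) out of the 38 categories in the test sample. For 35 (92%) categories, precision and recall were better than 80%. For some categories, the input variables (ie, “words”) also included variables from other categories, most often with a negative sign. For example, absence of words predictive for “menstruation” was a strong indicator for the category “pregnancy test.” Conclusions: Our approach suggests a way of automatically classifying and analyzing unstructured information in Internet expert forums. The technique can perform a preliminary categorization of new requests and help Internet medical experts to better handle the mass of information and to give professional feedback.
%M 19632978
%R 10.2196/jmir.1123
%U //www.mybigtv.com/2009/3/e25/
%U https://doi.org/10.2196/jmir.1123
%U http://www.ncbi.nlm.nih.gov/pubmed/19632978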