Research on text classification based on fusion of LDA and two-layer CNN

Affiliation:

1. School of Software, Xinjiang University, Urumqi 830046, China; 2. College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China

CLC number:

TP391

Fund project:

Supported by the Key Fund project of the Education Department of Xinjiang Uygur Autonomous Region (XJEDU2019I004)

Abstract:

Topic-based text classification suffers from weak topic feature representation and from the high feature dimensionality that comes with high-dimensional data. To address these problems, this paper improves both the input feature representation and the structure of the convolutional neural network (CNN). For feature representation, an LDA model is used to compute the inverse topic space frequency, which yields a topic vector matrix for each text; this suppresses the contribution of noisy topics and increases the weight of key topics. The topic vector matrix and the word vector matrix of the text are each used as input to the CNN model. A two-layer CNN structure is proposed in which a multi-channel pooling layer is added after the pooling layer of each CNN layer to fuse the pooling results of each layer, reducing the feature dimension while capturing more locally salient features. Finally, an Attention mechanism weights the fused features before they are passed to the fully connected layer for classification. Experimental results show that the improved model achieves accuracy and recall above 98% on the text classification task, and that its F1 score is nearly 6% higher than that of the baseline experiment.
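The abstract names an "inverse topic space frequency" but does not give its formula. As a rough illustration only, the sketch below assumes an IDF-style definition: a topic that is present in many documents gets a small weight, and a topic concentrated in few documents gets a larger one. The function names, the presence threshold, the smoothing, and the toy distributions are all hypothetical and are not taken from the paper; the document-topic distributions are assumed to have been produced by an LDA model beforehand.

```python
import math


def inverse_topic_frequency(doc_topic, threshold=0.1):
    """Return one IDF-style weight per topic (assumed formula, not the paper's)."""
    n_docs = len(doc_topic)
    n_topics = len(doc_topic[0])
    weights = []
    for k in range(n_topics):
        # Count documents in which topic k is "present" (probability >= threshold).
        df = sum(1 for row in doc_topic if row[k] >= threshold)
        # Smoothed log ratio, so every weight is at least 1.0.
        weights.append(math.log((1 + n_docs) / (1 + df)) + 1.0)
    return weights


def reweight(doc_topic, weights):
    """Scale each document's topic distribution by the weights, then renormalize."""
    out = []
    for row in doc_topic:
        scaled = [p * w for p, w in zip(row, weights)]
        total = sum(scaled)
        out.append([s / total for s in scaled])
    return out


# Toy document-topic distributions (hypothetical values standing in for LDA output).
# Topic 0 dominates every document, so it behaves like a "noisy" background topic.
doc_topic = [
    [0.70, 0.25, 0.05],
    [0.60, 0.05, 0.35],
    [0.80, 0.15, 0.05],
    [0.65, 0.05, 0.30],
]
weights = inverse_topic_frequency(doc_topic)
weighted = reweight(doc_topic, weights)
```

In this toy input, topic 0 is present in all four documents and so receives the smallest weight, while topics 1 and 2 each appear in only two documents and are boosted. Under these assumptions, the reweighted rows could then be stacked into the topic vector matrix that the abstract describes as input to the CNN.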

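Cite this article

杨雳, 刘胜全, 贾李睿智, 解舒淇. Research on text classification based on fusion of LDA and two-layer CNN[J]. Electronic Measurement Technology (电子测量技术), 2023, 46(7): 1-6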

History
  • Published online: 2024-02-18