Abstract: To address the weak topic feature representation and the high feature dimensionality caused by high-dimensional data in topic-based text classification, this paper improves both the input feature representation and the Convolutional Neural Network (CNN) structure. For the feature representation, the LDA model is used to compute an inverse topic space frequency and obtain the topic vector matrix of a text, which suppresses noisy topics and increases the weight of key topics. The topic vector matrix and the word vector matrix of the text are fed into the CNN model as separate inputs. A two-layer CNN structure is proposed in which a multi-channel pooling layer is added after the pooling layer of each CNN layer to integrate the pooling results of both layers, capturing more locally salient features while reducing the feature dimension. Finally, an attention mechanism weights the fused features, which are then passed to a fully connected layer for classification. Experimental results show that the accuracy and recall of the improved model on text classification tasks both exceed 98%, and the F1 score is 6% higher than that of the baseline experiment.
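The abstract does not give the exact formula for the inverse topic space frequency; the sketch below assumes it is analogous to a smoothed IDF computed over topics rather than terms, so that topics dominating many documents are down-weighted. The corpus, topic count, and weighting formula are illustrative assumptions, not the paper's actual settings.

```python
# Hedged sketch: build an LDA document-topic matrix and re-weight it with an
# assumed "inverse topic space frequency" factor (IDF-style, over topics).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the match ended with a late goal",
    "the striker scored twice in the final",
    "the new phone ships with a faster chip",
    "the laptop review praises its battery life",
]

# Bag-of-words counts feed the LDA model.
counts = CountVectorizer().fit_transform(docs)

# Document-topic distributions theta: shape (n_docs, n_topics).
lda = LatentDirichletAllocation(n_components=3, random_state=0)
theta = lda.fit_transform(counts)

# Assumed inverse topic space frequency: topics that dominate many documents
# (generic, noisy topics) receive a small weight, while rarer, more
# discriminative topics receive a larger one (smoothed to stay positive).
n_docs, n_topics = theta.shape
dominant = theta.argmax(axis=1)                       # dominant topic per document
topic_doc_freq = np.bincount(dominant, minlength=n_topics)
itf = np.log((1.0 + n_docs) / (1.0 + topic_doc_freq)) + 1.0

# Weighted topic vector matrix, used as one of the CNN input channels
# alongside the word vector matrix.
topic_matrix = theta * itf
print(topic_matrix.round(3))
```

Under these assumptions, the weighted matrix plays the role of the topic-channel input described in the abstract; the word vector matrix (e.g., from pretrained embeddings) would form the second CNN input.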