Mixing topic models and character word embeddings to make lda2vec

CLC Number: TP183; TN01

    Abstract:

    To better apply the original English topic vector model to training Chinese topic vectors, and to address the shortcoming of having to set the number of topics in advance, this paper replaces the linear addition of the document vector and the word vector in the original topic vector model with an inner product, and combines the document vector, character vector, and word vector to train the topic vectors. Once the topic vectors are obtained, similar topics are grouped together by a clustering method, which also determines the number of topics. Experiments show that the topic words trained by this method are more relevant than those of the original and traditional models, and that a reasonable number of topics can be obtained. The method also yields word vectors, topic vectors, and document representations.
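
The step of grouping similar topic vectors to arrive at a topic count can be illustrated with a small sketch. The abstract does not name the clustering method, so the code below assumes a simple single-link grouping by cosine similarity; the function name `merge_similar_topics`, the `threshold` parameter, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import numpy as np


def merge_similar_topics(topic_vectors, threshold=0.8):
    """Group topic vectors whose cosine similarity is >= threshold.

    Assumption: single-link grouping via union-find stands in for the
    unspecified clustering method. Returns (labels, number_of_groups).
    """
    vecs = np.asarray(topic_vectors, dtype=float)
    # Normalise rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    unit = vecs / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T

    n = len(vecs)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of topics that is similar enough.
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    # Relabel groups as 0..k-1 in order of first appearance.
    label_of = {}
    labels = []
    for i in range(n):
        root = find(i)
        if root not in label_of:
            label_of[root] = len(label_of)
        labels.append(label_of[root])
    return labels, len(label_of)


# Toy example: four initial topic vectors that form two clear groups.
toy_topics = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.05, 1.0]]
labels, num_topics = merge_similar_topics(toy_topics, threshold=0.8)
print(labels, num_topics)
```

In this sketch the number of groups that survive merging plays the role of the inferred topic count; the threshold is a free parameter that would need tuning on real topic vectors.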

History
  • Online: July 20, 2021