Qualitative Data Classification Based on Chi-Square Dissimilarity and t-SNE

Author: Zhang Lei
Affiliation: Shaanxi Electronic Information Institute, Xi'an, Shaanxi 710500
CLC number: TP311.13

Abstract:

To address low classification accuracy and high computational cost on qualitative (categorical) data, this paper proposes a method for recognizing categorical variables that combines traditional classifiers with different mapping techniques to improve class separability. The initial features (categorical attributes) are mapped into a real-valued space using the chi-square (C-S) distance as the dissimilarity measure; this increases the dimensionality of the feature space and thereby improves class separability. t-distributed stochastic neighbor embedding (t-SNE) then reduces the data to two or three features, which shortens the computation time of the learning methods. Experiments on public categorical datasets show that combining the C-S mapping with t-SNE greatly reduces the computational cost of the recognition task while maintaining recognition accuracy. When only the C-S mapping is applied to a dataset, class separability is enhanced, which significantly improves the performance of the learning algorithms.
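The abstract does not give the exact formulation of the C-S mapping, so the following is only a minimal sketch under stated assumptions: the categorical attributes are one-hot encoded, the chi-square distance is the one used in correspondence analysis (squared differences between row profiles, weighted by the inverse column mass), and each sample is re-described by its distances to every sample in the set. The function names `one_hot` and `chi_square_mapping` are hypothetical and chosen for illustration.

```python
from math import sqrt


def one_hot(rows):
    """Build a 0/1 indicator matrix with one column per (attribute, category)."""
    n_attrs = len(rows[0])
    columns = sorted({(j, row[j]) for row in rows for j in range(n_attrs)})
    index = {col: k for k, col in enumerate(columns)}
    matrix = [[0.0] * len(columns) for _ in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            matrix[i][index[(j, value)]] = 1.0
    return matrix


def chi_square_mapping(rows):
    """Map each sample to its chi-square distances from all samples.

    Assumption: correspondence-analysis style chi-square distance on the
    indicator matrix; the paper's own definition may differ.
    """
    indicator = one_hot(rows)
    n_samples, n_attrs = len(rows), len(rows[0])
    total = float(n_samples * n_attrs)
    n_cols = len(indicator[0])
    # Column mass: share of all indicator entries that fall in each column.
    col_mass = [sum(r[j] for r in indicator) / total for j in range(n_cols)]
    # Row profile: each row divided by its sum (always n_attrs here).
    profiles = [[v / n_attrs for v in r] for r in indicator]

    def distance(a, b):
        return sqrt(sum((x - y) ** 2 / m for x, y, m in zip(a, b, col_mass)))

    # The new feature vector of a sample has one entry per reference sample.
    return [[distance(a, b) for b in profiles] for a in profiles]


samples = [("red", "small"), ("red", "large"), ("blue", "small"), ("blue", "small")]
mapped = chi_square_mapping(samples)  # 4 samples, each with 4 real-valued features
```

In this sketch two categorical attributes become as many real-valued features as there are reference samples, which is one way to read the abstract's statement that the mapping increases the dimensionality of the feature space. Rare categories receive larger weights because their column mass is small. The resulting matrix could then be passed to an off-the-shelf t-SNE implementation, for example `sklearn.manifold.TSNE(n_components=2).fit_transform(...)` from scikit-learn, keeping in mind that its `perplexity` parameter must be smaller than the number of samples.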

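Cite this article

Zhang Lei. Qualitative Data Classification Based on Chi-Square Dissimilarity and t-SNE[J]. Electronic Measurement Technology, 2021, 44(5): 100-106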

History
  • Published online: 2024-10-24