Method for identifying malicious encrypted traffic based on QBC inconsistency

Affiliation:

School of Computer Science, Southwest Petroleum University, Chengdu 610500, China

CLC number:

TP393.08

Fund Project:

Supported by the National Natural Science Foundation of China (61902328)

    Abstract:

    Machine-learning-based identification of malicious encrypted traffic currently relies mainly on supervised learning, which requires a large number of labeled samples. In real environments, however, malicious traffic is scarce and labeling depends on expert experience, so labeling is costly. Active learning iteratively selects hard samples for training and thereby reduces the number of training samples to some extent, but existing hard-sample selection strategies based on committee voting are coarse-grained and select samples of poor quality. To address this problem, this paper proposes CBU (Committee-based Uncertainty), a malicious encrypted traffic identification method that improves Query by Committee (QBC). CBU defines a method for computing the committee's inconsistency on a sample and combines it with a similarity analysis between labeled and unlabeled samples to measure sample uncertainty effectively, so that high-quality hard samples are selected and the labeling and training workload is reduced. Experiments on the industry-standard CTU dataset and a real-world malicious traffic dataset show that, compared with the traditional committee voting strategy, CBU halves the number of samples that need to be labeled and reaches 96% identification accuracy using only 15% of the data, which makes it practical.
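The abstract describes scoring each unlabeled sample by committee inconsistency combined with its similarity to already labeled samples, but it does not give the formulas. The sketch below is therefore only an illustration of that general idea, not the paper's CBU definition: it assumes normalized vote entropy as the inconsistency measure, one minus the maximum cosine similarity to the labeled set as a novelty term, and a hypothetical mixing weight `alpha`. The function names and the non-negative feature vectors are likewise assumptions.

```python
import math
from collections import Counter


def vote_entropy(votes, n_classes=2):
    """Normalized entropy of committee votes for one sample (0 = full agreement)."""
    n = len(votes)
    counts = Counter(votes)
    if len(counts) < 2:
        return 0.0
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(n_classes)


def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_hard_samples(committee_votes, unlabeled, labeled, k, alpha=0.5, n_classes=2):
    """Return indices of the k unlabeled samples with the highest combined score.

    Assumed score (not taken from the paper):
        alpha * vote_entropy + (1 - alpha) * (1 - max similarity to labeled set)
    """
    scored = []
    for i, (votes, x) in enumerate(zip(committee_votes, unlabeled)):
        disagreement = vote_entropy(votes, n_classes)
        # Negative similarities are clamped to 0 so novelty stays in [0, 1].
        max_sim = max((max(0.0, cosine(x, l)) for l in labeled), default=0.0)
        novelty = 1.0 - max_sim
        scored.append((alpha * disagreement + (1 - alpha) * novelty, i))
    # Highest score first; ties broken by original index for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:k]]
```

In an active learning loop, `committee_votes[i]` would hold the class predicted by each committee member for unlabeled sample `i`; the selected indices are sent for expert labeling, moved to the labeled set, and the committee is retrained before the next round.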

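Cite this article

Zhang Ronghua, Liu Zhi, Luo Qin. Method for identifying malicious encrypted traffic based on QBC inconsistency[J]. Electronic Measurement Technology (电子测量技术), 2022, 45(1): 28-34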

History
  • Online publication date: 2024-06-19