Aviation maintenance text named entity recognition based on BERT and knowledge distillation
Affiliation:

1. Naval Aviation University, Yantai 264001, China; 2. Unit 91475 of the PLA, Huludao 125001, China

CLC number:

TP391.1

    Abstract:

    Named entity recognition (NER) in the military aviation maintenance domain suffers from scarce training data and high annotation cost. To address this, the paper proposes an improved NER method based on pre-trained BERT. Borrowing the idea of distant supervision, the method fuses word-boundary features derived from distant labels into each character to obtain a feature-fusion vector. This vector is fed into BERT to generate dynamic (context-dependent) character representations, and a CRF layer is attached to obtain the globally optimal label sequence. In experiments on a self-built dataset, the F1 score reaches 0.861. To compress the model, the trained BERT-CRF model is used to generate pseudo-labeled data and, combined with knowledge distillation, to guide the training of a student model, BiGRU-CRF, which has far fewer parameters. Experimental results show that, compared with the teacher model, the student model has 95.2% fewer parameters and a 47% shorter inference time, at the cost of a 2% loss in F1.
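
The teacher-student training mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the loss the authors used, so the code below assumes the common soft-target formulation: a temperature-scaled KL divergence between teacher and student label distributions, mixed with cross-entropy on hard (pseudo) labels. It works on per-token logits only and ignores the CRF layer; the names `distillation_loss`, `temperature`, and `alpha`, and the toy logits, are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one token's label logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, hard_labels,
                      temperature=2.0, alpha=0.5):
    """Average per-token loss (assumed form, not taken from the paper):
    alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE(hard label).

    student_logits, teacher_logits: one list of label logits per token.
    hard_labels: one label index per token (e.g. teacher pseudo labels).
    """
    total = 0.0
    for s, t, y in zip(student_logits, teacher_logits, hard_labels):
        p_teacher = softmax(t, temperature)
        p_student = softmax(s, temperature)
        # Soft-target term: how far the student is from the teacher.
        kl = sum(p * math.log(p / q)
                 for p, q in zip(p_teacher, p_student) if p > 0)
        # Hard-label term: ordinary cross-entropy at temperature 1.
        ce = -math.log(softmax(s)[y])
        total += alpha * temperature ** 2 * kl + (1 - alpha) * ce
    return total / len(hard_labels)


# Toy example: two tokens, three labels (values are made up).
teacher = [[4.0, 0.5, 0.1], [0.2, 3.5, 0.3]]
student = [[2.0, 1.0, 0.5], [0.5, 1.5, 0.4]]
pseudo_labels = [0, 1]
loss = distillation_loss(student, teacher, pseudo_labels)
print(f"distillation loss: {loss:.4f}")
```

With `alpha=1.0` only the soft-target term remains, so the loss is zero when the student reproduces the teacher's logits exactly; lowering `alpha` shifts weight toward the pseudo labels.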

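Cite this article

顾佼佼, 翟一琛, 姬嗣愚, 宗富强. Aviation maintenance text named entity recognition based on BERT and knowledge distillation [J]. Electronic Measurement Technology (电子测量技术), 2023, 46(3): 19-24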

History
  • Published online: 2024-02-26