Video description model of attention mechanism based on dilated convolution
Original title (Chinese): 基于扩张卷积的注意力机制视频描述模型

Affiliation:

School of Electronic Engineering, Guangxi Normal University, Guilin, Guangxi 541004, China

CLC number:

TP391.4; TP183

Fund Project:

Supported by the National Natural Science Foundation of China (61976063)

    Abstract:

    Video description models suffer from weak correlation between visual features and word features, low training efficiency, errors in the generated sentences, and low metric scores. To address these problems, a video description model built on a dilated-convolution-based attention mechanism is proposed. In the encoding stage, Inception-v4 encodes the video features. The encoded visual features and the word features are then fed into the dilated-convolution-based attention mechanism. Finally, a long short-term memory (LSTM) network decodes the result into a natural-language description of the video. Comparative experiments were conducted on the public video description dataset MSVD, and the model was evaluated with the BLEU, ROUGE_L, CIDEr, and METEOR metrics. The results show that the proposed model improves clearly on every metric: compared with the baseline model SA-LSTM (Inception-V4), BLEU_4, ROUGE_L, CIDEr, and METEOR increase by 4.23%, 4.73%, 2.11%, and 2.45%, respectively.
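
The abstract does not say how the dilated convolution enters the attention computation, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that per-frame visual features are first smoothed along the time axis by a 1-D dilated convolution, that the result is scored against a word (query) vector by a dot product, and that a softmax over the scores weights the original frame features. The function names, the kernel taps, and the dilation rate are all hypothetical.

```python
import math


def dilated_conv1d(frames, taps, dilation):
    """1-D dilated convolution along the time axis (zero padding, same length).

    frames: list of T feature vectors, each a list of D floats
    taps:   K scalar kernel weights shared by all D channels (an assumption)
    """
    T, D, K = len(frames), len(frames[0]), len(taps)
    center = K // 2
    out = []
    for t in range(T):
        acc = [0.0] * D
        for k, w in enumerate(taps):
            # Neighbouring taps read frames that are `dilation` steps apart.
            src = t + (k - center) * dilation
            if 0 <= src < T:  # positions outside the clip count as zeros
                for d in range(D):
                    acc[d] += w * frames[src][d]
        out.append(acc)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dilated_attention(frames, word, taps=(0.25, 0.5, 0.25), dilation=2):
    """Attend over frames using dilated-convolution context (hypothetical design).

    word: query vector of length D, e.g. a word embedding or decoder state
    Returns (attention weights over frames, attended feature vector).
    """
    context = dilated_conv1d(frames, taps, dilation)
    # Dot-product score between each frame's dilated context and the query.
    scores = [sum(c * q for c, q in zip(ctx, word)) for ctx in context]
    weights = softmax(scores)
    D = len(frames[0])
    # Weighted sum of the original (unsmoothed) frame features.
    attended = [sum(a * f[d] for a, f in zip(weights, frames)) for d in range(D)]
    return weights, attended
```

In this sketch, a larger `dilation` lets each frame's score depend on frames further away in time without adding kernel weights. In a full model the attended vector would be passed to the LSTM decoder at each step, and the frame features would come from Inception-v4 rather than hand-written lists.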

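Cite this article

Wang Jinjin, Zeng Shangyou, Li Wenhui, Zhang Jiebin (王金金, 曾上游, 李文惠, 张介滨). Video description model of attention mechanism based on dilated convolution[J]. Electronic Measurement Technology (电子测量技术), 2021, 44(23): 99-104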

History
  • Online publication date: 2024-07-02