Video description model of attention mechanism based on dilated convolution

Electronic Measurement Technology, 2021, Vol. 44, No. 23, pp. 99-104

Affiliation: School of Electronic Engineering, Guangxi Normal University, Guilin, Guangxi 541004, China
CLC number: TP391.4; TP183
Fund project: Supported by the National Natural Science Foundation of China (61976063)

Abstract:

Video description models often suffer from weak correlation between visual features and word features, low training efficiency, errors in the generated sentences, and low metric scores. To address these problems, a video description model with an attention mechanism based on dilated convolution is proposed. In the encoding stage, Inception-v4 encodes the video features. The encoded visual features and the word features are then fed into the dilated-convolution-based attention mechanism. Finally, a long short-term memory (LSTM) network decodes them to generate a natural-language description of the video. Comparative experiments were conducted on the public video description dataset MSVD, and the model was evaluated with the BLEU, ROUGE_L, CIDEr, and METEOR metrics. The results show that the proposed model improves clearly on every metric: compared with the baseline model SA-LSTM (Inception-V4), BLEU_4, ROUGE_L, CIDEr, and METEOR increase by 4.23%, 4.73%, 2.11%, and 2.45%, respectively.
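
The abstract names the building blocks (visual features, word features, an attention mechanism based on dilated convolution) but does not say how they are wired together. The following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each frame is scored against the current word by a dot product, that a 1-D dilated convolution is then applied to those scores along the time axis, and that a softmax turns the result into attention weights. The function names `dilated_conv1d`, `softmax`, and `attend`, the kernel values, and the toy feature vectors are all hypothetical.

```python
import math


def dilated_conv1d(seq, kernel, dilation):
    """1-D dilated convolution over a list of scalars.

    Taps are spaced `dilation` steps apart and centred on each position.
    Positions outside the sequence count as zero, so the output has the
    same length as the input.
    """
    center = len(kernel) // 2
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - center) * dilation
            if 0 <= idx < len(seq):
                acc += w * seq[idx]
        out.append(acc)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(frame_feats, word_feat, kernel, dilation):
    """Return (attention weights, context vector) for one decoding step.

    frame_feats: list of T frame feature vectors (assumed encoder output).
    word_feat:   feature vector of the current word, same dimension.
    """
    # Assumed scoring rule: dot product between each frame and the word.
    raw = [sum(f * w for f, w in zip(frame, word_feat)) for frame in frame_feats]
    # The dilated convolution lets each score see frames further away in time.
    scores = dilated_conv1d(raw, kernel, dilation)
    weights = softmax(scores)
    dim = len(frame_feats[0])
    context = [
        sum(weights[t] * frame_feats[t][d] for t in range(len(frame_feats)))
        for d in range(dim)
    ]
    return weights, context


# Toy data, purely illustrative.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
word = [1.0, 0.0]
weights, context = attend(frames, word, kernel=[0.25, 0.5, 0.25], dilation=2)

print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1, 1], 2))  # [4.0, 6.0, 9.0, 6.0, 8.0]
```

In a full model the context vector would be passed to the LSTM decoder at each step. With a dilation of 2 and a kernel of width 3, each output position covers a span of five time steps while using only three weights, which is the usual motivation for dilated convolution.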

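Cite this article

Wang Jinjin, Zeng Shangyou, Li Wenhui, Zhang Jiebin. Video description model of attention mechanism based on dilated convolution [J]. Electronic Measurement Technology, 2021, 44(23): 99-104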

History

Published online: 2024-07-02
