Image retrieval method with attention-enhanced visual Transformer

Electronic Measurement Technology, 2023, Vol. 46, No. 23, pp. 50-55

Authors: Liu Huayong, Huang Cong, Jin Hanjun

Affiliation: School of Computer Science, Central China Normal University, Wuhan 430070, China

CLC number: TP391

Funding: Supported by the Humanities and Social Sciences Research Project of the Ministry of Education (21YJA870005)

Abstract:

Image retrieval methods based on deep hashing typically rely on convolution and pooling to extract local image information, and must keep deepening the network to capture global long-range dependencies, so they generally incur high complexity and computational cost. This paper proposes an attention-enhanced vision Transformer image retrieval algorithm. It uses a pre-trained vision Transformer as the baseline model to speed up convergence, and achieves efficient image retrieval through improvements to the backbone network and the design of the hash function. On the one hand, an attention enhancement module is designed to capture local salient information and visual details of the input feature map, learn corresponding weights to highlight important features, and strengthen the representational power of the image features fed to the Transformer encoder. On the other hand, to improve retrieval efficiency, a contrastive hash loss function is designed to generate discriminative binary hash codes, which reduces memory requirements and computational complexity. Experimental results on the CIFAR-10 and NUS-WIDE datasets show that the proposed method achieves a mean average precision (mAP) of 96.8% and 86.8%, respectively, with different hash code lengths, outperforming several classic deep hashing algorithms and two other Transformer-based image retrieval algorithms.

Key words: image retrieval; vision Transformer; deep hashing; attention module
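
The abstract reports results as mean average precision over binary hash codes. The following is a minimal sketch of how such an evaluation is commonly set up, not the authors' implementation: real-valued features are binarized with a sign threshold, database items are ranked by Hamming distance to the query, and average precision is averaged over queries. All function names, the toy feature vectors, and the single-label relevance rule (same label means relevant) are assumptions made for illustration; multi-label datasets such as NUS-WIDE usually treat two images as relevant when they share at least one label.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> List[int]:
    """Map a real-valued feature vector to a binary hash code (sign threshold at 0)."""
    return [1 if x >= 0 else 0 for x in features]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the bit positions where two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query: Sequence[int], database: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices sorted by increasing Hamming distance to the query.

    sorted() is stable, so ties keep their original database order.
    """
    return sorted(range(len(database)), key=lambda i: hamming_distance(query, database[i]))


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average of precision@k taken at each rank k where a relevant item appears."""
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(query_codes, query_labels, db_codes, db_labels) -> float:
    """Mean of per-query average precision (assumed rule: same label = relevant)."""
    scores = []
    for code, label in zip(query_codes, query_labels):
        order = rank_by_hamming(code, db_codes)
        scores.append(average_precision([db_labels[i] == label for i in order]))
    return sum(scores) / len(scores)


# Toy 4-bit example; the feature values below are made up for illustration.
db_codes = [binarize(f) for f in [
    [0.9, -0.2, 0.4, -0.7],
    [0.8, -0.1, -0.3, -0.5],
    [-0.6, 0.7, -0.2, 0.9],
    [-0.4, 0.3, 0.1, 0.8],
]]
db_labels = ["cat", "cat", "dog", "dog"]
query_codes = [binarize([0.5, -0.3, 0.2, -0.1]), binarize([-0.2, 0.6, 0.3, -0.4])]
query_labels = ["cat", "dog"]

map_score = mean_average_precision(query_codes, query_labels, db_codes, db_labels)
print(f"mAP = {map_score:.4f}")
```

Because each image is stored as a short bit string and compared with a bit-count operation, this kind of retrieval needs far less memory and computation than comparing floating-point feature vectors, which is the efficiency argument the abstract makes for hashing.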

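Cite this article:

Liu Huayong, Huang Cong, Jin Hanjun. Image retrieval method with attention-enhanced visual Transformer[J]. Electronic Measurement Technology, 2023, 46(23): 50-55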

Article metrics:

Views: 427
Downloads: 538
HTML views: 0
Citations: 0

History:

Published online: 2024-03-21
