Integrating multi-scale features and attention mechanisms for food image recognition

Affiliation:

1. School of Computer Science and Engineering, Sichuan University of Science & Engineering, Yibin 644000, China; 2. Sichuan Big Data Visualization Analysis Technology Engineering Laboratory, Yibin 644000, China; 3. School of Automation and Information Engineering, Sichuan University of Science & Engineering, Yibin 644000, China; 4. Key Laboratory of Measurement and Control Technology of Enterprise Informatization and Internet of Things of Sichuan Universities, Yibin 644000, China

CLC number:

TP393; TN98

Fund Project:

Supported by the Scientific Research and Innovation Team Program of Sichuan University of Science & Engineering (SUSE652A006) and the Key Laboratory of Measurement and Control Technology of Enterprise Informatization and Internet of Things of Sichuan Universities (2022WYJ01)
Abstract:

Food images are hard to recognize because inter-class differences are small, intra-class variations are large, and image structure is complex. To address these problems, this paper proposes a food image recognition method that integrates multi-scale features and an attention mechanism. First, the ConvNeXt model, which has stronger feature extraction capability, is used as the backbone network to better capture the detailed features of food images. Second, an improved ASPP module is introduced to expand the receptive field and exploit multi-scale information, strengthening the model's ability to capture features at different scales. Finally, an attention mechanism is added after each convolutional block to improve feature representation and the capture of contextual information. Experimental results show that the proposed method achieves accuracies of 91.56% and 87.22% on the extended Vireo Food172 dataset and the ETH Food101 dataset, respectively, improvements of 2.05% and 1.66% over the original model, which verifies its effectiveness.
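
The multi-scale idea behind an ASPP-style module can be sketched in a few lines of plain Python. This is only a toy illustration, not the paper's implementation: it works on a 1-D signal instead of image feature maps, the function names are hypothetical, and the dilation rates are assumed, since the abstract states neither the rates nor what the improved ASPP module changes. It shows that the same 3-tap kernel covers a wider span as the dilation rate grows, and that parallel branches at different rates are stacked as separate channels.

```python
# Toy 1-D sketch of parallel atrous (dilated) convolution branches.
# All names and rates here are illustrative assumptions, not the paper's settings.


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1-D dilated cross-correlation (odd kernel length)."""
    half = (effective_kernel_size(len(kernel), dilation) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i - half + j * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(signal):     # positions outside the signal count as zero
                acc += weight * signal[idx]
        out.append(acc)
    return out


def aspp_like(signal, kernel, rates):
    """Run one branch per dilation rate and stack the results as channels."""
    return [dilated_conv1d(signal, kernel, rate) for rate in rates]


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    for rate, branch in zip((1, 2, 4), aspp_like(impulse, [1, 1, 1], (1, 2, 4))):
        print(rate, effective_kernel_size(3, rate), branch)
```

In a real network the stacked branches would typically be fused by a further convolution; here they are simply returned side by side so the widening response to a single impulse is visible per rate.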

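Cite this article

李苗苗, 华才健, 谢涛, 薛青霞. Integrating multi-scale features and attention mechanisms for food image recognition [J]. 电子测量技术 (Electronic Measurement Technology), 2024, 47(18): 164-171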

History
  • Published online: 2024-12-20