Layout segmentation based on Multi-WHFPN and SimAM attention mechanism
Author affiliations:

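1. College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China; 2. Nanjing Lantai Information Technology Co., Ltd., Nanjing 210009, China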

CLC number:

TP391.41

Funding:

Supported by the National Key Research and Development Program of China (2016YFD0600101)


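Abstract:

As a preprocessing step for OCR, layout segmentation is receiving increasing attention from both academia and industry. To address slow detection, inaccurate target-region boundaries, and easily missed small targets in layout segmentation, the YOLOv7-MSY model is proposed. First, drawing on the idea of residual connections, the Multi-WHFPN network structure is proposed. It uses trainable weight parameters to emphasize the importance of individual features during feature fusion, and it adds a small-target detection head to improve small-target detection. Second, the SimAM attention mechanism is introduced; it evaluates feature weights in three dimensions without adding extra parameters, enhancing important features and suppressing ineffective ones. Finally, YEIOU replaces the localization loss function of the original model, improving both convergence speed and regression accuracy. In comparative experiments on a dataset provided by the Jiangsu Provincial Archives, YOLOv7-MSY is more sensitive to target-region boundaries and detects small targets better. Its mAP@.5 reaches 0.871, which is 7.84% higher than that of the original YOLOv7 model. The model's layout segmentation results are better than those of other types of layout segmentation algorithms, it generalizes well, and its segmentation speed is relatively high.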
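The abstract does not spell out how Multi-WHFPN applies its trainable weights. One common formulation of learnable weighted feature fusion is the fast normalized fusion used in BiFPN, where each input feature map gets a non-negative scalar weight and the weights are normalized to sum to roughly one. The sketch below illustrates that idea in plain Python on same-sized 2D maps. The function name `weighted_fusion`, the `eps` value, and the choice of BiFPN-style normalization are assumptions made for illustration, not the paper's confirmed design.

```python
from typing import List, Sequence

FeatureMap = List[List[float]]  # one H x W map; a real FPN fuses C x H x W tensors


def weighted_fusion(features: Sequence[FeatureMap],
                    weights: Sequence[float],
                    eps: float = 1e-4) -> FeatureMap:
    """Fuse same-sized feature maps with normalized non-negative weights.

    Assumed BiFPN-style rule: out = sum_i(w_i * f_i) / (eps + sum_j w_j),
    with every w_i clipped at zero (ReLU). In training, `weights` would be
    learnable parameters; here they are plain floats.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature map")
    clipped = [max(0.0, w) for w in weights]   # ReLU keeps weights >= 0
    total = sum(clipped) + eps                 # eps avoids division by zero
    normalized = [w / total for w in clipped]
    height, width = len(features[0]), len(features[0][0])
    return [
        [sum(nw * fmap[r][c] for nw, fmap in zip(normalized, features))
         for c in range(width)]
        for r in range(height)
    ]


if __name__ == "__main__":
    shallow = [[1.0, 1.0], [1.0, 1.0]]  # hypothetical high-resolution branch
    deep = [[3.0, 3.0], [3.0, 3.0]]     # hypothetical upsampled deep branch
    print(weighted_fusion([shallow, deep], [1.0, 1.0]))  # close to the plain average
    print(weighted_fusion([shallow, deep], [3.0, 1.0]))  # leans toward `shallow`
```

A larger weight pulls the fused map toward that input, which is one way a network can learn that some pyramid levels matter more than others for small targets.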

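SimAM, as the abstract describes it, assigns a weight to every element of a 3D feature map without adding learnable parameters. A minimal sketch of the usual SimAM computation follows, written in plain Python for a single C x H x W feature map: each position's squared deviation from its channel mean is divided by four times the channel variance plus a small constant, shifted by 0.5, and passed through a sigmoid. The function names and the default `e_lambda = 1e-4` are assumptions here; how YOLOv7-MSY places the module inside the network is not stated in the abstract.

```python
import math
from typing import List

Channel = List[List[float]]  # one H x W channel


def simam_weights(channel: Channel, e_lambda: float = 1e-4) -> Channel:
    """Return a SimAM attention weight in (0, 1) for every position of one channel."""
    values = [v for row in channel for v in row]
    if len(values) < 2:
        raise ValueError("SimAM needs at least two positions per channel")
    mean = sum(values) / len(values)
    sq_dev = [[(v - mean) ** 2 for v in row] for row in channel]
    # Channel variance with the n = H*W - 1 divisor.
    variance = sum(d for row in sq_dev for d in row) / (len(values) - 1)
    # Inverse energy, then sigmoid: positions far from the channel mean score higher.
    return [
        [1.0 / (1.0 + math.exp(-(d / (4.0 * (variance + e_lambda)) + 0.5)))
         for d in row]
        for row in sq_dev
    ]


def simam(feature_map: List[Channel], e_lambda: float = 1e-4) -> List[Channel]:
    """Rescale a C x H x W feature map element-wise by its SimAM weights."""
    out = []
    for channel in feature_map:
        weights = simam_weights(channel, e_lambda)
        out.append([[v * w for v, w in zip(row, wrow)]
                    for row, wrow in zip(channel, weights)])
    return out


if __name__ == "__main__":
    fmap = [[[0.0, 0.0], [0.0, 4.0]]]  # one channel with a single distinctive position
    print(simam_weights(fmap[0]))       # the distinctive position gets the largest weight
    print(simam(fmap))
```

Because the weights come only from per-channel statistics, the module adds no trainable parameters, which matches the abstract's claim.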

Cite this article

杨陈慧, 周小亮, 张恒, 孙政, 业宁. Layout segmentation based on Multi-WHFPN and SimAM attention mechanism[J]. 电子测量技术 (Electronic Measurement Technology), 2024, 47(1): 159-168

History
  • Published online: 2024-04-24