Spatial temporal convolutional Transformer network for skeleton-based action recognition

Affiliation:

1. Zhengzhou Hengda Intelligent Control Technology Company Limited, Zhengzhou 450000, China; 2. School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo 454003, China; 3. Research Institute for Artificial Intelligence, Beihang University, Beijing 100191, China

CLC number:

TP391.4

Fund Project:

Supported by the National Natural Science Foundation of China (61972016)

Abstract:

Skeleton-based action recognition methods built on graph convolution rely heavily on hand-designed graph topology when modelling joint features, and they lack the ability to model global dependencies between joints. To address this, we propose a spatio-temporal convolutional Transformer network that models both spatial and temporal joint features. For spatial joint feature modelling, we propose a dynamic grouping decoupling Transformer that groups the input skeleton sequence along the channel dimension and dynamically generates a different attention matrix for each group, so that global spatial dependencies between joints can be modelled without prior knowledge of the human body topology. For temporal joint feature modelling, multi-scale temporal convolution extracts action features at different temporal scales. Finally, we propose a spatio-temporal-channel joint attention module that further refines the extracted spatio-temporal features. Under the cross-subject evaluation protocol, the proposed method achieves Top-1 recognition accuracies of 92.5% and 89.3% on the NTU-RGB+D and NTU-RGB+D 120 datasets, respectively, which demonstrates its effectiveness.
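The channel-grouped attention idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it handles a single frame, uses random matrices as stand-ins for learned projection weights, and applies each group's attention matrix directly to that group's raw features. The function names (`grouped_joint_attention`, `random_matrix`, and the helpers) and the choice of 8 channels and 4 groups are assumptions made for illustration; 25 joints matches the NTU-RGB+D skeleton format.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m):
    """Apply a numerically stable softmax to every row of a matrix."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def random_matrix(rows, cols, rng):
    """Stand-in for learned projection weights (hypothetical initialisation)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def grouped_joint_attention(x, num_groups, rng):
    """Self-attention over joints, with one attention matrix per channel group.

    x: one frame of skeleton features, a list of V joints with C channels each.
    Returns (output, attention_maps): output has the same V x C shape as x,
    and attention_maps holds one V x V matrix per group.
    """
    num_channels = len(x[0])
    if num_channels % num_groups != 0:
        raise ValueError("channels must divide evenly into groups")
    d = num_channels // num_groups

    group_outputs, attention_maps = [], []
    for g in range(num_groups):
        # Slice this group's channels for every joint: V x d.
        xg = [joint[g * d:(g + 1) * d] for joint in x]
        # Per-group query/key projections; random here, learned in a real model.
        wq, wk = random_matrix(d, d, rng), random_matrix(d, d, rng)
        q, k = matmul(xg, wq), matmul(xg, wk)
        # Scaled dot-product scores between every pair of joints: V x V.
        kt = [list(col) for col in zip(*k)]
        scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, kt)]
        attn = softmax_rows(scores)
        attention_maps.append(attn)
        # Mix joint features using this group's own attention matrix.
        group_outputs.append(matmul(attn, xg))

    # Concatenate the groups back along the channel axis: V x C.
    output = [sum((go[v] for go in group_outputs), []) for v in range(len(x))]
    return output, attention_maps


rng = random.Random(0)
# One frame with 25 joints and 8 channels per joint (illustrative sizes).
frame = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(25)]
out, maps = grouped_joint_attention(frame, num_groups=4, rng=rng)
print(len(out), len(out[0]), len(maps))
```

Because each attention matrix is computed from the input itself and spans every pair of joints, no adjacency matrix of the skeleton is supplied anywhere in the sketch, which is the property the abstract contrasts with hand-designed graph topology.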

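Cite this article

Liu Binbin, Zhao Hongtao, Wang Tian, Yang Yi. Spatial temporal convolutional Transformer network for skeleton-based action recognition[J]. Electronic Measurement Technology, 2024, 47(1): 169-177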

History
  • Published online: 2024-04-24