Accelerator design of sparse convolutional neural network based on FPGA

Affiliation:

School of Microelectronics, Hefei University of Technology, Hefei 230601, China

CLC number:

TP302

Fund Project:

Supported by the National Natural Science Foundation of China (61974039)

    Abstract:

    Pruning is an effective way to reduce the weights and computation of convolutional neural networks (CNNs), and it offers a route to efficient CNN deployment. However, the irregular distribution of weights in a pruned sparse CNN gives the hardware computing units unequal workloads, which lowers the computing efficiency of the hardware. This paper proposes a fine-grained CNN pruning method that divides the overall weights into several local weight groups according to the architecture of the hardware accelerator and prunes each group independently, so that the resulting sparse CNN has a balanced workload on the accelerator. In addition, a sparse CNN accelerator with an efficient processing element (PE) structure and configurable sparsity is designed and implemented on an FPGA. The efficient PE structure raises multiplier throughput, and the configurability lets the accelerator adapt flexibly to CNNs of different sparsity. Experimental results show that the proposed pruning algorithm reduces the CNN weight parameters by 50% to 70% with an accuracy loss of less than 3%. Compared with dense accelerators, the proposed accelerator achieves a speedup of up to 3.65x; compared with other sparse accelerators, it improves hardware efficiency by 28% to 167%.
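The load-balancing idea in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the abstract does not state the pruning criterion or how groups map to hardware, so the code assumes magnitude-based pruning, contiguous groups of equal size, and one group per PE. The function names `prune_global`, `prune_per_group`, and `nonzeros_per_group` are hypothetical.

```python
def prune_global(weights, sparsity):
    """Zero the smallest-magnitude weights across the whole list.

    Assumption: magnitude is the pruning criterion (not stated in the abstract).
    """
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by magnitude; the first n_prune are removed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def prune_per_group(weights, group_size, sparsity):
    """Prune each contiguous group independently to the same sparsity.

    Assumption: each group holds the weights handled by one PE.
    """
    pruned = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        pruned.extend(prune_global(group, sparsity))
    return pruned


def nonzeros_per_group(weights, group_size):
    """Count surviving weights per group, a proxy for per-PE workload."""
    return [
        sum(1 for w in weights[start:start + group_size] if w != 0.0)
        for start in range(0, len(weights), group_size)
    ]


# Toy data: 4 groups of 4 weights, with magnitudes skewed across groups.
weights = [0.9, -0.8, 0.7, 0.6,
           0.05, -0.04, 0.03, 0.02,
           0.5, -0.01, 0.4, 0.06,
           -0.3, 0.2, 0.07, -0.08]

global_counts = nonzeros_per_group(prune_global(weights, 0.5), 4)
group_counts = nonzeros_per_group(prune_per_group(weights, 4, 0.5), 4)
print("global pruning, nonzeros per group:   ", global_counts)
print("per-group pruning, nonzeros per group:", group_counts)
```

On this toy data both schemes keep 8 of 16 weights, but global pruning leaves 4, 0, 2, and 2 nonzeros in the four groups, while per-group pruning leaves 2 in each. With one group per PE, the first case would have some PEs idle while others finish, which is the imbalance the abstract describes.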

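Cite this article

Li Ning, Xiao Hao. Accelerator design of sparse convolutional neural network based on FPGA[J]. Electronic Measurement Technology, 2024, 47(5): 1-8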

History
  • Published online: 2024-06-05