Abstract: Existing deep convolutional neural networks (DCNNs) generate large volumes of inter-layer feature data during inference, and maintaining real-time processing on embedded systems requires substantial on-chip storage to cache these inter-layer feature maps. This paper proposes an inter-layer feature compression technique that significantly reduces off-chip memory access bandwidth. In addition, a generic convolution computation scheme tailored to FPGA BRAM is proposed, with circuit-level optimizations that reduce memory accesses and improve DSP computational efficiency, thereby greatly increasing computation speed. Compared with running MobileNetV2 on a CPU, the proposed DCNN accelerator achieves a 6.3x performance improvement; compared with two other DCNN accelerators of the same type, it improves DSP performance efficiency by 17% and 156%, respectively.
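The abstract does not specify the form of the inter-layer feature compression, so the following is only a minimal illustrative sketch of one common approach, zero run-length coding of sparse post-ReLU activations before they are written off-chip; the function names and encoding format here are hypothetical and not taken from the paper.

```c
/*
 * Illustrative sketch only: the paper's actual compression scheme is not
 * described in the abstract. This shows zero run-length coding of 8-bit
 * post-ReLU feature maps, one common way to cut off-chip feature traffic.
 * All names and the byte format are hypothetical.
 */
#include <stdint.h>
#include <stddef.h>

/* Encode a feature map: nonzero bytes are emitted as-is, runs of zeros are
 * emitted as a 0x00 marker followed by the run length (1..255). The output
 * buffer must hold up to 2*n bytes in the worst case.
 * Returns the number of bytes written to `out`. */
size_t fmap_zrle_encode(const uint8_t *fmap, size_t n, uint8_t *out)
{
    size_t w = 0;
    for (size_t i = 0; i < n; ) {
        if (fmap[i] != 0) {
            out[w++] = fmap[i++];       /* pass nonzero activation through */
        } else {
            uint8_t run = 0;
            while (i < n && fmap[i] == 0 && run < 255) { ++i; ++run; }
            out[w++] = 0x00;            /* zero marker */
            out[w++] = run;             /* run length  */
        }
    }
    return w;
}

/* Decode `n_in` encoded bytes back into a feature map of known size `n`. */
void fmap_zrle_decode(const uint8_t *in, size_t n_in, uint8_t *fmap, size_t n)
{
    size_t r = 0, w = 0;
    while (r < n_in && w < n) {
        if (in[r] != 0) {
            fmap[w++] = in[r++];
        } else {
            uint8_t run = in[r + 1];
            r += 2;
            for (uint8_t k = 0; k < run && w < n; ++k) fmap[w++] = 0;
        }
    }
}
```

Because post-ReLU feature maps are typically highly sparse, an encoding of this kind can shrink the data streamed to external DRAM between layers, which is the kind of off-chip bandwidth reduction the abstract refers to; the paper's own scheme may differ in both format and granularity.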