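Li Zicong, Zeng Yuhang, Xiong Xiaoming. Design of convolutional neural network system based on SoC [J]. Electronic Measurement Technology, 2019, 42(10): 126-131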
Design of a convolutional neural network system based on SoC
  
DOI:
Keywords:  SoC  convolutional neural network  parallelization  hardware/software co-design
Funding: Supported by the Science and Technology Project of Guangdong Province (2017B090909004)
Author  Institution
Li Zicong  School of Automation, Guangdong University of Technology, Guangzhou 510006, China
Zeng Yuhang  School of Automation, Guangdong University of Technology, Guangzhou 510006, China; Chipeye Microelectronics Foshan Ltd., Foshan 528200, China
Xiong Xiaoming  School of Automation, Guangdong University of Technology, Guangzhou 510006, China
Abstract:
      In recent years, convolutional neural networks (CNNs) have performed well on many machine vision tasks. However, existing software implementations are poorly suited to portable devices. This paper therefore designs a CNN system based on a Xilinx all-programmable SoC that accelerates convolution in parallel and, on a platform with fixed resources, needs only a small share of those resources to realize a fast detection system. The system uses multi-stage pipelining and input data reuse to improve computational efficiency. The hardware part performs the CNN computation, while the software part handles image preprocessing and post-processing of the detection results, which improves operating efficiency. The system supports convolution with multiple kernel sizes, mean pooling, and the non-maximum suppression algorithm, and accurately locates multiple faces in an image. Experimental results show that, at an operating frequency of 100 MHz, the system's average computation rate is 0.19 Gops/s and its power consumption is only 4.07% of that of a general-purpose CPU.
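The abstract names non-maximum suppression as the step that reduces overlapping detections to one box per face. The following is a minimal Python sketch of the standard greedy form of that algorithm, not the authors' implementation. The box format `(x1, y1, x2, y2)`, the function names `iou` and `nms`, and the default threshold of 0.5 are assumptions made for illustration; the abstract does not specify any of them.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first.

    iou_threshold is an assumed value, not one taken from the paper.
    """
    # Visit candidates from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Two heavily overlapping candidates and one separate candidate.
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

In this example the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap either and is kept. This corresponds to reporting one location for each of two faces.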