Abstract: In recent years, convolutional neural networks (CNNs) have achieved strong results in many machine vision tasks. However, existing software implementations perform poorly on portable devices. This work designs a CNN system based on a Xilinx all-programmable SoC that accelerates the convolution operation in parallel, requiring few hardware resources while achieving fast detection. The system uses multi-stage pipelining and input data reuse to improve computational efficiency. The hardware part performs the convolutional network computation, and the software part handles image preprocessing and detection post-processing, thereby improving overall efficiency. The system supports convolutions of different kernel sizes, mean pooling, and the non-maximum suppression (NMS) algorithm, which together achieve accurate localization of multiple faces in an image. Experimental results show that the system achieves an average computation rate of 0.19 GOPS at an operating frequency of 100 MHz, with power consumption only 4.07% of that of a general-purpose CPU.
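Mean pooling and non-maximum suppression are standard operations. The minimal Python sketch below shows generic versions of both for reference. It is not the authors' hardware or software implementation. The function names `mean_pool`, `iou`, and `nms`, the box format `(x1, y1, x2, y2)`, the 2×2 pooling window with stride 2, and the IoU threshold of 0.3 are all illustrative assumptions.

```python
def mean_pool(fmap, k=2, stride=2):
    """Mean pooling over a 2-D feature map (list of lists), no padding.

    Hypothetical helper: the k x k window and stride are assumed values.
    """
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(0, h - k + 1, stride):
        row = []
        for j in range(0, w - k + 1, stride):
            # Average every value inside the current k x k window.
            window = [fmap[i + di][j + dj] for di in range(k) for dj in range(k)]
            row.append(sum(window) / (k * k))
        out.append(row)
    return out


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it.

    Returns indices of the kept boxes, highest score first.
    The default threshold of 0.3 is an assumed value.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard remaining candidates that overlap the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


if __name__ == "__main__":
    fmap = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(mean_pool(fmap))  # [[3.5, 5.5], [11.5, 13.5]]
    faces = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    print(nms(faces, [0.9, 0.8, 0.7]))  # [0, 2]
```

In the usage example, the second box overlaps the first with an IoU of about 0.68, so NMS suppresses it as a duplicate detection of the same face. The third box does not overlap either and is kept.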