Abstract: The completeness of key parts is an important verification requirement for gas meters. Traditional image feature matching methods can automate part detection, but they generalize poorly. This paper proposes an improved Faster R-CNN method to identify and locate the key parts of gas meters from multiple perspectives. First, the convolutional neural network in Faster R-CNN is replaced with a Vision Transformer (ViT), whose self-attention mechanism helps to learn the correlations between image block features and strengthens the representation ability. Then the ViT structure is optimized, and a configuration with 14 Transformer layers and 12 self-attention heads achieves the best accuracy. Experimental results show that the mAP of the optimal model is 86.71%, which is 2.48% higher than that of ResNet50. Its detection accuracy is comparable to that of ResNet101, while its detection efficiency is 5.8% higher and the model complexity is effectively reduced. A single detection of the key parts of a gas meter takes 1.13 s. The method balances accuracy and real-time performance for key parts detection of gas meters.
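As an illustration of how self-attention relates image block features, the following is a minimal NumPy sketch of one multi-head self-attention step over image patches. It is not the implementation used in this paper: the 64 x 64 input, the 16 x 16 patch size, the embedding width of 48, the random weights, and the helper names `patchify` and `multi_head_self_attention` are all assumptions chosen for illustration. Only the head count of 12 is taken from the text.

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch, patch, c)
    return x.reshape(gh * gw, patch * patch * c)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(tokens, wq, wk, wv, wo, num_heads):
    """Scaled dot-product self-attention with several heads.

    tokens: (n, d) patch embeddings; wq, wk, wv, wo: (d, d) projections.
    Returns the (n, d) output and the (heads, n, n) attention weights.
    """
    n, d = tokens.shape
    assert d % num_heads == 0, "embedding width must divide across heads"
    dh = d // num_heads

    def split(x):
        # (n, d) -> (heads, n, dh)
        return x.reshape(n, num_heads, dh).transpose(1, 0, 2)

    q, k, v = split(tokens @ wq), split(tokens @ wk), split(tokens @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)  # patch-to-patch scores
    weights = softmax(scores, axis=-1)               # each row sums to 1
    out = (weights @ v).transpose(1, 0, 2).reshape(n, d)
    return out @ wo, weights


# Hypothetical sizes for illustration; only num_heads = 12 comes from the text.
rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
patches = patchify(image, 16)                        # 16 patches of length 768
embed_dim, num_heads = 48, 12
w_embed = rng.normal(scale=0.02, size=(patches.shape[1], embed_dim))
tokens = patches @ w_embed                           # linear patch embedding
wq, wk, wv, wo = (rng.normal(scale=0.02, size=(embed_dim, embed_dim))
                  for _ in range(4))
out, weights = multi_head_self_attention(tokens, wq, wk, wv, wo, num_heads)
print(out.shape, weights.shape)
```

Each row of `weights` describes how strongly one image block attends to every other block within a head, which is the correlation between block features referred to above. A full ViT stacks several such layers together with feed-forward sublayers, residual connections, and position embeddings, all of which this sketch omits.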