Abstract: The recognition and positioning capabilities of the visual perception terminal of a fruit-picking robot system are crucial to increasing picking success rates in complicated agricultural environments. Taking the Pitaya fruit, with its complicated shape, as the research object, this paper proposes Seg-YOLOv5, a real-time multi-task convolutional neural network suited for autonomous Pitaya fruit image detection in the visual system of a picking robot. The network is enhanced from the primary architecture of the YOLOv5 convolutional neural network. By extracting three layers of enhanced features as the input of an improved cascaded RFBNet semantic segmentation layer, the network realizes the multi-task target recognition of joint image detection and semantic segmentation, and the overall performance of the model is substantially improved. With a mean Average Precision of 93.10% and a mean Intersection over Union of 83.64% on the testing dataset, the enhanced Seg-YOLOv5 architecture adapts to boundary-sensitive image semantic segmentation in agricultural scenes; compared with the YOLOv5s + original RFBNet and YOLOv5s + BaseNet models, these metrics are 1.23% and 2.74% higher than the former, and 2.38% and 1.45% higher than the latter. The average detection speed of Seg-YOLOv5 reaches 71.94 fps, which is 40.79 fps faster than EfficientDet-D0, with a mean Average Precision 5.8% higher. By fusing the end-to-end detection output of Seg-YOLOv5 with an image geometric moment operator, the center of mass of the Pitaya fruit can be precisely positioned in real time as the best picking position. The improved algorithm has high robustness and versatility, laying an effective practical foundation for fruit-picking robots based on visual perception.
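The picking-point step fuses the network's segmentation output with an image geometric moment operator. As a minimal illustrative sketch (not the authors' code; the synthetic mask and the OpenCV usage are assumptions), the centroid of a binary fruit mask follows directly from the zeroth and first geometric moments:

```python
import cv2
import numpy as np

# Hypothetical binary mask (uint8, fruit pixels = 255), standing in for the
# per-fruit segmentation output that Seg-YOLOv5 would produce.
mask = np.zeros((480, 640), dtype=np.uint8)
cv2.circle(mask, (320, 240), 60, 255, -1)  # stand-in fruit region

# Zeroth (m00) and first (m10, m01) geometric moments give the center of
# mass, i.e. the picking point: (cx, cy) = (m10/m00, m01/m00).
m = cv2.moments(mask, binaryImage=True)
cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
print(f"picking point: ({cx:.1f}, {cy:.1f})")  # -> (320.0, 240.0)
```

Because the moments are accumulated over the whole mask rather than a bounding box, the resulting point tracks the fruit's actual shape, which is what makes the moment operator suitable for irregular fruit such as Pitaya.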