Abstract: Fine-grained image recognition algorithms based on the Vision Transformer (ViT) model suffer from two problems: feature extraction is not comprehensive, and parameter selection does not generalize across input samples. To address these problems, this paper presents an Adaptive Feature Extraction method with Indirect Attention (AFEIA). First, an improved natural breaks classification algorithm divides the features of an object into three classes: most relevant, less relevant, and irrelevant. This step adaptively extracts the most discriminative features for each input sample, which improves the accuracy of feature extraction. Second, the attention weight matrix is used to obtain features that are indirectly related to the object. This step captures subtle differences between objects and makes feature extraction more comprehensive. Experiments show that a ViT model using AFEIA achieves prediction accuracies of 91.6% and 91.5% on the fine-grained datasets CUB-200-2011 and Stanford Dogs, respectively. Visualizations and ablation experiments verify the effectiveness of AFEIA.
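
The two steps described above can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes that the classic (unimproved) natural breaks criterion is applied to the class-token attention scores, and that "indirectly related" features are the patches that the most relevant patches themselves attend to. The function names `natural_breaks_3` and `indirect_relevance`, and the layout of the attention matrix (token 0 is the class token), are hypothetical choices made for illustration.

```python
import numpy as np


def natural_breaks_3(scores):
    """Split 1-D scores into 3 classes by minimizing the within-class
    sum of squared deviations (classic natural breaks criterion,
    solved by brute force; NOT the paper's improved variant).

    Returns integer labels in the original order:
    0 = irrelevant, 1 = less relevant, 2 = most relevant.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    if n < 3:
        raise ValueError("need at least 3 scores")
    order = np.argsort(scores)
    s = scores[order]
    # Prefix sums give the squared deviation of any slice s[i:j] in O(1).
    c1 = np.concatenate(([0.0], np.cumsum(s)))
    c2 = np.concatenate(([0.0], np.cumsum(s * s)))

    def ssd(i, j):
        total = c1[j] - c1[i]
        return (c2[j] - c2[i]) - total * total / (j - i)

    best_cost, best_i, best_j = np.inf, 1, 2
    for i in range(1, n - 1):          # first break: s[:i] | s[i:]
        for j in range(i + 1, n):      # second break: s[i:j] | s[j:]
            cost = ssd(0, i) + ssd(i, j) + ssd(j, n)
            if cost < best_cost:
                best_cost, best_i, best_j = cost, i, j

    sorted_labels = np.zeros(n, dtype=int)
    sorted_labels[best_i:best_j] = 1
    sorted_labels[best_j:] = 2
    labels = np.empty(n, dtype=int)
    labels[order] = sorted_labels      # map back to the original order
    return labels


def indirect_relevance(attn):
    """Assumed reading of 'indirect attention': average the attention
    rows of the patches that the class token rates as most relevant.

    attn: (N+1, N+1) attention weights; token 0 is the class token.
    Returns (labels, indirect), each of length N (one entry per patch).
    """
    attn = np.asarray(attn, dtype=float)
    labels = natural_breaks_3(attn[0, 1:])   # direct relevance classes
    top = np.flatnonzero(labels == 2) + 1    # shift to token indices
    indirect = attn[top, 1:].mean(axis=0)    # what the top patches attend to
    return labels, indirect


# Toy example: class token plus 3 patches.
attn = np.array([
    [0.10, 0.70, 0.15, 0.05],   # class token attends mostly to patch 0
    [0.10, 0.20, 0.10, 0.60],   # patch 0 attends mostly to patch 2
    [0.25, 0.25, 0.25, 0.25],
    [0.25, 0.25, 0.25, 0.25],
])
labels, indirect = indirect_relevance(attn)
```

In this toy example, patch 2 falls in the irrelevant class under direct class-token attention, yet it receives the largest indirect score because the most relevant patch attends to it strongly. The breakpoints come from the data rather than from a fixed threshold, which is the sense in which such a selection adapts to each input sample.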