Abstract: Brain lesions in the structural magnetic resonance imaging (sMRI) of Alzheimer's disease (AD) patients are subtle, complex, and spatially heterogeneous, which limits diagnostic accuracy. To address this problem, a hybrid architecture that combines the strengths of convolutional neural networks (CNNs) and Transformers is proposed for AD diagnosis. First, a multi-view feature encoder is designed, in which a view local feature extractor with integrated hybrid attention mechanisms extracts complementary information from the coronal, sagittal, and axial views of sMRI. The semantic representation of lesion regions is further enhanced through a multi-view information interaction learning strategy. Second, a cascaded multi-scale fusion subnetwork progressively fuses feature maps at multiple scales to improve the discriminative ability of the features. Finally, a Transformer encoder models the global feature representation of the whole-brain sMRI. Results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset show that the proposed method achieves accuracies of 94.05% for AD classification and 81.59% for mild cognitive impairment (MCI) conversion prediction, outperforming several existing methods.
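As a minimal illustration of the multi-view input described above, the following sketch slices one coronal, one sagittal, and one axial plane from a 3D volume with NumPy. It is not the authors' implementation: the function name `extract_views`, the choice of the central slice, and the axis ordering (sagittal, coronal, axial) are assumptions made for this example, and the actual axis order depends on how the sMRI scans are oriented during preprocessing.

```python
import numpy as np


def extract_views(volume, index=None):
    """Return one sagittal, coronal, and axial slice of a 3D volume.

    Assumption (hypothetical, not from the paper): array axes are
    ordered (sagittal, coronal, axial), i.e. (x, y, z).
    """
    if volume.ndim != 3:
        raise ValueError("expected a 3D volume")
    if index is None:
        # Default to the central slice along each axis.
        index = tuple(size // 2 for size in volume.shape)
    x, y, z = index
    return {
        "sagittal": volume[x, :, :],  # fix x, keep (y, z)
        "coronal": volume[:, y, :],   # fix y, keep (x, z)
        "axial": volume[:, :, z],     # fix z, keep (x, y)
    }


# Toy array standing in for a preprocessed sMRI scan.
volume = np.arange(4 * 5 * 6, dtype=np.float32).reshape(4, 5, 6)
views = extract_views(volume)
for name, image in views.items():
    print(name, image.shape)
```

In a pipeline of the kind the abstract outlines, each of the three 2D views would then be passed to its own feature extractor before the multi-view interaction and fusion stages; those stages are not reproduced here.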