Abstract: Metaphorical affective prediction can improve the user experience of social media content and has potential value in mental health monitoring and virtual psychotherapy. It can also identify the affective needs of a target audience more accurately, helping to optimize advertising strategies and improve business efficiency. To further enhance the effectiveness of metaphorical affective prediction, a multi-modal metaphorical affective prediction architecture that consolidates intra-class difference and inter-class coherence is proposed. First, three single-modal models are introduced, namely an image semantic model, a text semantic model, and a voice semantic model, to extract personalized differential features from the three data sources, respectively. Then, a deep hierarchical multi-modal model is introduced to learn the coherence among modalities through intermediate-layer fusion, better exploiting the complementary information provided by bi-modal and tri-modal data. Finally, the four aforementioned models are combined through decision-level fusion to predict multi-modal metaphorical affect in an end-to-end architecture. Extensive ablation experiments and comparative studies on open-source datasets demonstrate the effectiveness of the proposed approach.
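To make the fusion strategy concrete, the following is a minimal PyTorch sketch of how decision-level fusion over three unimodal branches and one intermediate-fusion multimodal branch might be organized; all module names, feature dimensions, and the learnable weighting scheme are illustrative assumptions, not the authors' exact implementation.

# Minimal sketch of the fusion idea described in the abstract.
# Dimensions, module names, and the fusion weights are assumptions for illustration.
import torch
import torch.nn as nn

class UnimodalBranch(nn.Module):
    """Encodes one modality (image, text, or voice features) and emits class logits."""
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        h = self.encoder(x)          # modality-specific hidden representation
        return h, self.head(h)       # keep h for intermediate-layer fusion

class MultimodalBranch(nn.Module):
    """Fuses intermediate representations of all modalities before classification."""
    def __init__(self, hidden_dim, num_classes, num_modalities=3):
        super().__init__()
        self.fuse = nn.Sequential(nn.Linear(hidden_dim * num_modalities, hidden_dim), nn.ReLU())
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, hiddens):
        fused = self.fuse(torch.cat(hiddens, dim=-1))   # intermediate-layer fusion
        return self.head(fused)

class DecisionLevelFusion(nn.Module):
    """Combines logits from the three unimodal branches and the multimodal branch."""
    def __init__(self, img_dim, txt_dim, voc_dim, hidden_dim=128, num_classes=7):
        super().__init__()
        self.image = UnimodalBranch(img_dim, hidden_dim, num_classes)
        self.text = UnimodalBranch(txt_dim, hidden_dim, num_classes)
        self.voice = UnimodalBranch(voc_dim, hidden_dim, num_classes)
        self.multi = MultimodalBranch(hidden_dim, num_classes)
        # Learnable decision-level weights over the four predictors (an assumption).
        self.weights = nn.Parameter(torch.ones(4))

    def forward(self, img, txt, voc):
        h_i, p_i = self.image(img)
        h_t, p_t = self.text(txt)
        h_v, p_v = self.voice(voc)
        p_m = self.multi([h_i, h_t, h_v])
        w = torch.softmax(self.weights, dim=0)           # normalized fusion weights
        return w[0] * p_i + w[1] * p_t + w[2] * p_v + w[3] * p_m

# Example usage with random feature vectors standing in for extracted modal features.
model = DecisionLevelFusion(img_dim=512, txt_dim=768, voc_dim=128)
logits = model(torch.randn(4, 512), torch.randn(4, 768), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 7])

In this sketch the unimodal heads capture per-modality (intra-class) differences, the multimodal branch models cross-modal coherence through intermediate-layer fusion, and the softmax-weighted sum realizes the decision-level fusion in an end-to-end trainable fashion.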