【CAVW】SCANET: Improving multimodal representation and fusion with sparse- and cross-attention for multimodal sentiment analysis (CCF C)

SCANET: Improving multimodal representation and fusion with sparse‐ and cross‐attention for multimodal sentiment analysis

Abstract
Learning unimodal representations and improving multimodal fusion are the two core problems of multimodal sentiment analysis (MSA). However, previous methods ignore the information differences between modalities: the text modality carries higher-order semantic features than the audio and visual modalities. In this article, we propose a sparse- and cross-attention (SCANET) framework with an asymmetric architecture to improve the performance of multimodal representation and fusion. Specifically, in the unimodal representation stage, we use sparse attention to improve the representation efficiency of the audio and visual modalities and to reduce their low-order redundant features. In the multimodal fusion stage, we design an asymmetric fusion module that uses the audio and visual information matrices as weights to strengthen the target text modality. We also introduce contrastive learning to effectively enhance complementary features between modalities. We evaluate SCANET on the CMU-MOSI and CMU-MOSEI datasets, and experimental results show that the proposed method achieves state-of-the-art performance.

In summary, we propose a sparse- and cross-attention framework for multimodal sentiment analysis. First, we use sparse attention to improve the efficiency of representation learning. Then, we design an asymmetric fusion module that uses the fused features as weights to reinforce the target modality. Finally, we introduce contrastive learning to efficiently enhance modality consistency and specificity information.
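The asymmetric fusion described in the abstract can be pictured as a cross-attention block in which the text features act as queries while the audio and visual features supply keys and values, so that the resulting attention weights reinforce the text representation. The PyTorch sketch below only illustrates that idea under assumed module names, dimensions, and a simple concatenation of the audio and visual sequences; it is not the authors' implementation.

```python
import torch
import torch.nn as nn

class AsymmetricFusion(nn.Module):
    """Illustrative cross-attention fusion: audio/visual features serve as
    keys/values whose attention weights strengthen the target text modality.
    Sketch of the idea in the abstract, not the paper's actual code."""

    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text, audio, visual):
        # text:   (batch, L_t, d_model) -- queries (target modality)
        # audio:  (batch, L_a, d_model) -- keys/values
        # visual: (batch, L_v, d_model) -- keys/values
        av = torch.cat([audio, visual], dim=1)         # non-text context
        reinforced, _ = self.cross_attn(text, av, av)  # weights come from audio/visual
        return self.norm(text + reinforced)            # residual: strengthened text features

# Toy usage with random features
fusion = AsymmetricFusion()
t, a, v = torch.randn(2, 50, 128), torch.randn(2, 300, 128), torch.randn(2, 60, 128)
print(fusion(t, a, v).shape)  # torch.Size([2, 50, 128])
```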

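The abstract also mentions a contrastive objective for enhancing complementary (consistency and specificity) information between modalities. A common way to realize such an objective is an InfoNCE-style loss over paired text and audio-visual embeddings; the sketch below assumes that formulation, and the paper's exact loss may differ.

```python
import torch
import torch.nn.functional as F

def infonce_loss(text_emb, av_emb, temperature: float = 0.07):
    """InfoNCE-style contrastive loss between text and audio-visual embeddings.
    Pairs with the same sample index are positives, all others negatives.
    Assumed formulation for illustration only."""
    text_emb = F.normalize(text_emb, dim=-1)        # (batch, d)
    av_emb = F.normalize(av_emb, dim=-1)            # (batch, d)
    logits = text_emb @ av_emb.t() / temperature    # pairwise cosine similarities
    targets = torch.arange(text_emb.size(0), device=text_emb.device)
    # symmetric loss: text -> audio/visual and audio/visual -> text
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Toy usage
loss = infonce_loss(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```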


Reposted from blog.csdn.net/lsttoy/article/details/130502210