

Search-based Automatic Image Commenting

[1]- Predicting Viewer Affective Comments Based on Image Content in Social Media (ICMR, 2014) National Taiwan University, Chen et al.
[2]- Assistive Image Comment Robot—A Novel Mid-Level Concept-Based Representation (TAC, 2015) FX Palo Alto Laboratory, Chen et al.

Figure 1. Affect-related models and their applications

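As shown in Figure 2 below, the automatic comments generally fit the image content well, but those in (c) and (d) clearly do not match their images, for example by mentioning the wrong objects or actions.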

Figure 2. Examples of automatic commenting results

[3]- Object-Based Visual Sentiment Concept Analysis and Application (MM, 2014) Columbia University, Chen et al.

Figure 3. Automatic comment generation based on object detection

Figure 4. Comparison of automatically generated comments

[4-1]- Share-and-Chat: Achieving Human-Level Video Commenting by Search and Multi-View Embedding (MM, 2016) Sun Yat-sen University, Li et al.
[4-2]- Video ChatBot: Triggering Live Social Interactions by Automatic Video Commenting (MM, 2016) Sun Yat-sen University, Li et al.

Figure 5. Pipeline of the Share-and-Chat method

[5]- See and chat: automatically generating viewer-level comments on images (Multimedia Tools and Applications, 2019) Sun Yat-sen University, Chen et al.

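Chen et al. [5] first extract image representations with a CNN, then apply KNN to these features to retrieve images similar to the test image, and finally rank the candidate comments with Ranking Canonical Correlation Analysis (RCCA), as shown in Figure 6. The dataset was built with the Flickr API and post-processed with respect to image-text relevance, comment sentiment intensity, and comment length. It is split into subsets of 400K, 25K, and 1K images.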
Figure 6. Pipeline of the See-and-Chat method
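The retrieve-then-rank pipeline of [5] can be sketched as follows. This is a minimal illustration under stated assumptions: image features are taken as precomputed vectors (standing in for the CNN output), candidate comments are assumed to already have embeddings in a shared image-text space, and plain cosine similarity stands in for the learned RCCA projection, which is not implemented here. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_images(query: Vector, gallery: Dict[str, Vector], k: int) -> List[str]:
    """Step 1: ids of the k gallery images whose features are closest to the query."""
    ranked = sorted(gallery, key=lambda img_id: cosine(query, gallery[img_id]), reverse=True)
    return ranked[:k]


def rank_comments(
    query_emb: Vector,
    neighbours: List[str],
    comments: Dict[str, List[Tuple[str, Vector]]],
) -> List[str]:
    """Step 2: pool the neighbours' comments and sort them by similarity to the
    query in an assumed shared image-text space (placeholder for RCCA scoring)."""
    pool = [item for img_id in neighbours for item in comments.get(img_id, [])]
    pool.sort(key=lambda item: cosine(query_emb, item[1]), reverse=True)
    return [text for text, _ in pool]


# Toy 2-D data. For simplicity the same query vector serves as both the
# (assumed) CNN feature and the (assumed) shared-space embedding.
gallery = {"img_a": [1.0, 0.0], "img_b": [0.9, 0.1], "img_c": [0.0, 1.0]}
comments = {
    "img_a": [("cute dog!", [1.0, 0.0])],
    "img_b": [("love this pup", [0.8, 0.2])],
    "img_c": [("nice sunset", [0.0, 1.0])],
}
query = [1.0, 0.05]
neighbours = knn_images(query, gallery, k=2)
print(neighbours)                                  # ['img_a', 'img_b']
print(rank_comments(query, neighbours, comments))  # ['cute dog!', 'love this pup']
```

Separating retrieval from ranking keeps the candidate pool small, so the more expensive cross-modal scoring only runs over comments attached to the nearest images.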

Generative Automatic Image Commenting

[6]- Auto Image Comment via Deep Attention (ICIVC, 2017) Jiangxi Normal University, Shi et al.

Figure 7. Generative image comment generation

[7]- Neural Visual Social Comment on Image-Text Content (IETE Technical Review, 2020) Shanghai University, Yin et al.

Figure 8. Generative commenting based on a topic classification model

[8]- Explainable Outfit Recommendation with Joint Outfit Matching and Comment Generation (TKDE, 2020) Shandong University, Lin et al.


Figure 9. Outfit comment generation: (a) image feature extraction, (b) mutual attention mechanism, (c) comment generation by the decoder

[9]- An Image Comment Method Based on Emotion Capture Module (ICFTIC, 2021) Beihang University, Li et al.

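Li et al. [9] first generate image captions with a GAN and then produce comments indirectly through text style transfer and rewriting. Starting from existing image captioning datasets, they build an image comment dataset with text-editing methods. The comment corpus is then set as the target domain, so that the model learns the linguistic style of comments and generates comments by rewriting the captions, as shown in Figure 10.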
Figure 10. Image comment generation via text rewriting
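The two-stage idea in [9], describe first and then rewrite, can be expressed as a composition of two functions. The sketch below only shows that structure: the captioner is a hypothetical stub in place of the GAN-based captioning model, and the rewriter is a toy word-substitution rule in place of a learned style-transfer model, so neither reflects the paper's actual components.

```python
from typing import Callable, Dict

Captioner = Callable[[str], str]  # image path -> factual description
Rewriter = Callable[[str], str]   # description -> comment-style sentence


def comment_via_rewriting(image: str, captioner: Captioner, rewriter: Rewriter) -> str:
    """Stage 1 describes the image; stage 2 restyles the description as a comment."""
    caption = captioner(image)
    return rewriter(caption)


def make_lexicon_rewriter(lexicon: Dict[str, str], suffix: str) -> Rewriter:
    """Toy stand-in for a learned style-transfer model: replace neutral words
    with comment-style phrases and append a suffix."""
    def rewrite(caption: str) -> str:
        words = [lexicon.get(w, w) for w in caption.rstrip(".").split()]
        return " ".join(words) + suffix
    return rewrite


def fake_captioner(image: str) -> str:
    """Hypothetical stub: ignores the image and returns a fixed caption."""
    return "a dog sitting on the grass."


rewriter = make_lexicon_rewriter({"a": "such a", "dog": "cute puppy"}, "!")
print(comment_via_rewriting("photo.jpg", fake_captioner, rewriter))
# such a cute puppy sitting on the grass!
```

Because the stages only share a plain-text interface, either one can be swapped (for example, a stronger captioner) without retraining the other, which is one motivation for building comments on top of existing captioning data.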

Automatic Live Video Commenting


[10]- LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts (AAAI, 2019) Peking University, Ma et al.

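This paper, from Prof. Xu Sun's group at Peking University, is the first to propose the task of live video comment (danmaku) generation. Ma et al. propose two baseline models for the task: a hierarchically structured Fusional RNN and a linearly structured Unified Transformer, as shown in Figure 11.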


Figure 11. The two baseline models
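One way to picture the difference between the hierarchical and the linear structure is in how the surrounding comments are laid out before encoding. The sketch assumes whitespace tokenization and a hypothetical `<sep>` boundary token; the actual preprocessing in LiveBot [10] may differ, and the visual frames are omitted.

```python
from typing import List


def hierarchical_context(comments: List[str]) -> List[List[str]]:
    """Hierarchical layout: one token list per comment. A word-level encoder can
    run inside each comment, and a comment-level encoder across the results."""
    return [c.split() for c in comments]


def linear_context(comments: List[str], sep: str = "<sep>") -> List[str]:
    """Linear layout: every comment flattened into one token sequence, so a single
    encoder (e.g. Transformer self-attention) can attend across all words at once."""
    tokens: List[str] = []
    for i, comment in enumerate(comments):
        if i > 0:
            tokens.append(sep)  # hypothetical boundary marker between comments
        tokens.extend(comment.split())
    return tokens


context = ["so cute", "lol", "what breed is it"]
print(hierarchical_context(context))
# [['so', 'cute'], ['lol'], ['what', 'breed', 'is', 'it']]
print(linear_context(context))
# ['so', 'cute', '<sep>', 'lol', '<sep>', 'what', 'breed', 'is', 'it']
```

The nested form suits recurrent encoders that summarize each comment before combining them, while the flat form lets self-attention relate any word in one comment directly to any word in another.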

[11]- VideoIC: A Video Interactive Comments Dataset and Multimodal Multitask Learning for Comments Generation (MM, 2020) Renmin University of China, Wang et al.

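This paper, from Prof. Qin Jin's team at Renmin University of China, adopts multi-task learning. A Transformer and an LSTM extract local and global image features, respectively, and a Bi-LSTM extracts text features. These features are fed into a Transformer-based encoder for multimodal fusion, after which a generation loss and a context discrimination loss are computed separately. The overall framework is shown below.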
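The multi-task objective can be sketched as a weighted sum of the two losses. The assumptions here are explicit: the generation loss is taken to be token-level negative log-likelihood, the context discrimination loss is framed as binary cross-entropy on whether a comment matches its context, and `weight` is a hypothetical balancing hyperparameter. The exact formulation in [11] may differ.

```python
import math
from typing import Sequence


def generation_loss(gold_token_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood of the reference comment, given the
    probability the decoder assigned to each gold token."""
    return -sum(math.log(p) for p in gold_token_probs) / len(gold_token_probs)


def context_discrimination_loss(match_prob: float, is_true_context: bool) -> float:
    """Assumed binary cross-entropy: match_prob is the predicted probability that
    the comment belongs to the given video context."""
    return -math.log(match_prob) if is_true_context else -math.log(1.0 - match_prob)


def multitask_loss(gold_token_probs: Sequence[float], match_prob: float,
                   is_true_context: bool, weight: float = 1.0) -> float:
    """Weighted sum of the two task losses; `weight` is a hypothetical hyperparameter."""
    return (generation_loss(gold_token_probs)
            + weight * context_discrimination_loss(match_prob, is_true_context))


print(round(multitask_loss([0.5, 0.5], 0.5, True), 4))  # 1.3863, i.e. 2 * ln 2
```

Sharing one fused encoder across both losses means the discrimination task acts as an auxiliary signal that pushes the multimodal representation to encode context relevance, not just fluent text.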



[12]- PLVCG: A Pretraining Based Model for Live Video Comment Generation (PAKDD, 2021) Chinese Academy of Sciences, Zeng et al.

[13]- Knowing Where and What to Write in Automated Live Video Comments: A Unified Multi-Task Approach (ICMI, 2021) University College Dublin, Wu et al.

[14]- Sending or not? A multimodal framework for Danmaku comment prediction (IPM, 2021) Chinese Academy of Sciences, Xi et al.

[15]- CMVCG: Non-autoregressive Conditional Masked Live Video Comments Generation Model (IJCNN, 2021) Chinese Academy of Sciences, Zeng et al.



[16]- Automatic Generation of Personalized Comment Based on User Profile (ACL Student Workshops, 2019) Peking University, Zeng et al.

[17]- MALCOM: Generating Malicious Comments to Attack Neural Fake News Detection Models (ICDM, 2020) The Pennsylvania State University, Le et al.




  • [1] Y.Y. Chen et al. Predicting Viewer Affective Comments Based on Image Content in Social Media. ICMR, 2014.
  • [2] Y.Y. Chen et al. Assistive Image Comment Robot—A Novel Mid-Level Concept-Based Representation. IEEE Transactions on Affective Computing, 2015.
  • [3] T. Chen et al. Object-Based Visual Sentiment Concept Analysis and Application. ACM Multimedia, 2014.
  • [4] Li et al. Share-and-Chat: Achieving Human-Level Video Commenting by Search and Multi-View Embedding. ACM Multimedia, 2016.
  • [5] J.W. Chen et al. See and chat: automatically generating viewer-level comments on images. Multimedia Tools and Applications, 2019.
  • [6] J.H. Shi et al. Auto Image Comment via Deep Attention. IEEE 4th International Conference on Image, Vision and Computing (ICIVC), 2017.
  • [7] Y. Yin et al. Neural Visual Social Comment on Image-Text Content. IETE Technical Review, 2020.
  • [8] Y.J. Lin et al. Explainable Outfit Recommendation with Joint Outfit Matching and Comment Generation. TKDE, 2020.
  • [9] Q. Li, J. Yin and Y. Wang. An Image Comment Method Based on Emotion Capture Module. 2021 IEEE 3rd International Conference on Frontiers Technology of Information and Computer (ICFTIC), 2021, pp. 334-339.
  • [10] Ma et al. LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts. AAAI, 2019.
  • [11] Wang et al. VideoIC: A Video Interactive Comments Dataset and Multimodal Multitask Learning for Comments Generation. ACM Multimedia, 2020.
  • [12] Zeng et al. PLVCG: A Pretraining Based Model for Live Video Comment Generation. PAKDD, 2021.
  • [13] Wu et al. Knowing Where and What to Write in Automated Live Video Comments: A Unified Multi-Task Approach. ICMI, 2021.
  • [14] Xi et al. Sending or not? A multimodal framework for Danmaku comment prediction. IPM, 2021.
  • [15] Zeng et al. CMVCG: Non-autoregressive Conditional Masked Live Video Comments Generation Model. IJCNN, 2021.
  • [16] Zeng et al. Automatic Generation of Personalized Comment Based on User Profile. ACL Student Workshops, 2019.
  • [17] Le et al. MALCOM: Generating Malicious Comments to Attack Neural Fake News Detection Models. ICDM, 2020.

