Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset

Posted: 2021-12-14 18:16:30

Venue: ICASSP 2021
Authors: Kun Zhou, Haizhou Li

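Abstract

Emotional voice conversion (emotion VC) converts the emotional prosody of the source speech while leaving the speaker identity and the linguistic content unchanged. Previous work showed that an encoder-decoder structure can disentangle emotional information when it is conditioned on emotion labels.
This paper proposes a framework based on VAW-GAN (variational auto-encoding Wasserstein generative adversarial network) that uses a pre-trained speech emotion recognition (SER) model to extract the emotion style, which makes conversion to both seen and unseen emotions possible.
The paper also releases an emotional speech database that covers multiple languages and multiple speakers.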

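1. Introduction

Earlier work on emotion VC with emotion IDs successfully used VAW-GAN conditioned on a one-hot emotion ID for style control. Representing emotional style with an ID alone is too coarse, however, because emotion is shaped by many interacting factors.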

2. Analysis of Deep Emotional Features

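Emotional prosody can be described with discrete labels, such as Ekman's six basic emotions, or with continuous vectors, as in Russell's circumplex model.
This paper represents emotion in a continuous space, which allows one-to-many emotion control.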
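To make the contrast between the two representations concrete, here is a minimal Python sketch. The emotion names, the 4-dimensional vectors, and the helpers `one_hot` and `interpolate` are illustrative assumptions, not values or functions from the paper. The point is only that a one-hot ID can name one of a fixed set of classes, while a continuous vector can sit anywhere in the space.

```python
# Illustrative only: emotion names and vector values are made up for this sketch.
EMOTIONS = ["neutral", "happy", "angry", "sad"]

def one_hot(emotion):
    """Discrete style code: exactly one slot per known emotion class."""
    if emotion not in EMOTIONS:
        raise ValueError(f"'{emotion}' has no one-hot ID")
    return [1.0 if e == emotion else 0.0 for e in EMOTIONS]

def interpolate(a, b, t):
    """Continuous style code: any point between two style vectors is valid."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]

# Hypothetical continuous emotion vectors (e.g. produced by an emotion descriptor).
happy_vec = [0.9, 0.1, 0.4, 0.0]
angry_vec = [0.1, 0.8, 0.6, 0.2]

print(one_hot("angry"))                         # a fixed class ID
print(interpolate(happy_vec, angry_vec, 0.5))   # a style between two emotions

try:
    one_hot("surprise")                         # a class with no ID assigned
except ValueError as err:
    print(err)
```

With the ID scheme, adding an emotion means adding a slot and retraining; with a continuous code, a new style is just another vector fed to the same model.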

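Deep emotional features were extracted from utterances by four speakers (two male, two female) with the same content but different emotions, and plotted with t-SNE. The plot shows clear separation between the emotion classes.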

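3. One-to-Many Emotional Style Transfer

3.1. Stage I: Emotion Descriptor Training

A neural network is trained to classify the emotion of an input utterance, and an utterance-level vector representation is extracted from it: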
$\Phi = D(X)$
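The mapping $\Phi = D(X)$ can be sketched as a classifier whose intermediate output is reused as the emotion vector. The sketch below is a toy under stated assumptions: these notes do not describe the actual SER architecture, so the mean pooling over frames, the layer sizes, and the random (untrained) weights are all placeholders chosen for illustration.

```python
import math
import random

random.seed(0)

FEAT_DIM, EMB_DIM, N_CLASSES = 3, 4, 2  # toy sizes, assumed for illustration

def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Random stand-ins for weights that would normally come from SER training.
W_hidden = rand_matrix(EMB_DIM, FEAT_DIM)
W_out = rand_matrix(N_CLASSES, EMB_DIM)

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def descriptor(frames):
    """Phi = D(X): map a variable-length utterance to one fixed-size vector."""
    n = len(frames)
    pooled = [sum(f[i] for f in frames) / n for i in range(FEAT_DIM)]  # mean over time
    return [math.tanh(h) for h in matvec(W_hidden, pooled)]

def classify(frames):
    """Emotion posteriors computed on top of the utterance-level vector."""
    logits = matvec(W_out, descriptor(frames))
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

short_utt = [[0.1, 0.2, 0.3], [0.0, 0.1, 0.5]]  # 2 frames of toy features
long_utt = short_utt * 10                        # 20 frames

print(len(descriptor(short_utt)), len(descriptor(long_utt)))
print(classify(short_utt))
```

Whatever the utterance length, `descriptor` returns a vector of the same size, which is what lets it serve as a sentence-level style code in the next stage.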

3.2. Stage II: Encoder-Decoder Training with VAW-GAN

- Emotion style: the mean of the emotion vectors over a reference set.
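Averaging over a reference set can be sketched in a few lines of Python. The function name `emotion_style` and the example vectors are hypothetical; in practice each row would be the utterance-level output of the emotion descriptor from Stage I.

```python
def emotion_style(reference_embeddings):
    """Average utterance-level emotion vectors from a reference set."""
    if not reference_embeddings:
        raise ValueError("reference set is empty")
    dim = len(reference_embeddings[0])
    n = len(reference_embeddings)
    return [sum(vec[i] for vec in reference_embeddings) / n for i in range(dim)]

# Hypothetical descriptor outputs for three reference utterances of one emotion.
reference_set = [
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 2.0],
    [2.0, 4.0, 2.0],
]
style = emotion_style(reference_set)
print(style)
```

Because the style is computed from reference utterances rather than looked up by ID, a reference set for an emotion that was not seen in training can presumably be passed in the same way, which fits the seen and unseen setting described in the abstract.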

4. Experiments & Results

The demo samples do not sound very different from the baseline; the neutral-to-angry conversion is somewhat better.

Reposted from blog.csdn.net/qq_40168949/article/details/114285305