Speech Synthesis Vocoder (5): Synthesis

Other 2018-05-30 16:57:34

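Basic concepts

A minimum-phase impulse response [1] keeps the waveform essentially unchanged in the time domain.
The minimum-phase response is derived from the spectral envelope (this reduces phase distortion in the time-domain signal), and an IFFT then restores the speech signal.
[Figure: minimum-phase response formula]
where $A$ depends on the spectral envelope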
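
One common way to obtain a minimum-phase response from a magnitude envelope is cepstral folding. The sketch below assumes that approach; it is not taken from [1] or from the WORLD source, and the names `dft`, `minimum_phase_spectrum` and `minimum_phase_response` are hypothetical. It uses a naive DFT so that it runs with the standard library only, and it expects a full-length, symmetric, strictly positive magnitude spectrum of even length.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT (or inverse DFT); fine for a small illustration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def minimum_phase_spectrum(magnitude):
    """Build a minimum-phase spectrum whose magnitude equals `magnitude`.

    Assumes len(magnitude) is even, every value is > 0, and
    magnitude[k] == magnitude[N - k] (spectrum of a real signal).
    """
    n = len(magnitude)
    log_mag = [math.log(m) for m in magnitude]
    # Real cepstrum of the log-magnitude spectrum.
    cepstrum = [c.real for c in dft(log_mag, inverse=True)]
    # Fold the cepstrum onto the non-negative quefrencies.
    folded = [0.0] * n
    folded[0] = cepstrum[0]
    for i in range(1, n // 2):
        folded[i] = 2.0 * cepstrum[i]
    folded[n // 2] = cepstrum[n // 2]
    # Back to the frequency domain, then undo the logarithm.
    return [cmath.exp(v) for v in dft(folded)]


def minimum_phase_response(magnitude):
    """Time-domain impulse response of the minimum-phase spectrum."""
    spectrum = minimum_phase_spectrum(magnitude)
    return [v.real for v in dft(spectrum, inverse=True)]


# Example: a smooth, symmetric, strictly positive envelope with 32 bins.
N = 32
envelope = [1.0 + 0.5 * math.cos(2.0 * math.pi * k / N) for k in range(N)]
response = minimum_phase_response(envelope)
```

The folding step keeps the magnitude unchanged and only chooses the phase, which is why the energy of `response` ends up concentrated in its first few samples.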

Synthesis pipeline

Synthesis [2] consists of three steps, followed by a final summation:
1. Determine the pulse locations from $f_0$
Interpolate the frame-level spectra to obtain the spectrum $spectrum$ at each pulse.
2. Periodic time-domain signal
2.1 Extract the periodic part of the spectral envelope, then pass it through the minimum-phase impulse response
$period\_spectrum=spectrum[i] \cdot (1-aperiodic\_ratio[i])$
2.2 Apply the inverse Fourier transform to get the time-domain signal, and remove the DC component
$RemoveDC(IFFT(period\_spectrum))$
3. Aperiodic time-domain signal
3.1 Extract the aperiodic part of the spectral envelope, then pass it through the minimum-phase impulse response
$aperiod\_spectrum=spectrum[i] \cdot aperiodic\_ratio[i]$
3.2 Compute the spectrum of Gaussian white noise
$noise\_spectrum=FFT(noise)$
3.3 Obtain the final aperiodic time-domain signal
$IFFT(aperiod\_spectrum\cdot noise\_spectrum)$
4. Add the periodic and aperiodic time-domain signals to obtain the final synthesized signal
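
Steps 2 to 4 can be sketched for a single pulse as follows. This is a simplified illustration, not the WORLD implementation: the minimum-phase filtering is left out (the spectra are used with zero phase), `RemoveDC` is approximated by subtracting the mean, and the overlap-add of pulses at their locations is omitted. The names `dft`, `remove_dc` and `synthesize_frame` are hypothetical.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(N^2) DFT (or inverse DFT); fine for a small illustration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def remove_dc(signal):
    """Assumed stand-in for RemoveDC: subtract the mean of the signal."""
    mean = sum(signal) / len(signal)
    return [s - mean for s in signal]


def synthesize_frame(spectrum, aperiodic_ratio, noise):
    """Combine the periodic and aperiodic parts for one pulse.

    All three arguments are full-length sequences of the same size N.
    """
    # Step 2.1 / 3.1: split the envelope by the aperiodicity ratio.
    period_spectrum = [s * (1.0 - a) for s, a in zip(spectrum, aperiodic_ratio)]
    aperiod_spectrum = [s * a for s, a in zip(spectrum, aperiodic_ratio)]
    # Step 2.2: periodic part back to the time domain, DC removed.
    periodic = remove_dc([v.real for v in dft(period_spectrum, inverse=True)])
    # Step 3.2 / 3.3: shape white noise with the aperiodic spectrum.
    noise_spectrum = dft(noise)
    shaped = [a * n for a, n in zip(aperiod_spectrum, noise_spectrum)]
    aperiodic = [v.real for v in dft(shaped, inverse=True)]
    # Step 4: sum of both components.
    return [p + q for p, q in zip(periodic, aperiodic)]


# Example: flat envelope, half of the energy treated as aperiodic.
N = 16
noise = [random.gauss(0.0, 1.0) for _ in range(N)]
frame = synthesize_frame([1.0] * N, [0.5] * N, noise)
```

With `aperiodic_ratio` set to 0 everywhere the output is purely periodic, and with 1 everywhere it is purely shaped noise, which makes the role of the ratio easy to check.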

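Code details

GetTimeBase()
synth->interpolated_vuv: the voicing probability
synth->pulse_locations: the time of each pulse
synth->pulse_locations_index: the sample index of each pulse
1. Linearly interpolate to obtain the f0 and vuv of every sample
Originally each 5 ms frame carries one f0; this is expanded to one f0 per sample by plain linear interpolation.
2. Find the pulse locations (see the wiki)
Per-sample phase increment: $phase=\frac{2\pi f_0}{f_s}$
Per-sample accumulated phase: $total\_phase[n]=\sum_{k\le n} phase[k]$
Per-sample wrapped phase: $wrap\_phase=total\_phase\ mod\ 2\pi$
A sample where $|wrap\_phase[n] - wrap\_phase[n-1]|>\pi$ marks the start of a new pulse. The absolute value is needed because the wrapped phase drops by almost $2\pi$ at a wrap, so the signed difference is negative there.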
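
The pulse-location procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the code of `GetTimeBase()`: the helper names `interpolate_f0` and `pulse_locations` are hypothetical, vuv handling is left out, and the jump test uses the absolute difference of the wrapped phase, since the wrapped phase falls by almost $2\pi$ whenever a new period starts.

```python
import math


def interpolate_f0(frame_f0, frame_period_s, fs, num_samples):
    """Linearly interpolate frame-level f0 to one value per sample."""
    last = len(frame_f0) - 1
    out = []
    for n in range(num_samples):
        pos = (n / fs) / frame_period_s      # position in frames
        i = min(int(pos), last)
        j = min(i + 1, last)
        frac = min(pos - i, 1.0)
        out.append(frame_f0[i] * (1.0 - frac) + frame_f0[j] * frac)
    return out


def pulse_locations(sample_f0, fs):
    """Return sample indices where the wrapped phase jumps by more than pi."""
    two_pi = 2.0 * math.pi
    total_phase = 0.0
    previous_wrapped = 0.0
    locations = []
    for n, f0 in enumerate(sample_f0):
        total_phase += two_pi * f0 / fs              # accumulated phase
        wrapped = math.fmod(total_phase, two_pi)     # wrapped phase
        if n > 0 and abs(wrapped - previous_wrapped) > math.pi:
            locations.append(n)                      # a new pulse starts here
        previous_wrapped = wrapped
    return locations


# Example: 5 ms frames at 16 kHz with a constant f0 of 110 Hz.
fs = 16000
f0_per_sample = interpolate_f0([110.0] * 11, 0.005, fs, 800)
pulse_index = pulse_locations(f0_per_sample, fs)
pulse_time = [n / fs for n in pulse_index]
```

Here `pulse_index` and `pulse_time` play the roles that the text assigns to `pulse_locations_index` and `pulse_locations`; consecutive pulses are spaced roughly `fs / f0` samples apart.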

References

[1] Speech representation and transformation using adaptive interpolation of weighted spectrum: vocoder revisited
[2] https://github.com/mmorise/World

Reposted from blog.csdn.net/xmdxcsj/article/details/72420382
