Advanced Mel-GAN for Speech Synthesis by Improving Receptive Field for Speech Generation

Geeta Atkar, Dr. Priyadarhini J.

Keywords: Generative Adversarial Networks, Text To Speech, Advanced Mel-GAN.

Abstract

Speech synthesis is the generation of speech: given input utterances from a database, a synthesis technique produces multiple renditions of the same utterance. When GANs are considered, Mel-GAN gives better accuracy than Wave-GAN and Wave-Net, but it still has some drawbacks. This paper therefore proposes an advanced (modified) Mel-GAN that produces higher-quality speech than the original Mel-GAN. Two modifications are made. First, we increase the receptive field for speech generation. Second, in place of the feature matching loss, we train with a spectral loss based on the short-time Fourier transform (STFT) to measure the difference between real and fake data. The model takes a Mel spectrogram as input and generates sub-band signals, which are finally combined to produce the full-band signal. The proposed Advanced Mel-GAN achieves a Mean Opinion Score (MOS) of 4.34.
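
The two modifications can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give kernel sizes, dilation rates, or the exact form of the STFT loss, so the function names, the frame length and hop size, and the choice of a spectral-convergence term plus a log-magnitude term (a commonly used form of STFT loss) are all assumptions made for illustration.

```python
import numpy as np


def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 dilated 1-D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def stft_magnitude(signal, frame_len=256, hop=64):
    """Magnitude spectrogram from Hann-windowed frames (no padding).

    frame_len and hop are illustrative values, not taken from the paper.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def stft_loss(real, fake, eps=1e-7):
    """Assumed STFT loss: spectral convergence plus mean L1 log-magnitude distance."""
    m_real = stft_magnitude(real)
    m_fake = stft_magnitude(fake)
    # Frobenius norm of the magnitude difference, relative to the real magnitude.
    spectral_convergence = np.linalg.norm(m_real - m_fake) / (np.linalg.norm(m_real) + eps)
    # Mean absolute difference between log magnitudes.
    log_magnitude = np.mean(np.abs(np.log(m_real + eps) - np.log(m_fake + eps)))
    return spectral_convergence + log_magnitude
```

With a kernel size of 3, adding one more dilated layer to a hypothetical stack with dilations 1, 3, 9 raises the receptive field from `receptive_field(3, [1, 3, 9])` = 27 samples to `receptive_field(3, [1, 3, 9, 27])` = 81 samples. `stft_loss` returns 0 when the generated waveform equals the real one and grows as their spectra diverge.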
