Advanced Mel-GAN for Speech Synthesis by Improving Receptive Field for Speech Generation

  • Geeta Atkar, Dr. Priyadarhini J.
Keywords: Generative Adversarial Networks, Text To Speech, Advanced Mel-GAN.

Abstract

Speech synthesis is the generation of speech: given input speech from a database, a speech synthesis technique produces multiple versions of the same input speech. When GANs are considered, Mel-GAN gives the best accuracy compared with Wave-GAN and Wave-Net, but it still has some drawbacks. This paper therefore proposes an advanced (modified) Mel-GAN that gives better speech quality than the original Mel-GAN. Two modifications are made. First, we increase the receptive field for speech generation. Second, instead of the feature matching loss, we use a spectral loss based on the short-time Fourier transform (STFT) during training to measure the difference between real and fake data. The generator takes a Mel spectrogram as input and generates sub-band signals, which are finally combined to produce the full-band signal. The proposed Advanced Mel-GAN achieves a Mean Opinion Score (MOS) of 4.34.
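
To illustrate the idea of an STFT-based spectral loss between real and generated audio, the following is a minimal NumPy sketch. The abstract does not give the exact form of the loss, so the formulation used here (spectral convergence plus a log-magnitude L1 term, a common choice in GAN vocoders) is an assumption, as are the function names `stft_magnitude` and `spectral_loss`, the frame length, the hop size, and the test signals.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=64):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform."""
    signal = np.asarray(signal, dtype=np.float64)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One-sided FFT per frame, shape: (n_frames, frame_len // 2 + 1).
    return np.abs(np.fft.rfft(frames, axis=1))

def spectral_loss(real, fake, frame_len=256, hop=64, eps=1e-7):
    """Assumed loss form: spectral convergence + mean log-magnitude L1 distance."""
    real_mag = stft_magnitude(real, frame_len, hop)
    fake_mag = stft_magnitude(fake, frame_len, hop)
    # Spectral convergence: relative Frobenius-norm error of the magnitudes.
    convergence = np.linalg.norm(real_mag - fake_mag) / (np.linalg.norm(real_mag) + eps)
    # Log-magnitude term: emphasizes differences in low-energy regions.
    log_l1 = np.mean(np.abs(np.log(real_mag + eps) - np.log(fake_mag + eps)))
    return convergence + log_l1

# Hypothetical signals: a 440 Hz tone as "real" and a noisy copy as "fake".
t = np.arange(4000) / 16000.0
real = np.sin(2 * np.pi * 440 * t)
fake = real + 0.1 * np.random.default_rng(0).standard_normal(t.shape)

loss_same = spectral_loss(real, real)   # identical signals give zero loss
loss_noisy = spectral_loss(real, fake)  # a noisy copy gives a positive loss
```

In a training setup, a loss of this kind would be computed on generator output against the ground-truth waveform, often at several frame lengths and hop sizes; whether the paper does so is not stated in the abstract.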

Published
2021-09-14
How to Cite
Dr. Priyadarhini J., G. A. (2021). Advanced Mel-GAN for Speech Synthesis by Improving Receptive Field for Speech Generation. Design Engineering, 11527-11534. Retrieved from http://www.thedesignengineering.com/index.php/DE/article/view/4292
Section
Articles