A Speaker Diarization and Transcription method for Overlapped Speech and Voice Change Detection Multichannel Audio

  • Gangadhara Rao Kommu, Maddipatla Mukta, Hemanth Kakarla
Keywords: Speaker diarization, d-vector embedding, overlapped speech detection, multi-channel audio, clustering, UIS-RNN, transcription.

Abstract

In this paper, we present a fully supervised speaker diarization and transcription approach for multi-channel audio based on an unbounded interleaved-state recurrent neural network (UIS-RNN). From the input audio streams, speaker-discriminative embeddings known as d-vectors are extracted, and each speaker is modeled by an instance of an RNN with shared parameters, while the RNN states of different speakers interleave in the time domain. A distance-dependent Chinese restaurant process (ddCRP) is integrated into this RNN to accommodate an unbounded number of speakers. The approach is fully supervised, learning from examples in which speaker labels are annotated together with timestamps. Our method also includes overlapped speech detection and speaker change detection, both of which strongly affect the speaker diarization process. A diarization error rate (DER) of 7.2% is achieved on the NIST SRE 2000 CALLHOME dataset, improving on existing methods that do not account for overlapped speech.
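The core idea of assigning d-vectors to an unbounded number of speakers can be illustrated with a simplified sketch. The code below is not the paper's UIS-RNN; it is a hypothetical greedy online clustering stand-in that mimics the CRP-style behavior: each new embedding either joins its closest existing speaker (cosine similarity above a threshold) or opens a new speaker, so the speaker count is not fixed in advance.

```python
import numpy as np

def assign_speakers(embeddings, threshold=0.5):
    """Greedy online assignment of speaker embeddings (d-vectors).

    Simplified stand-in for the CRP-style unbounded speaker model:
    an embedding joins the most similar existing speaker if the cosine
    similarity exceeds `threshold`, otherwise it starts a new speaker.
    """
    centroids, counts, labels = [], [], []
    for e in embeddings:
        e = np.asarray(e, dtype=float)
        e = e / np.linalg.norm(e)  # unit-normalize so dot product = cosine sim
        if centroids:
            sims = [float(c @ e) for c in centroids]
            best = int(np.argmax(sims))
        if centroids and sims[best] >= threshold:
            # update the matched speaker's running-mean centroid
            counts[best] += 1
            centroids[best] += (e - centroids[best]) / counts[best]
            centroids[best] /= np.linalg.norm(centroids[best])
            labels.append(best)
        else:
            # open a new speaker (unbounded, CRP-like behavior)
            centroids.append(e.copy())
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels

# Synthetic example: two well-separated "speakers" in embedding space.
rng = np.random.default_rng(0)
a = rng.normal(size=8)
b = -a  # opposite direction, hence dissimilar to a
segments = [a + 0.01 * rng.normal(size=8) for _ in range(3)] + \
           [b + 0.01 * rng.normal(size=8) for _ in range(3)]
print(assign_speakers(segments))  # → [0, 0, 0, 1, 1, 1]
```

In the full method, this greedy rule is replaced by the UIS-RNN's learned, parameter-sharing per-speaker RNNs and the ddCRP prior, which are trained on labeled, timestamped examples rather than hand-set thresholds.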

Published
2021-07-01
How to Cite
Hemanth Kakarla, G. R. K. M. M. (2021). A Speaker Diarization and Transcription method for Overlapped Speech and Voice Change Detection Multichannel Audio. Design Engineering, 1167–1180. Retrieved from http://www.thedesignengineering.com/index.php/DE/article/view/2420
Section
Articles