A Speaker Diarization and Transcription method for Overlapped Speech and Voice Change Detection Multichannel Audio

  • Gangadhara Rao Kommu, Maddipatla Mukta, Hemanth Kakarla
Keywords: Speaker diarization, d-vector embedding, overlapped speech detection, multi-channel audio, clustering, UIS-RNN, transcription.

Abstract

In this paper, we present a fully supervised speaker diarization and transcription approach for multi-channel audio based on an unbounded interleaved-state recurrent neural network (UIS-RNN). From the input audio streams, speaker-discriminative embeddings known as d-vectors are extracted, and each speaker is modeled by an instance of an RNN with shared parameters, while the RNN states of different speakers interleave in the time domain. A distance-dependent Chinese restaurant process (ddCRP) is integrated into this RNN to accommodate an unbounded number of speakers. The approach is fully supervised, learning from examples in which speaker labels are annotated together with timestamps. Our method also includes overlapped speech detection and speaker change detection, both of which strongly affect the speaker diarization process. A diarization error rate (DER) of 7.2% is achieved on the NIST SRE 2000 CALLHOME dataset, improving on existing methods that do not account for overlapped speech.
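The core idea of assigning d-vectors to an unbounded number of speakers can be illustrated with a simplified sketch. The code below is not the paper's UIS-RNN; it is a hypothetical greedy online clustering stand-in that mimics the CRP-style behavior: each new embedding either joins its closest existing speaker (cosine similarity above a threshold) or opens a new speaker, so the speaker count is not fixed in advance.

```python
import numpy as np

def assign_speakers(embeddings, threshold=0.5):
    """Greedy online assignment of speaker embeddings (d-vectors).

    Simplified stand-in for the CRP-style unbounded speaker model:
    an embedding joins the most similar existing speaker if the cosine
    similarity exceeds `threshold`, otherwise it starts a new speaker.
    """
    centroids, counts, labels = [], [], []
    for e in embeddings:
        e = np.asarray(e, dtype=float)
        e = e / np.linalg.norm(e)  # unit-normalize so dot product = cosine sim
        if centroids:
            sims = [float(c @ e) for c in centroids]
            best = int(np.argmax(sims))
        if centroids and sims[best] >= threshold:
            # update the matched speaker's running-mean centroid
            counts[best] += 1
            centroids[best] += (e - centroids[best]) / counts[best]
            centroids[best] /= np.linalg.norm(centroids[best])
            labels.append(best)
        else:
            # open a new speaker (unbounded, CRP-like behavior)
            centroids.append(e.copy())
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels

# Synthetic example: two well-separated "speakers" in embedding space.
rng = np.random.default_rng(0)
a = rng.normal(size=8)
b = -a  # opposite direction, hence dissimilar to a
segments = [a + 0.01 * rng.normal(size=8) for _ in range(3)] + \
           [b + 0.01 * rng.normal(size=8) for _ in range(3)]
print(assign_speakers(segments))  # → [0, 0, 0, 1, 1, 1]
```

In the full method, this greedy rule is replaced by the UIS-RNN's learned, parameter-sharing per-speaker RNNs and the ddCRP prior, which are trained on labeled, timestamped examples rather than hand-set thresholds.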

Published
2021-07-01
How to Cite
Hemanth Kakarla, G. R. K. M. M. (2021). A Speaker Diarization and Transcription method for Overlapped Speech and Voice Change Detection Multichannel Audio. Design Engineering, 1167–1180. Retrieved from http://www.thedesignengineering.com/index.php/DE/article/view/2420
Section
Articles