Exploring individual speaker characteristics within a forensic automatic speaker recognition system

Vincent Hughes, Chenzi Xu, Paul Foulkes, Philip Harrison, Poppy Welch, Jessica Wormald, Finnian Kelly, David van der Vloed

April 2024


Abstract

A key issue for automatic speaker recognition (ASR), particularly for forensics, is our limited understanding of why certain voices prove more or less of a challenge for systems. In this paper, we focus on variability in individual speaker performance within an x-vector ASR system and examine this variability as a function of the phonetic content of the speech samples. The inclusion of vowels generally improved performance, but not for all speakers. Indeed, some speakers produced broadly the same log-likelihood-ratio cost (Cllr) irrespective of the phonetic content in the speech samples. Poor ASR performance was not well correlated with long-term laryngeal features (f0 and laryngeal voice quality), and these features may provide additional speaker-discriminatory information for some speakers. We discuss the implications of these findings in terms of developing a speaker quality metric for flagging potentially problematic speakers prior to ASR comparison.

Type

Conference paper

Publication

Proceedings of The Speaker and Language Recognition Workshop (Odyssey 2024). 18-21 June 2024. Quebec City, Canada. pp. 1-8
