Automatic speaker recognition with variation across vocal conditions: a controlled experiment with implications for forensics

Jessica Wormald, Paul Foulkes, Philip Harrison, Vincent Hughes, Finnian Kelly, David van der Vloed, Poppy Welch, Chenzi Xu

August 2023

Abstract

Automatic Speaker Recognition (ASR) involves a complex range of processes to extract, model, and compare speaker-specific information from a pair of voice samples. Using heavily controlled recordings, this paper explores the impact of specific vocal conditions (i.e. vocal setting, disguise, accent guises) on ASR performance. When vocal conditions are matched, ASR performance is generally excellent (whisper is an exception). When conditions are mismatched, as in most forensic cases, both discrimination error and calibration error increase in some instances. The most problematic mismatches are those involving whisper and supralaryngeal vocal settings; these produce the greatest phonetic changes to speech. Mismatches involving high pitch also produce poor performance, although this appears to be driven by speaker-specific differences in articulatory implementation. We discuss the implications of the findings for the use of ASR in forensic casework and the interpretability of system output.

Type

Conference paper

Publication

Proceedings of INTERSPEECH. Dublin, Ireland. pp. 591-595
