How do infants identify their mothers by their voice: The role of temporal characteristics in infant-directed speech style

Recent studies have shown that fetuses and infants can discriminate their mothers from strangers by voice. How voice identification develops in humans before and after birth, and which information in the speech signal is relevant to it, is still largely unknown. To answer these questions, it is important to consider the deliberate changes in speaking style that adults (in particular mothers) make when addressing their babies (infant-directed speech; IDS). IDS is common across languages and cultures, and studies have shown that it increases infants’ attention and facilitates the acquisition of linguistic properties. What has been overlooked so far, however, is another important type of information in speech: speaker-specific, or indexical, information, which enables listeners to recognize speakers by their voice. Here I argue that the design of IDS is particularly well suited to the acquisition of the speaker-specific information necessary for speaker recognition. My earlier study on indexical properties of IDS, carried out in collaboration with the University of Quebec in Montreal and Michigan State University, showed that mothers expand their indexical cue space in IDS compared to adult-directed speech (ADS). This expansion, I argue, promotes the acquisition of the indexical features of particular voices. I tested this hypothesis with automatic speaker recognition (ASR) systems and found that they perform better on IDS, which suggests that IDS is more suitable for learning the individual characteristics of a speaker’s voice. That experiment was based on segmental properties of speech, which characterize the speaker’s vocal tract every 20 milliseconds, and ignored temporal cues, which also contain important indexical information. The goal of this proposal is thus to explore the indexical properties of temporal cues in IDS in contrast to ADS using state-of-the-art ASR technology.
This research is cross-disciplinary, combining expertise from Phonetics, Speech Engineering and Psychology in particular. The results will have implications for the evolution of speech, especially of different speaking styles (here: IDS), and for child language development. They will also have implications for the design of ASR systems and for forensic voice analysis.

Kathiresan, T., Dilley, L., Townsend, S., Shi, R., Daum, M., Arjmandi, M., & Dellwo, V. (2019). Infant-directed speech enhances recognizability of individual mothers’ voices. Journal of the Acoustical Society of America, 145(3), 1766.

