Speech samples are quick, easy, and inexpensive to collect, and speech-eliciting tasks can be administered remotely and frequently with minimal practice effects. Speech therefore holds significant potential for non-invasive, cost-effective screening and monitoring across clinical populations, including, but not limited to, individuals with autism, ADHD, depression, anxiety, schizophrenia, Alzheimer’s disease, mild cognitive impairment, and related dementias. Speech prosody captures multidimensional aspects of cognition, affect, and motor control in quantifiable form, providing a scalable behavioral signal for early detection, disease monitoring, and treatment evaluation.
This special session focuses on clinical and translational applications of prosody research, highlighting how prosodic analysis can inform diagnosis, treatment, and real-world health outcomes. It brings together researchers working on speech prosody in clinical contexts and aims to foster discussion of its role and impact as a potential “biomarker” of specific conditions. We invite contributions that examine the feasibility, stability, and generalizability of speech prosody as a biomarker across clinical conditions; methodological innovations in measuring speech prosody in clinical populations; and the utility and practicality of prosodic biomarkers in practice. We encourage submissions that address these issues with a variety of approaches, ranging from traditional manual assessments of prosody to large acoustic models (e.g., Whisper, wav2vec 2.0) and multimodal models (e.g., Qwen2.5).
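As a concrete illustration of the kind of lightweight acoustic measurement such work builds on, the sketch below computes a few f0 summary statistics from a single recording in Python. It is a minimal sketch, not a clinical pipeline: the file name sample.wav, the 65–400 Hz pitch search range, and the choice of librosa’s pYIN tracker (rather than, for example, openSMILE features or wav2vec 2.0 embeddings) are illustrative assumptions only.

    import numpy as np
    import librosa

    def prosodic_summary(wav_path: str) -> dict:
        """Compute simple f0 summary statistics for one speech sample."""
        y, sr = librosa.load(wav_path, sr=16000)            # mono, 16 kHz
        f0, voiced, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)
        f0 = f0[voiced]                                     # keep voiced frames only
        semitones = 12 * np.log2(f0 / np.median(f0))        # f0 relative to median
        return {
            "f0_mean_hz": float(np.mean(f0)),
            "f0_sd_hz": float(np.std(f0)),
            "f0_range_st": float(semitones.max() - semitones.min()),
            "voiced_fraction": float(np.mean(voiced)),
        }

    print(prosodic_summary("sample.wav"))  # "sample.wav" is a placeholder path

In a real study, features like these would be extracted per task and per session, then evaluated for test–retest stability and between-group discrimination before any biomarker claim is made.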
Topics:
Development and validation of prosody-based diagnostic or prognostic tools
Novel methods in assessing speech prosody in clinical populations
Tracking disease progression or treatment response using prosodic features
Integration of speech prosody with other digital biomarkers for multimodal processing
Associations between speech prosody and other behavioral, fluid, or neuroimaging biomarkers
Organizers:
Sunghye Cho, PhD, Research Assistant Professor, Linguistics | Linguistic Data Consortium, University of Pennsylvania
Min Seok Baek, MD, Assistant Professor, Neurology, Yonsei University Wonju College of Medicine
The ability to comprehend and produce speech prosody is key for successful communication and interpersonal relationships, which in turn are well-known determinants of quality of life and health. Injury to the brain, e.g., from a stroke or traumatic brain injury, can selectively impair different aspects of this crucial ability, depending on which parts of the brain are affected. A better understanding of the neural basis of different aspects of speech prosody, in addition to being of academic interest to those seeking to develop biologically plausible models, can provide clues regarding more precise prognostics and targeted rehabilitation approaches.
However, even seemingly straightforward questions remain unsettled: Is prosody predominantly supported by the brain’s right hemisphere? Are linguistic and emotional aspects of prosody supported by overlapping or distinct mechanisms? To what extent is the neural substrate of prosody production intertwined with that of prosody perception? Findings from clinical and neuroscience studies are strikingly inconsistent, due in part to methodological and conceptual challenges in assessing prosodic function. Large interindividual and sociocultural variability in typical prosody, as well as the intricate coupling between prosodic cues and segmental structure, makes it difficult to isolate prosodic deficits. Compounding this, different studies often operationalize “prosody” in non-comparable ways, ranging from intonational contours to lexical stress or tone contrasts, which limits synthesis across literatures.
This session aims to bring together linguists and neuroscientists to build a more unified understanding of the neural basis of speech prosody. We invite contributions that investigate:
The neural organization and dynamics of prosodic perception and production;
Prosodic impairments in acquired or developmental disorders as evidence for neural organization;
Neurobiologically and linguistically informed models linking structure, function, and behavior; and
Cross-linguistic or cross-domain comparisons that reveal how prosodic representations are instantiated in the brain.
Organizers:
Anna Seydell-Greenwald, PhD, Georgetown University Medical Center, Washington DC, USA
Tamar I. Regev, PhD, MIT, Cambridge, MA, USA
Alexandra Zezinka Durfee, PhD, CCC-SLP, Towson University, Towson, MD, USA
Guide to Annotation with PoLaR for Beginners
PoLaR (Points, Levels and Ranges) is a recently developed prosodic annotation system, and this tutorial will be useful for a wide range of prosodic annotators, from experienced labellers who are new to PoLaR to novices in prosodic annotation (but anyone is welcome to join!). The primary goal of this tutorial is to provide participants with hands-on experience in PoLaR-labelling speech data from mainstream American English, which will enable them to PoLaR-label data from a variety of languages, sources, and contexts.
PoLaR (see http://polarlabels.com) has been developed to facilitate the labelling of individual prosodic characteristics and cues, thereby decomposing phonological labels such as H* or L-L% into their more atomic phonological and acoustic components. In addition to isolating particular prosodic qualities, PoLaR is intended as a system that can be applied to describe any register, dialect, or language, including those that are under-studied. In fact, PoLaR may be useful for capturing and analyzing (potentially systematic) variability, both within and across categories in phonological, semantic/pragmatic, and/or sociolinguistic domains.
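Since PoLaR annotations are typically produced on Praat TextGrid tiers, they are straightforward to inspect programmatically. The following minimal sketch uses the third-party Python textgrid package to print two tiers from an annotated file; the file name utterance.TextGrid and the tier names “Points” and “Ranges” follow common PoLaR conventions but are assumptions here and should be adapted to a given annotator’s setup.

    import textgrid  # pip install textgrid

    # Hypothetical PoLaR-annotated grid; adjust the path and tier names as needed.
    tg = textgrid.TextGrid.fromFile("utterance.TextGrid")

    for tier in tg:
        if tier.name == "Points":                # point tier: f0 turning points
            for point in tier:
                print(f"{point.time:7.3f}s  {point.mark}")
        elif tier.name == "Ranges":              # interval tier: local f0 ranges
            for interval in tier:
                if interval.mark:                # skip unlabeled stretches
                    print(f"{interval.minTime:.2f}-{interval.maxTime:.2f}s  {interval.mark}")

From there, label tallies or alignments between tiers can be computed, for example to check agreement across annotators.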
Organizers:
Alejna Brugos, PhD, Research Fellow, Department of Computer, Data & Mathematical Sciences, Simmons University
Byron Ahn, PhD, Senior Data Scientist, Grid Dynamics International, Inc.
Nanette Veilleux, PhD, Professor of Computer, Data & Mathematical Sciences, Simmons University
Stefanie Shattuck-Hufnagel, PhD, Principal Research Scientist, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology