Speech samples are quick, easy, and inexpensive to collect, and speech-eliciting tasks can be administered remotely and frequently with minimal practice effects. Thus, speech holds significant potential for non-invasive and cost-effective screening and monitoring of various clinical populations, including but not limited to those with autism, ADHD, depression, anxiety, schizophrenia, Alzheimer’s disease, mild cognitive impairment, and related dementias. Speech prosody encodes complex linguistic and paralinguistic information into measurable and quantifiable signals, providing a critical opportunity to examine cognitive, social, executive, and motor functions. This special session brings together researchers working on speech prosody in clinical contexts and aims to foster discussion of its role and impact as a potential “biomarker” of specific conditions.
In this special session, we invite contributions that examine the feasibility, stability, and generalizability of speech prosody as a biomarker in clinical conditions, methodological innovations in measuring speech prosody in clinical populations, and the utility and practicality of using speech prosody as a biomarker. We encourage submissions that address these issues using a variety of approaches, ranging from traditional manual assessments of prosody to large acoustic models (e.g., Whisper, wav2vec 2.0) and multimodal models (e.g., Qwen2.5).
Topics:
Analysis of speech prosody patterns in various clinical populations
Perception of speech prosody in clinical populations, in particular those with language and speech impairments
Novel methods in assessing speech prosody in clinical populations
Development of screening or monitoring models using speech prosody
Integration of speech prosody with other digital biomarkers for multimodal processing
Association between speech prosody and other behavioral, fluid, and neuroimaging biomarkers
Organizers:
Sunghye Cho (Research Assistant Professor, Linguistics | Linguistic Data Consortium, University of Pennsylvania)
Min Seok Baek (Assistant Professor, Neurology, Yonsei University Wonju College of Medicine)
The ability to comprehend and produce speech prosody is key for successful communication and interpersonal relationships, which in turn are well-known determinants of quality of life and health. Injury to the brain, e.g., from a stroke or traumatic brain injury, can selectively impair different aspects of this crucial ability, depending on which parts of the brain are affected. A better understanding of the neural basis of different aspects of speech prosody, in addition to being of academic interest to those seeking to develop biologically plausible models, can provide clues regarding more precise prognostics and targeted rehabilitation approaches.
However, even for seemingly straightforward questions (such as whether prosody is predominantly supported by the brain’s right hemisphere, whether linguistic and emotional aspects of prosody are supported by overlapping or distinct mechanisms, or to what extent the neural substrate of prosody production is intertwined with that of prosody perception), findings from clinical and neuroscience studies are strikingly inconsistent. One reason for this inconsistency is the difficulty of assessing acquired prosodic impairments. The large interindividual and sociocultural variability in typical prosody and the complex ways in which prosody interacts with carrier stimuli make it difficult to determine whether a patient’s performance is atypical and is attributable to a prosodic impairment. Thus, clinical assessments often use stimuli with limited psychometric quality and poor sociocultural relevance. Further variability arises from differences in how researchers operationalize and refer to different aspects of prosody. For instance, studies of linguistic prosody can have varying foci: one study may use phrase-level prosodic contour (statement vs. question) manipulations, whereas another study may investigate lexical stress, and yet another may use pitch contrasts that carry lexical meaning for speakers of tonal languages (but not for speakers unfamiliar with lexical tone).
This session seeks to bring together linguists, neuroscientists, and clinicians interested in prosody to bridge the linguistic study of prosody, the study of its neural basis, and the study of its clinical impairments and treatment. To this end, we invite papers about prosody in clinical populations, papers that investigate the neural basis of specific aspects of prosody in a linguistically informed manner, papers that discuss neurobiologically informed models of prosody, papers about socioculturally sensitive assessments of prosody, papers highlighting the clinical and social significance of prosody, and contributions from groups of researchers that exemplify collaborations between linguists, clinicians, and neuroscientists.
Organizers:
Anna Seydell-Greenwald, PhD, Georgetown University Medical Center, Washington DC, USA
Tamar I. Regev, PhD, MIT, Cambridge, MA, USA
Alexandra Zezinka Durfee, PhD CCC-SLP, Towson University, Towson, MD, USA
Guide to Annotation with PoLaR for Beginners
PoLaR (Points, Levels, and Ranges) is a recently developed prosodic annotation system. This tutorial will be useful for a wide range of prosodic annotators, from experienced labellers who are new to PoLaR to novices in prosodic annotation (but anyone is welcome to join!). The primary goal of this tutorial is to provide participants with hands-on experience in PoLaR-labelling speech data from mainstream American English, which will enable them to PoLaR-label data from a variety of different languages, sources, and contexts.
PoLaR (see http://polarlabels.com) has been developed to facilitate the labelling of individual prosodic characteristics and cues (thereby decomposing phonological labels such as H* or L-L% into their more atomic phonological and acoustic components). In addition to isolating particular prosodic qualities, PoLaR is intended as a system that can be applied to describe any register, dialect, or language, including those dialects and languages that are under-studied. In fact, PoLaR may be useful for capturing and analyzing (potentially systematic) variability, both within and across various categories in phonological, semantic/pragmatic, and/or sociolinguistic domains.
© Speech Prosody 2026