Conversational Emotion Recognition: Joint Speaker and Emotion Diarization in Conversations

Presenter: Olorundamilola Kazeem
University: Johns Hopkins University
Program: CSGF
Year: 2022

Conversational emotion recognition (CER) is a subfield of automatic speech emotion recognition
(SER) and a highly active area of research aimed at endowing machines with the ability to comprehend
and communicate with emotion. It has extensive affective computing applications across sectors and
industries, from cybersecurity to healthcare, and on to computational storytelling for education and
entertainment. For all these applications, it is important to understand not just the speech content
channel (i.e., “what is being said”) but also the emotional context channel (i.e., “how it is being
said”). This research aims to develop novel transformer-based neural network models to determine and
diarize “what was felt when” for a given speaker and “who felt what and when” among two or more
speakers in spontaneous conversational speech.