Conversational Emotion Recognition: Joint Speaker and Emotion Diarization in Conversations

Presenter:

Olorundamilola Kazeem

University:

Johns Hopkins University

Program:

CSGF

Year:

2022

Conversational emotion recognition (CER) is a subfield of automatic speech emotion recognition
(SER) and a highly active area of research aimed at endowing machines with the ability to comprehend
and communicate with emotion. This area of research has extensive affective computing applications
across various sectors and industries, from cybersecurity and healthcare to computational
storytelling for education and entertainment. For all these applications, it is important to
understand not just the speech content channel (i.e. “what is being said”), but also the emotional
context channel (i.e. “how it is being said”). This research aims to develop novel transformer-based
neural network models to determine and diarize “what was felt when” for a given speaker and “who felt
what and when” among two or more speakers in spontaneous conversational speech scenarios.
