In this week's Journal Club session, Shreyah Iyer will talk about her project "Speech Emotion Recognition Using Deep Neural Networks with Mel-Frequency Cepstral Coefficients".
Speech is an important cue for understanding human emotion, for example in psychology and criminology, since the effect of emotion on the voice can be recognized by listeners irrespective of the language spoken. In this presentation I will talk about an ongoing KTP project on a Speech Emotion Recognition system.
The aim of the project is to build a system that can interpret the underlying emotion in an audio/speech signal. So far, I have worked on Deep Learning architectures, i.e., CNNs, with the features most widely used for emotion detection, such as MFCCs and Mel-spectrograms. In particular, I have investigated the best way to use the coefficients extracted as MFCCs. I have worked with two publicly available speech emotion corpora, TESS and RAVDESS. Results from these experiments show that MFCC features stacked at an optimal length outperform the other CNN architectures tested.
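As a rough illustration of the MFCC features mentioned above, the sketch below computes MFCCs from a raw signal with numpy only: framing, power spectrum, mel filterbank, log, and a DCT. All parameter values (frame length, hop, filter and coefficient counts) and the toy sine-wave input are illustrative assumptions, not the project's actual configuration; a real system would typically use a library such as librosa.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr, n_fft=512, hop=160, n_mels=26, n_mfcc=13):
    """Illustrative MFCC computation (parameters are assumptions)."""
    # 1. Slice the signal into overlapping frames and apply a Hann window
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    # 2. Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. Triangular mel-scaled filterbank
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fbank[i, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:
            fbank[i, c:r] = (r - np.arange(c, r)) / (r - c)
    # 4. Log of the mel filterbank energies
    log_mel = np.log(power @ fbank.T + 1e-10)
    # 5. DCT-II to decorrelate; keep the first n_mfcc coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[:, None] + 0.5) * np.arange(n_mfcc)[None, :])
    return log_mel @ dct  # shape: (n_frames, n_mfcc)

# Toy usage: one second of a 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
feats = mfcc(np.sin(2 * np.pi * 440.0 * t), sr)
print(feats.shape)  # → (97, 13)
```

The resulting per-frame coefficient matrix is the kind of 2-D feature map that can be stacked over time and fed to a CNN.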
In this presentation I will also talk about the challenges and future work for this project.