<h1>UH Biocomputation Group - Speech recognition</h1>
<h2>Speech Emotion recognition using deep neural networks using Mel Frequency Cepstral coefficients</h2>
<p>In this week's Journal Club session, Shreyah Iyer will talk about her project "Speech Emotion recognition using deep neural networks using Mel Frequency Cepstral coefficients".</p>
<hr class="docutils" />
<p>Speech provides important context for understanding human emotions, for
example in psychology and criminology, because the effects of emotion on the
voice can be recognized by listeners irrespective of the language being
spoken. In this presentation I will talk about an ongoing KTP project on a
Speech Emotion Recognition system.</p>
<p>The aim of the project is to build a system that can interpret the underlying
emotion in an audio/speech signal. So far, I have worked on Deep Learning
architectures, i.e. CNNs, with the most widely used features for emotion
detection, such as MFCCs and Mel-spectrograms. In particular, I have
investigated the best way to use the coefficients extracted as MFCCs. I have
worked with two publicly available speech emotion corpora, TESS and RAVDESS.
Results from these experiments show that the CNN using MFCC features with an
optimal stack length outperforms the other architectures tested.</p>
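<p>The abstract does not give the extraction pipeline, but as a rough illustration of how MFCCs can be computed and stacked to a fixed length for a CNN, the following Python sketch uses librosa; the file name, number of coefficients, and stack length are assumptions, not values from the project:</p>
<pre>
import numpy as np
import librosa

# Load an utterance and extract MFCCs (file name and parameters are illustrative).
y, sr = librosa.load("utterance.wav", sr=22050)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)  # shape: (40, n_frames)

# Stack a fixed number of frames so every utterance gives the CNN an
# input of the same size; truncate long clips, zero-pad short ones.
stack_len = 200
if mfcc.shape[1] >= stack_len:
    features = mfcc[:, :stack_len]
else:
    features = np.pad(mfcc, ((0, 0), (0, stack_len - mfcc.shape[1])))
print(features.shape)  # (40, 200): one fixed-size CNN input per utterance
</pre>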
<p>In this presentation I will also talk about the challenges and future work for
this project.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p><strong>Date:</strong> 16/12/2020 <br />
<strong>Time:</strong> 16:00 <br />
<strong>Location</strong>: online</p>
<h2>Towards Discriminative Representation Learning for Speech Emotion Recognition</h2>
<p>In this week's Journal Club session, Yi Sun will talk about the paper "Towards Discriminative Representation Learning for Speech Emotion Recognition".</p>
<hr class="docutils" />
<p>In intelligent speech interaction, automatic speech emotion recognition (SER)
plays an important role in understanding user intention. While sentimental
speech has different speaker characteristics but similar acoustic attributes,
one vital challenge in SER is how to learn robust and discriminative
representations for emotion inference. In this paper, inspired by human
emotion perception, we propose a novel representation learning component (RLC)
for the SER system, constructed with Multi-head Self-attention and a Global
Context-aware Attention Long Short-Term Memory Recurrent Neural Network
(GCA-LSTM). With the ability of the Multi-head Self-attention mechanism to
model element-wise correlative dependencies, the RLC can exploit the
common patterns of sentimental speech features to enhance the import of
emotion-salient information in representation learning. By employing the
GCA-LSTM, the RLC can selectively focus on emotion-salient factors while
considering the entire utterance context, and gradually produce a
discriminative representation for emotion inference. Experiments on the public
emotional benchmark database IEMOCAP and a large realistic interaction
database demonstrate that the proposed SER framework outperforms
state-of-the-art techniques, with a 6.6% to 26.7% relative improvement in
unweighted accuracy.</p>
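<p>As a loose sketch of the architecture the abstract describes (not the authors' implementation), the component below chains multi-head self-attention with an LSTM in PyTorch; a plain LSTM stands in for the GCA-LSTM, whose global context-aware gating is not reproduced here, and all dimensions are assumptions:</p>
<pre>
import torch
import torch.nn as nn

class RLCSketch(nn.Module):
    """Stand-in for the representation learning component: multi-head
    self-attention followed by an LSTM (a plain LSTM approximates the
    GCA-LSTM)."""

    def __init__(self, feat_dim=40, num_heads=4, hidden_dim=128):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)

    def forward(self, x):                   # x: (batch, time, feat_dim)
        attended, _ = self.attn(x, x, x)    # element-wise correlative dependencies
        _, (h, _) = self.lstm(attended)     # summarise the utterance context
        return h[-1]                        # (batch, hidden_dim) representation

frames = torch.randn(8, 300, 40)            # e.g. 8 utterances of 300 feature frames
representation = RLCSketch()(frames)        # fed to an emotion classifier downstream
</pre>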
<p>Papers:</p>
<ul class="simple">
<li>Runnan Li et al. (2019) <a class="reference external" href="https://www.ijcai.org/Proceedings/2019/703">"Towards Discriminative Representation Learning for Speech Emotion Recognition"</a>,
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, Main Track, pages 5060-5066</li>
</ul>
<div class="line-block">
<div class="line"><br /></div>
</div>
<p><strong>Date:</strong> 31/01/2020 <br />
<strong>Time:</strong> 16:00 <br />
<strong>Location</strong>: B200</p>
<h2>Human Speech Spectrum, Frequency Range, Formants</h2>
<p>This talk is based on the article "Human Speech Spectrum, Frequency Range, Formants" by Bernd Noack, Research Director at CNRS (Centre National de la Recherche Scientifique), LIMSI-CNRS, France.</p>
<p>In this talk Anuradha will summarise the article and attempt to explain some fundamental features of human speech.
In particular, she will show the range of frequencies typically used by men and women.
She will also explain the role of formants (concentrations of acoustic energy around particular frequencies) in speech communication.</p>
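<p>For readers who want to see a formant estimate concretely, one standard approach (an assumption here, not necessarily the article's method) is linear predictive coding: the angles of the LPC polynomial's roots give the resonance frequencies. A minimal Python sketch with librosa and numpy, using a hypothetical recording of a sustained vowel:</p>
<pre>
import numpy as np
import librosa

y, sr = librosa.load("vowel.wav", sr=16000)   # hypothetical sustained-vowel recording
a = librosa.lpc(y, order=12)                  # LPC coefficients of an all-pole model
roots = np.roots(a)
roots = roots[np.imag(roots) > 0]             # keep one root from each conjugate pair
freqs = np.sort(np.angle(roots) * sr / (2 * np.pi))
print(freqs[:3])                              # rough estimates of formants F1-F3
</pre>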
<p>Vowels play an important part in the formation of words and how they are vocalised.
In fact, as will be shown, much of the energy in human speech is used in the production of vowels.</p>
<p>Anuradha will also demonstrate how the human voice can be heard against the background of a full orchestra, even though the total sound from the orchestra is much louder than the singer (typically a soprano).</p>
<p>Finally, she will show the relationship between distance and received speech level.</p>
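<p>The standard model for that relationship (assuming free-field propagation with no reflections) is the inverse-square law: the received level falls by about 6 dB for each doubling of distance. A small worked example:</p>
<pre>
import math

def received_level(level_at_1m_db, distance_m):
    # Free-field inverse-square law: level drops 20*log10(d) dB relative to 1 m.
    return level_at_1m_db - 20 * math.log10(distance_m)

print(received_level(60.0, 4.0))  # 60 dB SPL at 1 m -> roughly 48 dB SPL at 4 m
</pre>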
<p><strong>Date:</strong> 10/11/2017 <br />
<strong>Time:</strong> 16:00 <br />
<strong>Location</strong>: LB252</p>