MASC: A Speech Corpus in Mandarin for Emotion Analysis and Affective Speaker Recognition
Tian Wu, Yingchun Yang, Zhaohui Wu and Dongdong Li
CCNT Lab, College of Computer Science and Technology, Zhejiang University, Hangzhou, P.R. China
{wutian, yyc, wzh, lidd}
This paper introduces MASC (Mandarin Affective Speech Corpus), a large emotional speech database. The database contains recordings of 68 native speakers (23 female and 45 male) in five emotional states: neutral, anger, elation, panic and sadness. Each speaker pronounces 5 phrases and 10 sentences three times for each emotional state, and 2 paragraphs in the neutral state only. These materials cover all the phonemes in Chinese. The corpus is constructed for prosodic and linguistic investigation of emotion expression in Mandarin. It can also be used for recognition of affectively stressed speakers. In addition, a prosodic feature analysis and a baseline speaker recognition experiment are performed on this database.
1. Introduction
How humans express emotion, and how changes in a speaker's emotional state affect speech, have intrigued researchers for a long time. Psychologists have conducted many experiments and proposed a variety of theories [1]. However, collecting a large-scale affective speech corpus is a very difficult task, and little work has been done in this direction. Emotional Prosody Speech and Transcripts (EPST) is an emotional speech database provided by the Linguistic Data Consortium (LDC) [2]. This corpus covers 14 emotional states based on Banse & Scherer's selection criteria [3] and is designed to support research in emotional prosody. For speaker-independent emotion recognition, the Sony AIBO entertainment robot is a target scenario for which emotional databases have been recorded. These databases simulate different possible situations and comprise all the desired emotions [4]. RUSLANA is a database of emotional utterances recorded in Russian, aimed at linguistic and speech processing research on communicative and emotive-attitudinal aspects of spoken language [5]. Sixty-one native speakers of standard Russian were recorded for this database.
As the databases above show, academic and applied research in emotion recognition and analysis is active. So far, however, there is still no large speech database for affective speaker recognition. Our motivation for creating an emotional speech corpus arises from the mismatch problem in automatic speaker recognition. Current speaker verification and identification systems are limited by the effect that transient changes in a speaker's state have on speech. This intra-speaker variability can cause unacceptably high error rates [6].
Furthermore, emotional speech research has so far focused on a few major languages such as English, German, French and Russian. Very little is known about the vocal correlates of emotion in continuous spoken Mandarin. Our goal is to provide a large corpus in Chinese designed
- emotionrevealed > speaker-independent
