Discourse

ISSN 2412-8562, 2658-7777

Saint Petersburg Electrotechnical University "LETI"

10.32603/2412-8562-2019-5-5-136-152

discourse-289

Research Article

LINGUISTICS

Speech Emotion Recognition: Humans vs Machines

https://orcid.org/0000-0001-5176-8114

Werner S.

Stefan Werner – PhD (Linguistics) (2000), Professor, University of Eastern Finland

FI-80100 Joensuu, Finland; FI-70210 Kuopio

https://orcid.org/0000-0003-3616-427X

Petrenko G. N.

Georgii N. Petrenko – Assistant Lecturer at the Department of Foreign Languages

5 Professora Popova str., St Petersburg 197376

komrad-georgy2010@yandex.ru

University of Eastern Finland

Saint Petersburg Electrotechnical University

2019

18.12.2019

Vol. 5, no. 5, pp. 136–152

2019

Werner S., Petrenko G.N.

This work is licensed under a Creative Commons Attribution 4.0 License.

https://discourse.elpub.ru/jour/article/view/289

Introduction

The study focuses on the perception of emotional speech and on speech emotion recognition using prosodic cues alone. Theoretical problems of defining prosody, intonation and emotion are discussed, along with the challenges of emotion classification. An overview of the acoustic and perceptual correlates of emotions found in speech is provided. Technical approaches to speech emotion recognition are also considered in light of recent experiments on the automatic classification of emotional speech.

Methodology and sources

The “big six” classification commonly used in technical applications is chosen and extended to include such emotions as disgust and shame. A database of emotional speech in Russian is recorded under sound laboratory conditions. A perception experiment is then run in the experiment environment of the Praat software.

Results and discussion

The results point to the possibility of cross-cultural emotion recognition: the Finnish and international participants identified about half of the samples correctly. Nonetheless, native speakers of Russian appear to identify a larger proportion of emotions correctly. The effects of foreign-language knowledge, musical training and gender on performance in the experiment were not pronounced. The most commonly confused pairs of emotions (shame and sadness, surprise and fear, anger and disgust), as well as cases in which emotional speech was taken for neutral, are also analysed.

Conclusion

The work can contribute to psychological studies, by clarifying some issues of emotion classification and the gender aspect of emotionality; to linguistic research, by providing new evidence for prosodic and comparative language studies; and to language technology, by deepening the understanding of possible challenges in building speech emotion recognition (SER) systems.

Keywords: emotional speech, speech emotion perception, speech emotion recognition, Russian emotional speech database, emotional speech corpora, emotion classification

The authors declare that there are no conflicts of interest.