Title
Diagnosis of Depression Based on Transfer Learning Model Using Audio data of Interview-type
Authors
조아현(A-Hyeon Jo) ; 곽근창(Keun-Chang Kwak)
DOI
https://doi.org/10.5370/KIEEP.2021.70.4.277
Keywords
AI technology; Depression diagnosis; Transfer Learning; Interview-type audio data; two-dimensional images
Abstract
Depression can lead to serious mental and physical illness, so early detection is important. Systems that support the early detection of depression using AI technology are currently being developed in various ways; in particular, research on diagnosing depression from voices that are easily encountered in daily life is being actively conducted. In this paper, we compare and analyze the depression-diagnosis performance of transfer learning models using interview-type audio data. The data come from the DAIC-WOZ Depression Database, which contains interview-style audio files. As transfer learning models, we use VGGish and YAMNet, both built on Convolutional Neural Networks (CNNs), which are deep learning models widely used for audio classification. The characteristics of the speech data are extracted as black-and-white and color two-dimensional images using the Bark spectrogram, Mel spectrogram, and log-Mel spectrogram methods. The depression-diagnosis performance of YAMNet is higher than that of VGGish. With black-and-white image inputs, YAMNet achieved its highest accuracy, 94.48%, when Mel spectrogram features were used. With color image inputs, YAMNet achieved its highest accuracy, 97.34%, when Bark spectrogram features were used, indicating that this combination is most suitable for diagnosing depression.
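The feature-extraction step described in the abstract, converting audio into a two-dimensional time-frequency image before feeding it to a CNN, can be illustrated with a minimal numpy sketch of a log-Mel spectrogram. This is not the paper's implementation; the parameter choices (16 kHz sample rate, 512-point FFT, 256-sample hop, 64 Mel bands) and the synthetic test tone are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    # standard HTK-style Mel scale conversion
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # triangular filters spaced evenly on the Mel scale from 0 Hz to Nyquist
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def log_mel_spectrogram(y, sr, n_fft=512, hop=256, n_mels=64):
    # frame the signal, apply a Hann window, take the power spectrum,
    # project onto the Mel filterbank, and convert to decibels
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return 10.0 * np.log10(mel + 1e-10)  # (frames, mel bands) image in dB

# toy example: 1 second of a 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2.0 * np.pi * 440.0 * t)
S = log_mel_spectrogram(y, sr)
print(S.shape)  # (61, 64): 61 time frames x 64 Mel bands
```

The resulting 2-D array can be rendered as a grayscale image directly, or mapped through a colormap to produce the color inputs the abstract compares; a Bark-scale variant follows the same pattern with Bark-spaced filter edges instead of Mel-spaced ones.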