Title
Diagnosis of Depression Based on Transfer Learning Model Using Audio data of Interview-type
Authors
조아현(A-Hyeon Jo) ; 곽근창(Keun-Chang Kwak)
DOI
https://doi.org/10.5370/KIEEP.2021.70.4.277
Keywords
AI technology; Depression diagnosis; Transfer Learning; Interview-type audio data; two-dimensional images
Abstract
Depression can lead to serious mental and physical illness, so early detection is important. Systems that support the early detection of depression using AI technology are currently being developed in various ways; in particular, research on diagnosing depression from voices that are easily encountered in daily life is being actively conducted. In this paper, we compare and analyze the depression-diagnosis performance of transfer learning models using interview-type audio data. The data come from the DAIC-WOZ Depression Database, which contains interview-style audio files. As transfer learning models, we use VGGish and YAMNet, both built on Convolutional Neural Networks (CNNs), which are deep learning models widely used for audio classification. The characteristics of the speech data are extracted as black-and-white and color two-dimensional images using the Bark spectrogram, Mel spectrogram, and log-Mel spectrogram methods. The depression-diagnosis performance of YAMNet is higher than that of VGGish. With black-and-white image inputs, YAMNet achieved its highest accuracy, 94.48%, when Mel spectrogram features were used. With color image inputs, YAMNet achieved its highest accuracy, 97.34%, when Bark spectrogram features were used, indicating that this combination is most suitable for diagnosing depression.
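The feature-extraction step described in the abstract, converting audio into a two-dimensional time-frequency image before feeding it to a CNN, can be illustrated with a minimal numpy sketch of a log-Mel spectrogram. This is not the paper's implementation; the parameter choices (16 kHz sample rate, 512-point FFT, 256-sample hop, 64 Mel bands) and the synthetic test tone are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    # standard HTK-style Mel scale conversion
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # triangular filters spaced evenly on the Mel scale from 0 Hz to Nyquist
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def log_mel_spectrogram(y, sr, n_fft=512, hop=256, n_mels=64):
    # frame the signal, apply a Hann window, take the power spectrum,
    # project onto the Mel filterbank, and convert to decibels
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return 10.0 * np.log10(mel + 1e-10)  # (frames, mel bands) image in dB

# toy example: 1 second of a 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2.0 * np.pi * 440.0 * t)
S = log_mel_spectrogram(y, sr)
print(S.shape)  # (61, 64): 61 time frames x 64 Mel bands
```

The resulting 2-D array can be rendered as a grayscale image directly, or mapped through a colormap to produce the color inputs the abstract compares; a Bark-scale variant follows the same pattern with Bark-spaced filter edges instead of Mel-spaced ones.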