Evaluating RNN Variants for Dysphonia Classification Using the Uncommon Voice Dataset: A Comparative Analysis
Abstract
Dysphonia, a voice disorder caused by vocal cord dysfunction, significantly impairs an individual's ability to communicate. Early and accurate detection is crucial for timely intervention and effective treatment. Most existing studies in this domain rely on conventional datasets and simpler classifiers, which limits how well their models generalize to diverse populations and leaves room for improvement in accuracy and robustness. In this study, we evaluate three recurrent neural network (RNN) variants, namely the standard RNN, Long Short-Term Memory (LSTM), and the Gated Recurrent Unit (GRU), for detecting dysphonia in the Uncommon Voice dataset. Our methodology applies feature extraction to preprocess the dataset and then trains each RNN variant to classify voices as dysphonic or non-dysphonic. The preprocessed dataset was split 80:20 into training and test sets. The standard RNN achieved an accuracy of 76%, while the LSTM and GRU models outperformed it with accuracies of 94% and 93%, respectively. These results demonstrate the effectiveness of gated RNN variants on diverse and challenging datasets, offer insight into their comparative performance for dysphonia detection, and advance research in voice disorder diagnosis.
Keywords
Dysphonia, Disorder, RNN, LSTM, GRU
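The evaluation protocol summarized in the abstract (an 80:20 train/test split followed by an accuracy comparison) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `stratified_split` and `accuracy`, the per-class (stratified) splitting, the fixed seed, and the toy label counts are all hypothetical and are not taken from the paper or the dataset.

```python
import random


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Return (train_indices, test_indices) with similar class ratios in both.

    Assumption: the paper only states an 80:20 split; splitting per class
    is an illustrative choice made here, not a detail from the source.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: 1 = dysphonic, 0 = non-dysphonic (toy data only).
labels = [1] * 40 + [0] * 60
train_idx, test_idx = stratified_split(labels)
```

Each trained model (standard RNN, LSTM, GRU) would then be scored by calling `accuracy` on the held-out test labels and that model's predictions, so that all three variants are compared on the same 20% of the data.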