Journal of Academic Research for Humanities (JARH) is a double-blind peer-reviewed, open-access, online multidisciplinary research journal.

Condensing Video Content: Deep Learning Advancements and Challenges in Video Summarization Innovations

Abstract

The exponential increase in video uploads on platforms like YouTube, exceeding 500 hours per minute, presents critical challenges in indexing, retrieval, and navigation. Existing methods heavily depend on user-generated metadata, often misaligned with the actual content. To address these challenges, we conducted a systematic review of video summarization techniques employing deep learning. An initial pool of 300 research articles was screened using strict quality criteria, resulting in a final selection of 44 studies. Articles were included if they focused on video summarization, employed deep learning approaches, utilized video datasets for evaluation, and were published in English or Urdu between 2019 and 2024 in peer-reviewed journals or conference proceedings. Papers were excluded if they lacked evaluations, used non-English/Urdu datasets, or were published before 2019. This review synthesizes recent advancements, highlights practical applications, and discusses relevant datasets, offering valuable insights for researchers and practitioners seeking to enhance automated video indexing and retrieval on social networking platforms.

Keywords

YouTube, Evaluations, Systematic, Indexing, Platforms

References

  1. Ahmed, M., & Sh, M. (2021). Design, usage and impact of virtual university mobile LMS application on students learning of virtual university of Pakistan. International Journal of Advanced Trends in Computer Science and Engineering, 10(3), 1837-1843.
  2. Fu, T.-J., Tai, S.-H., & Chen, H.-T. (2019). Attentive and adversarial learning for video summarization. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV) (pp. 1579–1587).
  3. Gao, J., Yang, X., Zhang, Y., & Xu, C. (2020). Unsupervised video summarization via relation-aware assignment learning. IEEE Transactions on Multimedia.
  4. Ghauri, J. A., Hakimov, S., & Ewerth, R. (2021). Supervised video summarization via multiple feature sets with parallel attention. In 2021 IEEE International Conference on Multimedia and Expo (ICME) (pp. 1–6). doi: 10.1109/ICME51207.2021.9428318
  5. Gunuganti, J., Yeh, Z.-T., Wang, J.-H., & Norouzi, M. (2024). Unsupervised video summarization with adversarial graph-based attention network. Journal of Visual Communication and Image Representation, 104200.
  6. Harakannanavar, S. S., Sameer, S. R., Kumar, V., Behera, S. K., Amberkar, A. V., & Puranikmath, V. I. (2022). Robust video summarization algorithm using supervised machine learning. Global Transitions Proceedings, 3(1), 131–135.
  7. He, X., Hua, Y., Song, T., Zhang, Z., Xue, Z., Ma, R., & Guan, H. (2019). Unsupervised video summarization with attentive conditional generative adversarial networks. In Proceedings of the 27th ACM International Conference on Multimedia (pp. 2296–2304).
  8. Hsu, T.-C., Liao, Y.-S., & Huang, C.-R. (2023). Video summarization with spatiotemporal vision transformer. IEEE Transactions on Image Processing, 32, 3013-3026. doi: 10.1109/TIP.2023.3275069
  9. Hu, M., Hu, R., Wang, Z., Xiong, Z., & Zhong, R. (2022). Spatiotemporal two-stream LSTM network for unsupervised video summarization. Multimedia Tools and Applications, 81(28), 40489–40510.
  10. Huang, C., & Wang, H. (2019). A novel key-frames selection framework for comprehensive video summarization. IEEE Transactions on Circuits and Systems for Video Technology, 30(2), 577–589.
  11. Ji, Z., Jiao, F., Pang, Y., & Shao, L. (2020). Deep attentive and semantic preserving video summarization. Neurocomputing, 405, 200–207.
  12. Khan, H., Hussain, T., Khan, S. U., Khan, Z. A., & Baik, S. W. (2024). Deep multi-scale pyramidal features network for supervised video summarization. Expert Systems with Applications, 237, 121288.
  13. Lal, S., Duggal, S., & Sreedevi, I. (2019). Online video summarization: Predicting the future to better summarize the present. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV) (pp. 471–480).
  14. Li, Q., Chen, J., Xie, Q., & Han, X. (2023). Video summarization for event-centric videos. Neural Networks, 161, 359–370.
  15. Li, Z., & Yang, L. (2021). Weakly supervised deep reinforcement learning for video summarization with semantically meaningful reward. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 3239–3247).
  16. Liu, A.-A., Shao, Z., Wong, Y., Li, J., Su, Y.-T., & Kankanhalli, M. (2019). LSTM-based multi-label video event detection. Multimedia Tools and Applications, 78, 677–695.
  17. Liu, Y.-T., Li, Y.-J., Yang, F.-E., Chen, S.-F., & Wang, Y.-C. F. (2019). Learning hierarchical self-attention for video summarization. In 2019 IEEE International Conference on Image Processing (ICIP) (pp. 3377–3381).
  18. Mathews, R. P., Panicker, M. R., Hareendranathan, A. R., Chen, Y. T., Jaremko, J. L., Buchanan, B., . . . Mathews, G. (2023). Unsupervised multi-latent space RL framework for video summarization in ultrasound imaging. IEEE Journal of Biomedical and Health Informatics, 27(1), 227–238. doi: 10.1109/JBHI.2022.3208779
  19. Messaoud, S., Lourentzou, I., Boughoula, A., Zehni, M., Zhao, Z., Zhai, C., & Schwing, A. (2021). DeepQAMVS: Query-aware hierarchical pointer networks for multi-video summarization. In (pp. 1389–1399).
  20. Minhas, S., Hussain, T., Ghani, A., Sajid, K., & Pakistan, L. (2021). Exploring students online learning: A study of Zoom application. Gazi University Journal of Science, 34(2), 171–178.
  21. Milbich, T., Bautista, M., Sutter, E., & Ommer, B. (2017). Unsupervised video understanding by the reconciliation of posture similarities. In Proceedings of the IEEE International Conference on Computer Vision (pp. 4394–4404).
  22. Mujtaba, G., Shuib, L., Idris, N., Hoo, W. L., Raj, R. G., Khowaja, K., . . . Nweke, H. F. (2019). Clinical text classification research trends: Systematic literature review and open issues. Expert Systems with Applications, 116, 494–520. Retrieved from https://www.sciencedirect.com/science/article/pii/S0957417418306110. doi: https://doi.org/10.1016/j.eswa.2018.09.034
  23. Negi, A., Kumar, K., & Saini, P. (2024). Object of interest and unsupervised learning-based framework for an effective video summarization using deep learning. IETE Journal of Research, 70(5), 5019–5030.
  24. Pang, Z., Nakashima, Y., Otani, M., & Nagahara, H. (2023). Contrastive losses are natural criteria for unsupervised video summarization. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 2010–2019).
  25. Rochan, M., & Wang, Y. (2019). Video summarization by learning from unpaired data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 7902–7911).
  26. Shamsi, F., Nazeer, M. I., Memon, R. A., & Mangrio, M. I. (2017). Reflections of practical implementation of the academic course analysis and design of algorithms taught in the universities of Pakistan. Sukkur IBA Journal of Computing and Mathematical Sciences, 1(2), 31–38.
  27. Shamsi, F., Sher, M. D., & Shaikh, S. (2019). Content-based automatic video genre identification. International Journal of Advanced Computer Science and Applications, 10(6).
  28. Shamsi, F., & Sindhu, I. (2021). Improving DBLP efficiency through social media mining. Journal of Information & Communication Technology (JICT), 15(1).
  29. Sindhu, I., & Shamsi, F. (2023a). Adverse use of social media by higher secondary school students: A case study on meta social network platforms. International Journal of Academic Research for Humanities, 3(4), 205–216.
  30. Sindhu, I., & Shamsi, F. (2023b). Prediction of IMDb movie score & movie success by using Facebook. In 2023 International Multi-disciplinary Conference in Emerging Research Trends (IMCERT) (Vol. 1, pp. 1–5).
  31. Wang, G., Wu, X., & Yan, J. (2024). Progressive reinforcement learning for video summarization. Information Sciences, 655, 119888.
  32. Wang, J., Wang, W., Wang, Z., Wang, L., Feng, D., & Tan, T. (2019). Stacked memory network for video summarization. In Proceedings of the 27th ACM International Conference on Multimedia (pp. 836–844).
  33. Wu, G., Song, S., Wang, X., & Zhang, J. (2024). Reconstructive network under contrastive graph rewards for video summarization. Expert Systems with Applications, 250, 123860.
  34. Xie, J., Chen, X., Zhao, S., & Lu, S.-P. (2024). Video summarization via knowledge-aware multimodal deep networks. Knowledge-Based Systems, 293, 111670.
  35. Yaliniz, G., & Ikizler-Cinbis, N. (2021). Using independently recurrent networks for reinforcement learning based unsupervised video summarization. Multimedia Tools and Applications, 80(12), 17827–17847.
  36. Yu, Q., Yu, H., Sun, Y., Ding, D., & Jian, M. (2024). Unsupervised video summarization based on the diffusion model of feature fusion. IEEE Transactions on Computational Social Systems.
  37. Yuan, L., Tay, F. E. H., Li, P., & Feng, J. (2019). Unsupervised video summarization with cycle-consistent adversarial LSTM networks. IEEE Transactions on Multimedia, 22(10), 2711–2722.
  38. Yuan, Y., & Zhang, J. (2022). Unsupervised video summarization via deep reinforcement learning with shot-level semantics. IEEE Transactions on Circuits and Systems for Video Technology, 33(1), 445–456.
  39. Zang, S.-S., Yu, H., Song, Y., & Zeng, R. (2023). Unsupervised video summarization using deep non-local video summarization networks. Neurocomputing, 519, 26–35.
  40. Zhang, W., Wang, B., Ma, L., & Liu, W. (2019). Reconstruct and represent video content for captioning via reinforcement learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(12), 3088–3101.
  41. Zhang, Y., Kampffmeyer, M., Zhao, X., & Tan, M. (2019). DTR-GAN: Dilated temporal relational adversarial network for video summarization. In Proceedings of the ACM Turing Celebration Conference - China (pp. 1–6).
  42. Zhang, Y., Liu, Y., Zhu, P., & Kang, W. (2022). Joint reinforcement and contrastive learning for unsupervised video summarization. IEEE Signal Processing Letters, 29, 2587–2591.
  43. Zhao, B., Li, X., & Lu, X. (2019). Property-constrained dual learning for video summarization. IEEE Transactions on Neural Networks and Learning Systems, 31(10), 3989–4000.
  44. Zhong, S.-H., Lin, J., Lu, J., Fares, A., & Ren, T. (2022). Deep semantic and attentive network for unsupervised video summarization. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 18(2), 1–21.
  45. Zhu, Y., Zhao, W., Hua, R., & Wu, X. (2023). Topic-aware video summarization using a multimodal transformer. Pattern Recognition, 140, 109578.