VIDEO-BASED ACTION RECOGNITION USING SPATIOTEMPORAL DEEP LEARNING MODELS

International Journal of Computer Science (IJCS), published by SK Research Group of Companies (SKRGC)

Abstract

Video-based action recognition is a crucial task in computer vision, with applications spanning surveillance, sports analytics, human-computer interaction, and autonomous systems. This paper explores the application of spatiotemporal deep learning models to action recognition, leveraging advances in neural network architectures to analyze video data effectively. Traditional approaches often process the spatial and temporal dimensions independently, which limits their ability to capture complex motion patterns and contextual relationships. In contrast, spatiotemporal deep learning models integrate spatial and temporal features simultaneously, enabling robust recognition of dynamic actions. The study highlights key methods, including convolutional neural networks (CNNs) for spatial feature extraction and recurrent neural networks (RNNs) or 3D convolutional networks (3D-CNNs) for temporal modelling. It also examines transformer-based architectures and attention mechanisms, which enhance model performance by selectively focusing on salient regions and time steps. A comprehensive evaluation is conducted on benchmark datasets such as UCF101, HMDB51, and Kinetics, comparing the architectures in terms of accuracy, computational efficiency, and scalability. Additionally, the paper discusses how pretraining on large-scale datasets, multimodal fusion (e.g., combining visual and audio data), and data augmentation techniques improve generalization. Challenges such as handling occlusions, recognizing subtle motions, and reducing computational overhead are addressed. The results demonstrate the potential of spatiotemporal deep learning models to outperform traditional methods, paving the way for more accurate and efficient video-based action recognition systems. Future directions include unsupervised and semi-supervised learning approaches, real-time inference capabilities, and domain-specific adaptations for broader applications.
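The contrast drawn above between independent and joint processing of space and time can be illustrated with a small sketch. It is not taken from the paper: the helper names (`conv3d_valid`, `make_clip`, `energy`) and the toy clips are assumptions made for illustration, and the naive loop stands in for the optimized 3D convolution layers a real 3D-CNN would use. The sketch applies a hand-written temporal-difference kernel to two clips whose individual frames look alike, and shows that only a kernel spanning more than one frame tells them apart.

```python
from typing import List

# A clip is indexed as clip[t][y][x] (time, height, width), single channel.
Clip = List[List[List[float]]]


def conv3d_valid(clip: Clip, kernel: Clip) -> Clip:
    """Naive 'valid' 3D cross-correlation over (time, height, width)."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Clip = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def make_clip(x_positions: List[int], size: int = 4) -> Clip:
    """Hypothetical toy clip: one bright pixel per frame on row 0."""
    clip: Clip = []
    for px in x_positions:
        frame = [[0.0] * size for _ in range(size)]
        frame[0][px] = 1.0
        clip.append(frame)
    return clip


def energy(volume: Clip) -> float:
    """Sum of absolute responses over the whole output volume."""
    return sum(abs(v) for plane in volume for row in plane for v in row)


# Kernel of shape (2, 1, 1): frame t+1 minus frame t at each pixel.
motion_kernel: Clip = [[[-1.0]], [[1.0]]]

static_clip = make_clip([1, 1, 1])  # the pixel stays in place
moving_clip = make_clip([0, 1, 2])  # the pixel moves one step per frame

# Per-frame (purely spatial) statistics cannot separate the two clips.
static_frame_sums = [sum(map(sum, frame)) for frame in static_clip]
moving_frame_sums = [sum(map(sum, frame)) for frame in moving_clip]

# A kernel that spans two frames responds only to the moving clip.
static_energy = energy(conv3d_valid(static_clip, motion_kernel))
moving_energy = energy(conv3d_valid(moving_clip, motion_kernel))

print(static_frame_sums == moving_frame_sums)  # True
print(static_energy, moving_energy)            # 0.0 4.0
```

In a trained 3D-CNN the kernel weights are learned rather than fixed, and each layer combines many such kernels across channels. The toy kernel only shows why extending the receptive field over time exposes motion that frame-by-frame processing misses.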

Keywords

Spatiotemporal Action Recognition, Deep Learning for Video Analysis, Human Activity Recognition (HAR), Temporal Convolutional Networks (TCN), 3D Convolutional Neural Networks (3D-CNNs).

  • Format: Volume 13, Issue 1, No 03, 2025
  • Copyright: All Rights Reserved ©2025
  • Year of Publication: 2025
  • Author: Dr. N. Ruba, P. K. Lavanya
  • Reference: IJCS-546
  • Page No: 032-041

Copyright 2025 SK Research Group of Companies. All Rights Reserved.