Efficient multi-person action recognition using YOLOv7-Pose and deep learning models
Abstract
Multi-person action recognition is an important technology for analyzing and recognizing the actions of several people in one scene at the same time. Widely used pose estimation models such as OpenPose and PoseNet produce good results but have slow inference, which limits their usefulness in situations that require real-time processing. We address this problem by combining the fast pose estimation of YOLOv7-Pose with deep learning models for action classification: Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Spatial Temporal Graph Convolutional Network (ST-GCN). In our experiments, YOLOv7-Pose combined with ST-GCN achieves the highest precision, 91%, while YOLOv7-Pose combined with LSTM gives the shortest testing time, 1.2 milliseconds. These results indicate that the proposed method balances accuracy and efficiency, making it suitable for real-time multi-person action recognition in a range of applications.
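To make the pipeline concrete, the step between pose estimation and action classification can be sketched as a per-person buffer that turns frame-by-frame keypoints into fixed-length sequences. This is a minimal illustration and not the authors' implementation. It assumes a 17-keypoint COCO-style skeleton with (x, y, confidence) per joint. It also assumes that each detected person already carries a stable ID from some tracking step, which the abstract does not describe. The class name, the window length of 30 frames, and the input layout are all hypothetical.

```python
from collections import defaultdict, deque

NUM_KEYPOINTS = 17  # assumption: COCO-style skeleton
WINDOW = 30         # assumption: frames per sequence, not a value from the paper


class PoseSequenceBuffer:
    """Hypothetical helper: groups per-frame keypoints into per-person windows."""

    def __init__(self, window=WINDOW):
        self.window = window
        # One bounded queue per person ID; old frames drop off automatically.
        self._frames = defaultdict(lambda: deque(maxlen=window))

    def add_frame(self, detections):
        """Add one video frame.

        detections maps person_id -> list of (x, y, confidence) tuples.
        Returns person_id -> list of frames for every person seen in this
        frame whose buffer now holds a full window.
        """
        ready = {}
        for person_id, keypoints in detections.items():
            if len(keypoints) != NUM_KEYPOINTS:
                raise ValueError("expected %d keypoints" % NUM_KEYPOINTS)
            buf = self._frames[person_id]
            # Keep only coordinates; confidence is dropped in this sketch.
            buf.append([(x, y) for x, y, _conf in keypoints])
            if len(buf) == self.window:
                ready[person_id] = list(buf)
        return ready
```

Each returned sequence has shape window x 17 x 2 and could then be passed to a sequence classifier such as an LSTM, GRU, or ST-GCN. Keeping a separate buffer per person is what allows several people in one scene to be classified independently.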
© 2023 DNTU. All rights reserved.