Efficient multi-person action recognition using yolov7-pose and deep learning models

Thang Trinh Dinh

Hamka Mudin Parah

Nguyen Khanh An

Nguyen Duc Manh

Abstract

Multi-person action recognition identifies the actions of several people in a single scene at the same time. Widely used pose estimation models such as OpenPose and PoseNet give good results but have comparatively slow inference, which limits their use where real-time processing is required. We address this problem by combining the fast pose estimation of YOLOv7-Pose with deep learning models for action classification: Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Spatial Temporal Graph Convolutional Network (ST-GCN). In our experiments, YOLOv7-Pose combined with ST-GCN achieves the highest precision, 91%, while YOLOv7-Pose combined with LSTM gives the fastest testing time, 1.2 milliseconds. These results indicate that the proposed method balances accuracy and efficiency, making it suitable for real-time multi-person action recognition across a range of applications.
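The abstract does not describe how pose output is passed to the sequence classifiers, so the following is only a minimal sketch of one plausible intermediate step, not the authors' implementation. It assumes each person already carries a track ID across frames (the abstract does not mention tracking), that each detection has the 17 COCO keypoints YOLOv7-Pose predicts, and that a classifier such as an LSTM or GRU consumes fixed-length windows. The function name `build_windows` and the default window length are hypothetical.

```python
from collections import defaultdict, deque

NUM_KEYPOINTS = 17  # COCO keypoint layout, as predicted by YOLOv7-Pose
WINDOW = 30         # hypothetical window length in frames (assumption)


def build_windows(frames, window=WINDOW):
    """Yield (person_id, window_of_features) for each tracked person.

    frames: iterable of dicts mapping a person ID (assumed to come from
    some tracker) to a list of 17 (x, y) keypoint pairs for that frame.
    A window is emitted each time a person has accumulated `window`
    observations; the oldest observation is dropped as new ones arrive.
    Gaps where a person leaves the scene are not handled in this sketch.
    """
    buffers = defaultdict(lambda: deque(maxlen=window))
    for detections in frames:
        for person_id, keypoints in detections.items():
            if len(keypoints) != NUM_KEYPOINTS:
                continue  # skip malformed detections
            # Flatten [(x, y), ...] into one feature vector of length 34.
            features = [coord for point in keypoints for coord in point]
            buffers[person_id].append(features)
            if len(buffers[person_id]) == window:
                # Copy so later appends do not mutate the emitted window.
                yield person_id, [list(f) for f in buffers[person_id]]
```

Each emitted window is a list of `window` feature vectors, which is the shape a recurrent classifier would typically take as one input sequence; a graph-based model such as ST-GCN would instead keep the per-joint structure rather than flattening it.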


Attachment

15_122-126.pdf
