Proceedings of the Southwest State University

Identification of a Person by Gait in a Video Stream

https://doi.org/10.21869/2223-1560-2020-24-4-57-75

Abstract

Purpose of research. This paper considers the problem of identifying a person by gait using neural network recognition models that operate on RGB images. The main advantage of neural network models over existing methods of motor activity analysis is that they accept frames taken directly from the video stream, avoiding the frame preprocessing that increases analysis time in existing approaches.
Methods. The paper presents an approach to identifying a person by gait, based on multi-class classification of video sequences. The quality of the developed approach was evaluated on the CASIA Gait Database, which includes more than 15,000 video sequences. Five neural network architectures were tested as classifiers: the three-dimensional convolutional neural network I3D, and four convolutional-recurrent networks (unidirectional and bidirectional LSTM, unidirectional and bidirectional GRU), each using a convolutional neural network of the ResNet architecture as a visual feature extractor.
Results. Testing showed that the developed approach makes it possible to identify a person in a video stream in real time without specialized equipment. With the neural network models under consideration, human identification accuracy exceeded 80% for the convolutional-recurrent models and reached 79% for the I3D model.
Conclusion. The proposed models based on the I3D architecture and on convolutional-recurrent architectures showed higher accuracy in identifying a person by gait than existing methods. Because they allow frame-by-frame video processing, convolutional-recurrent architectures based on unidirectional LSTM or GRU models are the preferred classifiers for the developed approach.
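The convolutional-recurrent classifier described in Methods — a per-frame CNN feature extractor whose outputs are fed into a unidirectional LSTM — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the small CNN stands in for the ResNet extractor used in the paper, and the feature dimension, hidden size, clip length, and the 124-class output (illustrative of CASIA subject counts) are assumptions.

```python
import torch
import torch.nn as nn

class ConvRecurrentGaitClassifier(nn.Module):
    """Sketch: per-frame CNN features fed into a unidirectional LSTM."""
    def __init__(self, num_classes: int, feat_dim: int = 128, hidden: int = 256):
        super().__init__()
        # Small CNN standing in for the ResNet visual feature extractor.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = frames.shape            # (batch, time, channels, H, W)
        feats = self.cnn(frames.reshape(b * t, c, h, w)).reshape(b, t, -1)
        out, _ = self.rnn(feats)                # frame-by-frame processing
        return self.head(out[:, -1])            # logits from the last time step

model = ConvRecurrentGaitClassifier(num_classes=124)
logits = model(torch.randn(2, 8, 3, 64, 64))    # 2 clips of 8 RGB frames
print(tuple(logits.shape))                      # (2, 124)
```

The unidirectional recurrence is what enables the real-time, frame-by-frame processing the Conclusion favors: each new frame updates the LSTM state without re-reading the whole clip, unlike bidirectional variants or 3D convolutions over a full temporal window.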

About the Authors

M. Yu. Uzdiaev
St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS), St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences
Russian Federation

Mikhail Yu. Uzdiaev, Junior Researcher of Laboratory of Big Data in Socio-Cyberphysical Systems

39, 14th Line, St. Petersburg 199178



R. N. Iakovlev
St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS), St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences
Russian Federation

Roman N. Iakovlev, Junior Researcher of Laboratory of Big Data in Socio-Cyberphysical Systems

39, 14th Line, St. Petersburg 199178



D. M. Dudarenko
St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS), St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences
Russian Federation

Dmitry М. Dudarenko, Junior Researcher of Laboratory of Big Data in Socio-Cyberphysical Systems

39, 14th Line, St. Petersburg 199178



A. D. Zhebrun
St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS), St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences
Russian Federation

Aleksandr D. Zhebrun, Programmer of Laboratory of Big Data in Socio-Cyberphysical Systems

39, 14th Line, St. Petersburg 199178



References

1. Sherstobitov A.I., Fedosov V.P., Prihodchenko V.A., Timofeev D.V. Raspoznavanie lits na gruppovykh fotografiyakh s ispol'zovaniem algoritmov segmentatsii [Face recognition on groups photos with using segmentation algorithms]. Izvestiya Yuzhnogo federal'nogo universiteta. Tekhnicheskie nauki = Bulletin of the Southern Federal University. Technical science, 2013, no. 11(148) (In Russ.). Available at: https://cyberleninka.ru/article/n/raspoznavanie-litsna-gruppovyh-fotografiyah-s-ispolzovaniem-algoritmov-segmentatsii

2. Sokolova A., Konushin A. Gait recognition based on convolutional neural networks. International Archives of the Photogrammetry, Remote Sensing & Spatial Information Sciences, 2017; XLII-2/W4, pp. 207-212. https://doi.org/10.5194/isprs-archives-XLII-2-W4-207-2017

3. Sokolova A., Konushin A. Pose-based deep gait recognition. IET Biometrics, 2018, no. 8(2), pp. 134-143. https://doi.org/10.1049/iet-bmt.2018.5046

4. Han J., Bhanu B. Individual recognition using gait energy image. IEEE transactions on pattern analysis and machine intelligence, 2005, no. 28(2), pp. 316-322. https://doi.org/10.1109/TPAMI.2006.38

5. Liutov V., Konushin A., Arseev S. Raspoznavanie cheloveka po pokhodke i vneshnosti [Human recognition by appearance and gait]. Programmirovanie = Programming and Computer Software, 2018, no. 44(4), pp. 258-265 (In Russ.). https://doi.org/10.31857/S000523100000515-0

6. Sokolova A.I., Konushin A.S. Metody identifikatsii cheloveka po pokhodke v video [Methods of gait recognition in video]. Trudy Instituta sistemnogo programmirovaniya RAN = Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS), 2019, no. 31(1), pp. 69-82 (In Russ.). https://doi.org/10.15514/ISPRAS-2019-31(1)-5

7. Alotaibi M., Mahmood A. Improved gait recognition based on specialized deep convolutional neural network. Computer Vision and Image Understanding, 2017, no. 164, pp. 103-110. https://doi.org/10.1016/j.cviu.2017.10.004

8. Malashin R.O., Lutsiv V.R. Vosstanovlenie silueta ruki v zadache raspoznavaniya zhestov s pomoshch'yu adaptivnoi morfologicheskoi fil'tratsii binarnogo izobrazheniya [Restoring a silhouette of the hand in the problem of recognizing gestures by adaptive morphological filtering of a binary image]. Opticheskii zhurnal = Journal of Optical, 2013, no. 80(11), pp. 54-61 (In Russ.). https://doi.org/10.1364/JOT.80.000685

9. Chen C., Liang J., Zhao H., Hu H., Tian J. Frame difference energy image for gait recognition with incomplete silhouettes. Pattern Recognition Letters, 2009, no. 30(11), pp. 977-984. https://doi.org/10.1016/j.patrec.2009.04.012

10. Castro F.M., Marín-Jimenez M.J., Medina-Carnicer R. Pyramidal Fisher Motion for Multiview Gait Recognition. 2014 22nd International Conference on Pattern Recognition, Stockholm, 2014, pp. 1692-1697. https://doi.org/10.1109/ICPR.2014.298

11. Kaaniche M.B., Bremond F. Tracking hog descriptors for gesture recognition. 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance. IEEE, 2009, pp. 140-145. https://doi.org/10.1109/AVSS.2009.26

12. Uijlings J.R.R., Duta I.C., Rostamzadeh N., Sebe N. Realtime video classification using dense hof/hog. Proceedings of international conference on multimedia retrieval, 2014, pp. 145-152. https://doi.org/10.1145/2578726.2578744

13. Feng Y., Li Y., Luo J. Learning effective gait features using LSTM. 2016 23rd International Conference on Pattern Recognition (ICPR). IEEE, 2016, pp. 325-330. https://doi.org/10.1109/ICPR.2016.7899654

14. Hochreiter S., Schmidhuber J. Long short-term memory. Neural computation, 1997, no. 9(8), pp. 1735-1780. https://doi.org/10.1162/neco.1997.9.8.1735

15. Tran D., Bourdev L., Fergus R., Torresani L., Paluri M. Learning spatiotemporal features with 3d convolutional networks. Proceedings of the IEEE international conference on computer vision, 2015, pp. 4489-4497. https://doi.org/10.1109/ICCV.2015.510

16. Carreira J., Zisserman A. Quo vadis, action recognition? a new model and the kinetics dataset. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6299-6308. https://doi.org/10.1109/CVPR.2017.502

17. Hara K., Kataoka H., Satoh Y. Learning spatio-temporal features with 3D residual networks for action recognition. Proceedings of the IEEE International Conference on Computer Vision Workshops, 2017, pp. 3154-3160. https://doi.org/10.1109/ICCVW.2017.373

18. Hara K., Kataoka H., Satoh Y. Can spatiotemporal 3d cnns retrace the history of 2d cnns and imagenet? Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2018, pp. 6546-6555. https://doi.org/10.1109/CVPR.2018.00685

19. Saveliev A., Uzdiaev M., Dmitrii M. Aggressive Action Recognition Using 3D CNN Architectures. 2019 12th International Conference on Developments in eSystems Engineering (DeSE). IEEE, 2019, pp. 890-895. https://doi.org/10.1109/DeSE.2019.00165

20. Cho K., Van Merriënboer B., Gulcehre C., Bahdanau D., Bougares F., Schwenk H., Bengio Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078. 2014. Available at: https://arxiv.org/abs/1406.1078

21. Yue-Hei Ng J., Hausknecht M., Vijayanarasimhan S., Vinyals O., Monga R., Toderici G. Beyond short snippets: Deep networks for video classification. Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 4694-4702. https://doi.org/10.1109/CVPR.2015.7299101

22. He K., Zhang X., Ren S., Sun J. Deep residual learning for image recognition. Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778. https://doi.org/10.1109/CVPR.2016.90

23. Szegedy C., Liu W., Jia Y., Sermanet P., Reed S., Anguelov D., Erhan D., Vanhoucke V., Rabinovich A. Going deeper with convolutions. Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1-9. https://doi.org/10.1109/CVPR.2015.7298594

24. What is Log Loss? [Quoted May 6, 2020]. Available at: https://www.kaggle.com/dansbecker/what-is-log-loss

25. Kingma D.P., Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014. Available at: https://arxiv.org/abs/1412.6980

26. Logsoftmax vs softmax [Quoted May 6, 2020]. Available at: https://discuss.pytorch.org/t/logsoftmax-vs-softmax/21386

27. Wu Z., Huang Y., Wang L., Wang X., Tan T. A comprehensive study on cross-view gait based human identification with deep CNNS. IEEE transactions on pattern analysis and machine intelligence, 2016, no. 39(2), pp. 209-226. https://doi.org/10.1109/TPAMI.2016.2545669

28. Yu S., Chen H., Wang Q., Shen L., Huang Y. Invariant feature extraction for gait recognition using only one uniform model. Neurocomputing. 2017, no. 239, pp. 81-93. https://doi.org/10.1016/j.neucom.2017.02.006

29. Yu S., Chen H., Reyes E.B.G., Poh N. GaitGAN: Invariant Gait Feature Extraction Using Generative Adversarial Network. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017, pp. 532-539. https://doi.org/10.1109/CVPRW.2017.80


For citations:


Uzdiaev M.Yu., Iakovlev R.N., Dudarenko D.M., Zhebrun A.D. Identification of a Person by Gait in a Video Stream. Proceedings of the Southwest State University. 2020;24(4):57-75. (In Russ.) https://doi.org/10.21869/2223-1560-2020-24-4-57-75


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2223-1560 (Print)
ISSN 2686-6757 (Online)