Application of Deep Neural Networks in the Problem of Obtaining Depth Maps from Two-Dimensional Images
https://doi.org/10.21869/2223-1560-2019-23-3-113-134
Abstract
Purpose of research is to study approaches to depth map generation for training and testing deep neural networks. The paper considers the problem of obtaining information about the distance from the camera to scene objects from a 2D image by means of deep neural networks, without using a stereo camera.
Methods. Generation of 3D scenes for training and evaluating the neural network was carried out in the 3D computer graphics application Blender. The root-mean-square (RMS) error was used to estimate the accuracy of learning. Machine learning was implemented with the Keras library, and optimization used the AdaGrad method.
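As an illustration of the scene generation step, the following minimal Blender Python sketch shows one common way to export a ground-truth depth map for a rendered scene through the compositor. It is not taken from the paper: the Blender 2.8x API (property and socket names differ in older releases), the OpenEXR output format and the output path are our assumptions.

import bpy

# Assumed Blender 2.8x Python API; pass/socket names differ in older releases.
scene = bpy.context.scene
scene.view_layers[0].use_pass_z = True          # ask the renderer for per-pixel depth
scene.use_nodes = True                          # drive the compositor from Python

tree = scene.node_tree
tree.nodes.clear()
render_layers = tree.nodes.new('CompositorNodeRLayers')
file_output = tree.nodes.new('CompositorNodeOutputFile')
file_output.base_path = '//depth_maps'          # output folder relative to the .blend file (illustrative)
file_output.format.file_format = 'OPEN_EXR'     # floating-point format keeps metric depth values

# 'Depth' is the Z-pass socket in Blender 2.8x ('Z' in 2.79 and earlier)
tree.links.new(render_layers.outputs['Depth'], file_output.inputs['Image'])

bpy.ops.render.render(write_still=True)         # renders the frame and writes the depth map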
Results. The architecture of a deep neural network is presented that receives three sequences of 2D images from a video stream of a 3D scene as input and outputs a predicted depth map for that scene. A method for creating training data sets containing depth information using the Blender software is described. The overfitting problem is studied: the created models work on the specially generated data sets but still cannot predict a correct depth map for arbitrary images. The results of testing current methods for depth map creation using deep neural networks are presented.
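The paper does not list the exact layers, so the following Keras sketch is only an assumed illustration of a network of the described shape: three 2D frames from the video stream as inputs, convolutional encoders, and a decoder that outputs a single-channel depth map, compiled with the AdaGrad optimizer and an RMS-error metric as named in the Methods. The 128x128 resolution, the layer sizes and the name depth_net are illustrative assumptions.

import tensorflow as tf
from tensorflow.keras import layers, Model

def frame_encoder(inp):
    # small convolutional encoder applied to each input frame
    x = layers.Conv2D(32, 3, padding='same', activation='relu')(inp)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
    x = layers.MaxPooling2D(2)(x)
    return x

# three consecutive RGB frames taken from the rendered video stream (assumed 128x128)
frames = [layers.Input(shape=(128, 128, 3), name=f'frame_{i}') for i in range(3)]
features = layers.Concatenate()([frame_encoder(f) for f in frames])

# decoder: upsample back to input resolution; one output channel is the predicted depth
x = layers.Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu')(features)
x = layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(x)
depth = layers.Conv2D(1, 3, padding='same', name='depth')(x)

model = Model(inputs=frames, outputs=depth, name='depth_net')
# AdaGrad optimization and an RMS error measure, as mentioned in the Methods section
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.01),
              loss='mse',
              metrics=[tf.keras.metrics.RootMeanSquaredError()])
model.summary()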
Conclusion. The main problem of the proposed method is overfitting, which can manifest itself in predicting a certain average value for different images or producing the same output for different inputs. To address this problem, already trained (pre-trained) networks can be used, or training and validation sets containing 2D images of diverse scenes.
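As an illustration of the suggested remedy of reusing already trained networks, the sketch below shows one possible transfer-learning setup in Keras: an ImageNet-pretrained VGG16 used as a frozen encoder for depth prediction. The choice of VGG16, the decoder layers and all hyperparameters are illustrative assumptions, not the authors' configuration.

import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

# ImageNet-pretrained encoder with frozen weights to limit overfitting
backbone = VGG16(weights='imagenet', include_top=False, input_shape=(128, 128, 3))
backbone.trainable = False

x = backbone.output                                                   # 4x4x512 features for a 128x128 input
x = layers.Conv2DTranspose(128, 3, strides=4, padding='same', activation='relu')(x)
x = layers.Conv2DTranspose(64, 3, strides=4, padding='same', activation='relu')(x)
x = layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(x)
depth = layers.Conv2D(1, 3, padding='same')(x)                        # single-channel depth map

model = Model(inputs=backbone.input, outputs=depth)
model.compile(optimizer='adagrad', loss='mse',
              metrics=[tf.keras.metrics.RootMeanSquaredError()])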
About the Authors
D. I. Mihalchenko
Russian Federation
Daniil I. Mihalchenko, Post-Graduate Student, Laboratory of Autonomous Robotic Systems
A. G. Ivin
Russian Federation
Arsenii G. Ivin, Post-Graduate Student, Laboratory of Autonomous Robotic Systems
O. Yu. Sivchenko
Russian Federation
Oleg Yu. Sivchenko, Programmer, Laboratory of Autonomous Robotic Systems
E. A. Aksamentov
Russian Federation
Egor A. Aksamentov, Junior Research Fellow, Laboratory of Autonomous Robotic Systems
For citations:
Mihalchenko D.I., Ivin A.G., Sivchenko O.Yu., Aksamentov E.A. Application of Deep Neural Networks in the Problem of Obtaining Depth Maps from Two-Dimensional Images. Proceedings of the Southwest State University. 2019;23(3):113-134. (In Russ.) https://doi.org/10.21869/2223-1560-2019-23-3-113-134