Enhancement of the Oculographic Spatial Monitoring Method Using a Multimodal CNN with Geometric Features
Abstract
This paper presents a cost-efficient and reproducible approach to eye tracking using a standard webcam and a multimodal convolutional neural network (CNN). Unlike traditional eye-tracking systems (such as EyeLink or Tobii), which require expensive hardware, the proposed solution relies on a combination of visual and geometric features and a specially designed neural architecture that integrates four convolutional branches for image processing with three fully connected branches for numerical data processing. The method also mitigates common limitations related to lighting conditions, head position, and individual user characteristics. A 10 GB multimodal dataset containing 2 million eye images with corresponding geometric annotations was created for model training. After hyperparameter optimization with Ray Tune and the ASHA algorithm, the model achieved a mean RMSE of 30.23 pixels, a 61% improvement over the previous version of the method. The inference time of 15.2 ms (≈65 FPS) makes the system suitable for real-time applications. A comparison with professional eye trackers (EyeLink 1000 Plus, AdHawk MindLink) showed that the proposed model achieves intermediate accuracy while requiring far less expensive equipment. These results demonstrate the potential of multimodal neural networks for oculography in usability testing, UX analytics, and human–computer interaction research.
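To make the described architecture concrete, the sketch below shows a minimal PyTorch implementation of a multimodal gaze regressor with four convolutional branches for images and three fully connected branches for numeric features, as outlined above. The specific input streams (left eye, right eye, face crop, face-grid mask), feature-group sizes, and layer widths are illustrative assumptions, not the authors' published configuration.

```python
# Minimal sketch of the 4-conv + 3-FC multimodal design described in the
# abstract; all shapes and stream assignments below are assumptions.
import torch
import torch.nn as nn

class ConvBranch(nn.Module):
    """Small CNN mapping one image stream to a fixed-size embedding."""
    def __init__(self, in_ch=3, embed=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, embed), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class FCBranch(nn.Module):
    """MLP for one group of numeric (geometric) features."""
    def __init__(self, in_dim, embed=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, embed), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class MultimodalGazeNet(nn.Module):
    """Fuses 4 image branches and 3 numeric branches into an (x, y) gaze point."""
    def __init__(self):
        super().__init__()
        # Hypothetical image streams: left eye, right eye, face crop, face-grid mask.
        self.conv_branches = nn.ModuleList(ConvBranch() for _ in range(4))
        # Hypothetical numeric groups: head pose (6), landmarks (8), distances (4).
        self.fc_branches = nn.ModuleList([FCBranch(6), FCBranch(8), FCBranch(4)])
        self.head = nn.Sequential(
            nn.Linear(4 * 128 + 3 * 32, 256), nn.ReLU(),
            nn.Linear(256, 2),  # gaze point in screen pixels
        )

    def forward(self, images, numerics):
        feats = [b(x) for b, x in zip(self.conv_branches, images)]
        feats += [b(v) for b, v in zip(self.fc_branches, numerics)]
        return self.head(torch.cat(feats, dim=1))

# Smoke test with dummy batches of size 2.
model = MultimodalGazeNet()
imgs = [torch.randn(2, 3, 64, 64) for _ in range(4)]
nums = [torch.randn(2, 6), torch.randn(2, 8), torch.randn(2, 4)]
print(model(imgs, nums).shape)  # torch.Size([2, 2])
```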
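The hyperparameter search named in the abstract (Ray Tune with the ASHA scheduler) could be wired up roughly as follows. This sketch uses the classic tune.run/tune.report API (Ray 1.x style; newer releases moved to tune.Tuner with ray.train.report), and both the search space and the toy training function are placeholders, not the paper's actual setup.

```python
# Sketch of ASHA-based tuning with Ray Tune (legacy 1.x-style API);
# the search space and the stand-in training loop are assumptions.
import random

from ray import tune
from ray.tune.schedulers import ASHAScheduler

def train_gaze_model(config):
    # Stand-in for the real training loop: pretend validation RMSE shrinks
    # each epoch at a rate loosely tied to the sampled hyperparameters.
    rmse = 100.0
    for epoch in range(20):
        rmse *= 0.80 + 0.3 * random.random() + 0.2 * config["dropout"]
        tune.report(rmse=rmse)  # ASHA uses this metric to stop weak trials early

search_space = {  # hypothetical ranges, not the paper's
    "lr": tune.loguniform(1e-5, 1e-2),
    "batch_size": tune.choice([32, 64, 128]),
    "dropout": tune.uniform(0.0, 0.5),
}

scheduler = ASHAScheduler(metric="rmse", mode="min",
                          max_t=20, grace_period=2, reduction_factor=3)

if __name__ == "__main__":
    analysis = tune.run(train_gaze_model, config=search_space,
                        num_samples=50, scheduler=scheduler)
    print(analysis.get_best_config(metric="rmse", mode="min"))
```

ASHA promotes only the best-performing fraction of trials past each epoch budget, so most sampled configurations are terminated after a few epochs, which is what makes a large search affordable on modest hardware.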
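For context on the reported figures: assuming (the abstract does not specify this) that the error is the Euclidean distance between the predicted and true on-screen gaze points $\hat{\mathbf{p}}_i$ and $\mathbf{p}_i$, the metric over $N$ validation samples is

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert \hat{\mathbf{p}}_i - \mathbf{p}_i \right\rVert_2^{2}},$$

so 30.23 px is the root-mean-square on-screen miss distance. The throughput claim is also consistent arithmetic: $1000\ \text{ms} / 15.2\ \text{ms per frame} \approx 65.8$ frames per second, in line with the quoted ≈65 FPS.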