Highly Accurate XSS Detection using CatBoost

Abdulkader Hajjouz, Elena Avksentieva

Abstract


Cross-site scripting (XSS) remains a major security threat to web applications and user data, and protecting against it requires advanced detection mechanisms. This paper presents a new machine learning framework for XSS detection that combines two components: hierarchical feature selection based on Spearman correlation, which reduces feature dimensionality and improves model interpretability, and CatBoostClassifier, a gradient boosting algorithm known for its robustness and performance. Evaluated on a large dataset, the CatBoost-based model achieved 99.88% accuracy, a ROC AUC of 1.00, an Average Precision of 1.00, and a Matthews Correlation Coefficient of 0.9974. The proposed framework outperforms the benchmark XSS detection models it was compared against on all of these metrics. Feature importance and SHAP value analysis further identify the features that matter most for XSS classification. These results indicate that the integrated approach is effective and a practical solution for XSS mitigation in web applications.
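
The abstract does not spell out the feature selection procedure, so the following is only a minimal sketch of one common reading of "hierarchical feature selection with Spearman correlation": features are grouped by single-linkage clustering on the distance 1 - |rho|, and one feature is kept per group. The feature names, the sample values, and the 0.3 distance threshold are all hypothetical and are not taken from the paper. Only the Python standard library is used, and the classifier itself is not shown.

```python
from itertools import combinations
from math import sqrt


def ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treated here as uncorrelated
    return cov / sqrt(var_x * var_y)


def select_features(columns, max_distance=0.3):
    """Keep one feature per single-linkage cluster of the distance 1 - |rho|.

    Cutting a single-linkage dendrogram at max_distance gives the connected
    components of the graph that links features no farther apart than that,
    so a union-find structure is enough. max_distance is an assumed value.
    """
    names = list(columns)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if 1 - abs(spearman(columns[a], columns[b])) <= max_distance:
            parent[find(b)] = find(a)

    kept, seen = [], set()
    for name in names:  # the first feature of each cluster, in input order
        root = find(name)
        if root not in seen:
            seen.add(root)
            kept.append(name)
    return kept


# Hypothetical toy features; the paper's actual feature set is not given here.
columns = {
    "script_tag_count": [0, 1, 2, 3, 4, 5],
    "angle_bracket_count": [1, 3, 4, 8, 9, 20],  # monotone in script_tag_count
    "url_length": [30, 12, 45, 7, 28, 19],
}
selected = select_features(columns)
print(selected)
```

In this toy data the two count features are perfectly rank-correlated, so they fall into one cluster and only the first is kept, while the weakly correlated length feature survives on its own. Single linkage is chosen here only because it is short to implement; other linkage rules or a different rule for picking the representative feature would be equally plausible.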




ISSN: 2307-8162