Time series forecasting in real-time streaming data processing

R.A. Elchenkov, M.E. Dunaev, K.S. Zaytsev


This work studies methods for forecasting time series values when streaming data are processed in distributed systems in real time. The authors propose a modification of the autoregressive (AR) model of a given order that adds an inheritance function of the previous values of the time series. Comparative experiments against classical AR and ARIMA confirmed the effectiveness of the modification, called Real-Time AR, especially when the real time series contains anomalies. The modified algorithm not only allows the calculations to be parallelized but also lets the model be configured on the fly in the Apache Spark ecosystem. For the experiments, a dedicated data set was built: a slice of 1000 measurements from the metrics log of an Apache Kafka server with one topic, two producers, and one consumer. Anomalous fragments, distinguished by a high message rate (messages per second) and/or by message size, were added to the data set artificially. The values were normalized and shifted by their mean over the training sample used to pre-train the model. Applying the proposed algorithm to time series forecasting showed that anomalies in the behavior of the observed objects do not significantly distort the predicted values.
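To make the preprocessing and the AR baseline concrete, the following is a minimal Python sketch, not the authors' Real-Time AR. The abstract does not define the inheritance function, so it is omitted here. The sketch assumes z-score scaling with the mean and standard deviation of the training sample, and a plain AR(p) predictor whose coefficients are adjusted by one gradient step per incoming value. The names `fit_scaler`, `transform`, and `OnlineAR`, as well as the learning rate, are hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def fit_scaler(train):
    """Return (mean, std) computed on the training sample only (assumed scaling)."""
    mu = mean(train)
    sigma = pstdev(train) or 1.0  # guard against a constant training sample
    return mu, sigma


def transform(values, mu, sigma):
    """Shift by the training mean and scale by the training standard deviation."""
    return [(v - mu) / sigma for v in values]


class OnlineAR:
    """Plain AR(p) predictor updated by one gradient step per observation.

    This is a generic online baseline, not the Real-Time AR modification.
    """

    def __init__(self, order, learning_rate=0.05):
        self.order = order
        self.lr = learning_rate
        self.coef = [0.0] * order  # coef[0] multiplies the most recent value
        self.history = []          # last `order` observations, newest last

    def predict(self):
        """One-step-ahead forecast; 0.0 (the centered mean) until warmed up."""
        if len(self.history) < self.order:
            return 0.0
        return sum(c * x for c, x in zip(self.coef, reversed(self.history)))

    def update(self, value):
        """Consume one observation and return the forecast error made for it."""
        err = value - self.predict()
        if len(self.history) == self.order:
            lags = list(reversed(self.history))
            # Gradient step on the squared one-step error.
            self.coef = [c + self.lr * err * x for c, x in zip(self.coef, lags)]
        self.history.append(value)
        self.history = self.history[-self.order:]
        return err
```

In a streaming setting, `fit_scaler` would be called once on the pre-training sample, and each new measurement would then pass through `transform` and `OnlineAR.update`. Because every update touches only the last `order` values, the per-message cost stays constant regardless of how long the stream runs.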







ISSN: 2307-8162