Assessment of the Distortion Propagation in Data Products in the Context of a Relational Data Mesh Architecture
Abstract
In Data Mesh, a decentralized approach to data management, data is consumed at the level of individual data products. If erroneous or poor-quality data is used when a data product is prepared, the resulting distortions can spread significantly across replicas, data marts, services, and other downstream consumers. This paper examines how to assess the quality of a data product in a relational Data Mesh architecture when distortions have been detected in its input data. The proposed approach represents the data transformations within a data product as a chain of elementary relational algebra operations (extended projection, filtering, union, and Cartesian product) and analyzes how input data distortions propagate as these operations are applied. A concept of "labeling" the source relation is presented, which divides tuples into types according to the nature of the distortions detected in them, together with algorithms that track how tuples of each type are transformed by the elementary operations listed above. A comprehensive algorithm is developed that solves the data product quality assessment problem. For testing, an experimental setup was built on the Open University open data set, and the proposed algorithm was implemented. The experimental results confirmed that the data product quality assessment algorithm can be applied within a relational Data Mesh architecture. In conclusion, several practical applications of this assessment to data quality management within the Data Mesh concept are proposed.
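The idea of labeling tuples and following the labels through a chain of elementary operations can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: it assumes a single boolean label ("distorted" or not) instead of several distortion types, it assumes that a Cartesian product tuple is distorted whenever either of its parts is, and it ignores the case where a filter condition is itself evaluated on a distorted value. All names (Labeled, project, select, union, cartesian, distorted_share) and the sample rows are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Labeled:
    row: tuple       # attribute values of the tuple
    distorted: bool  # assumed label: True if a distortion was detected or inherited


def project(rel, fn):
    """Extended projection: transform each row, carry the label over unchanged."""
    return [Labeled(fn(t.row), t.distorted) for t in rel]


def select(rel, pred):
    """Filtering: keep the tuples whose row satisfies the predicate."""
    return [t for t in rel if pred(t.row)]


def union(a, b):
    """Bag union: concatenate both relations, labels are preserved."""
    return list(a) + list(b)


def cartesian(a, b):
    """Cartesian product: the result is distorted if either operand tuple is."""
    return [Labeled(x.row + y.row, x.distorted or y.distorted)
            for x, y in product(a, b)]


def distorted_share(rel):
    """Fraction of tuples in the relation that carry the 'distorted' label."""
    return sum(t.distorted for t in rel) / len(rel) if rel else 0.0


# Hypothetical labeled source relations (values are made up for illustration).
students = [
    Labeled(("s1", 70), False),
    Labeled(("s2", -5), True),   # negative score flagged as a distortion
    Labeled(("s3", 88), False),
]
courses = [
    Labeled(("AAA",), False),
    Labeled(("BBB",), True),     # course record flagged as a distortion
]

# A chain of elementary operations standing in for a data product's transformation.
joined = cartesian(students, courses)
passed = select(joined, lambda r: r[1] >= 50)
report = project(passed, lambda r: (r[0], r[2]))

print(distorted_share(students), distorted_share(courses))
print(distorted_share(joined), distorted_share(report))
```

In this toy chain the product step raises the share of labeled tuples above that of either input, which is the kind of amplification the abstract refers to; a per-type labeling, as described in the paper, would track more than one label per tuple.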
ISSN: 2307-8162