The impact of polyhedral optimizations on data locality when parallelizing affine programs in the context of ATAX case study

Artem Lebedev

Abstract


The article examines the impact of polyhedral optimization on an important property of the program — locality of data use, which is closely related to the possible limitation of its performance both in sequential and parallel execution. A new method for finding multidimensional schedules of affine programs is proposed, combining topological sorting of the vertices of a generalized dependence graph and the greedy scheme developed by P. Feautrier (with modifications previously proposed by the author). Experimental results of studying the performance of parallel versions of the polybench/atax program obtained using the modern translator (pluto) and the translator developed by the author (ilpy) are presented, which allows to compare two approaches to setting optimization problems when finding affine mappings: lexicographic optimization (pluto) and integer linear programming (ilpy). Using the oprofile/ocount profiling tools, the inefficiency of using multi-core processor caches was revealed and the need to improve the criteria for the optimality of schedules and placements of computations for affine programs was shown.

 


Full Text:

PDF (Russian)

References


Jubb, C. (2014). Loop Optimizations in Modern C Compilers. Columbia University, New York, str, 15.

Wolf, M. (2014). High-performance embedded computing: applications in cyber-physical systems and mobile computing. Newnes.

Voevodin V.V., and Voevodin Vl.V. Parallel computations. BHV, Saint-Petersburg (2002). (In Russ.)

Feautrier, P., and Lengauer, C. (2011). Polyhedron Model. Encyclopedia of parallel computing, 1, 1581-1592.

Bondhugula, U., Baskaran, M., Krishnamoorthy, S., Ramanujam, J., Rountev, A., and Sadayappan, P. (2008). Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model. In Compiler Construction: 17th International Conference, CC 2008, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2008, Budapest, Hungary, March 29-April 6, 2008. Proceedings 17 (pp. 132-146). Springer Berlin Heidelberg.

Grosser, T., Groesslinger, A., and Lengauer, C. (2012). Polly—performing polyhedral optimizations on a low-level intermediate representation. Parallel Processing Letters, 22(04), 1250010.

Sjödin, J., Pop, S., Jagasia, H., Grosser, T., and Pop, A. (2009, June). Design of graphite and the Polyhedral Compilation Package. In GCC Developers' Summit.

Bondhugula, U., Hartono, A., Ramanujam, J., and Sadayappan, P. (2008, June). A practical automatic polyhedral parallelizer and locality optimizer. In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation (pp. 101-113).

Alfred Aho, V., Monica Lam, S., Ravi, S., and Jeffrey Ullman, D. Compilers Principles, Techniques, 2nd ed. Translation from English. ID Williams LLC, Moscow (2008). (In Russ.)

A. S. Lebedev. (2015). Space-time mappings for parallelization of affine programs. Informatsionnyye tekhnologii i vychislitel’nyye sistemy, 1, 19–32. (In Russ.)

A. S. Lebedev, and V. I. Solodovnikov. (2023). Translation of affine programs for parallel execution on universal multi-core processors. Proceedings of the Institute for Systems Analysis Russian Academy of Sciences, 73(4), 36-47. (In Russ.)

Feautrier, P. (1992). Some efficient solutions to the affine scheduling problem. I. One-dimensional time. International journal of parallel programming, 21, 313-347.

Feautrier, P. (1992). Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time. International journal of parallel programming, 21, 389-420.

Pouchet, L. N., & Yuki, T. (2015). PolyBench/C 4.1. SourceForge. Available online: https://sourceforge.net/projects/polybench/.

Levon, J., & Elie, P. (2004). Oprofile: A system profiler for linux.

Feautrier, P. (1988). Parametric integer programming. RAIRO-Operations Research, 22(3), 243-268.

Griebl, M. (2004). Automatic parallelization of loop programs for distributed memory architectures. Passau, Germany: Univ. Passau.

A. I. Belousov, and S. B. Tkachev. Discrete mathematics: textbook, 5th ed. Moscow, Publishing house of Bauman Moscow State Technical University (2015). (In Russ.)

Intel Ivy Bridge Microarchitecture events. Available online: https://oprofile.sourceforge.io/docs/intel-ivybridge-events.php

Intel® 64 and IA-32 Architectures Software Developer’s Manual Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4. Order Number: 325462-082US. December 2023. Available online: https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html

How to measure L1, L2 and LLC cache miss rate with ocount (OProfile). Yunming Zhang's Blog. Available online: https://yunmingzhang.wordpress.com/2015/07/09/how-to-measure-l1-l2-and-llc-cache-miss-rate-with-ocount-oprofile/


Refbacks

  • There are currently no refbacks.


Abava  Кибербезопасность MoNeTec 2024

ISSN: 2307-8162