×

Beyond unimodular transformations. (English) Zbl 0840.68009

Summary: This paper presents an approach to modeling loop transformations using linear algebra. Compound transformations are modeled as integer matrices. The nonsingular linear transformations presented here subsume the class of unimodular transformations. The loop transformations included are the unimodular transformations – reversal, skewing, and permutation – and a new transformation, namely stretching. Nonunimodular transformations (with determinant \(\geq 1\)) create “holes” in the transformed iteration space, rendering code generation difficult. We solve this problem by suitably changing the step size of loops in order to “skip” these holes when traversing the transformed iteration space. For the class of nonunimodular loop transformations, we present algorithms for deriving the loop bounds, the array access expressions, and the step sizes of loops in the nest.
To derive the step sizes, we compute the Hermite normal form of the transformation matrix; the step sizes are the entries on the diagonal of this matrix. We then use the theory of Hessenberg matrices in the derivation of exact loop bounds for nonunimodular transformations. We illustrate the use of this approach in several problems such as the generation of tile sets and distributed-memory code generation. This approach provides a framework for optimizing programs for a variety of architectures.

MSC:

68M07 Mathematical problems of computer architecture
15A60 Norms of matrices, numerical range, applications of functional analysis to matrix theory
15A04 Linear transformations, semilinear transformations
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Allen, J.R., and Kennedy, K. 1987. Automatic translation of FORTRAN programs to vector form.ACM Trans. Programming Languages and Systems, 9, 4 (Oct.): 491-542. · Zbl 0631.68019 · doi:10.1145/29873.29875
[2] Ancourt, C., and Irigoin, F. 1991. Scanning polyhedra with DO loops. InProc., Third ACM SIGPLAN Symp. on the Principles & Practice of Parallel Programming (PPoPP), pp. 39-50.
[3] Banerjee, U. 1988.Dependence Analysis for Supercomputing. Kluwer Academic, Boston.
[4] Banerjee, U. 1991. Inimodular transformations of double loops. InAdvances in Languages and Compilers for Parallel Processing (A. Nicolau et al., eds.), MIT Press, pp. 192-219.
[5] Barnett, M., and Lengauer, C. 1992. Loop parallelization and unimodularity. Rept. ECS-LFCS-92-197, Univ. of Edinburgh.
[6] Dowling, M. 1988. Optimal code parallelisation using unimodular transformations. InPreprints in Optimization, Carolo-Wilhelmina Universität zu Braunschweig, Germany.
[7] Heller, D. 1974. A determinant theorem with applications to parallel algorithms.SIAM J. Num. Anal., 11: 484-496. · Zbl 0286.65019 · doi:10.1137/0711048
[8] Hiranandani, S., Kennedy, K., and Tseng, C. 1991. Compiler optimization for Fortran D on MIMD distributed memory machines. InProc. Supercomputing ’91. pp. 86-100.
[9] Irigoin, F., and Triolet, R. 1988. Supernode partitioning. InProc., 15th Annual ACM Symp. on the Principles of Programming Languages (San Diego, Jan.), pp. 319-329.
[10] Li, W., and Pingali, K. 1992. A singular loop transformation framework based on non-singular matrices. InProc., 5th Workshop on Languages and Compilers for Parallel Computing. · Zbl 0806.68024
[11] Lu, L. 1991. A unified framework for systematic loop transformations. InProc., Third ACM SIGPLAN Symp. on the Principles & Practice of Parallel Programming (PPoPP), pp. 28-38.
[12] Nemhauser, G., and Wolsey, L. 1988.Integer and Combinatorial Optimization. Wiley, New York. · Zbl 0652.90067
[13] Ramanujam, J. 1990. Compile-time techniques for parallel execution of loops on distributed memory multiprocessors. Ph.D. thesis, Ohio State Univ., Columbus, Oh.
[14] Ramanujam, J. 1992. A linear algebraic view of loop transformations and their interaction. InProc., Fifth SIAM Conf. on Parallel Processing for Scientific Computing, pp. 543-548. · Zbl 0800.68313
[15] Ramanujam, J. 1994. Efficient code generation for loop transformations. Tech. Rept. TR-94-08-03, Dept. of Electr. and Comp. Eng., La. State Univ., Baton Rouge, La.
[16] Ramanujam, J., and Sadayappan, P. 1992. Tiling multidimensional iteration spaces for nonshared memory machines.J. Parallel and Distr. Comp., 16, 2 (Oct.): 108-120. · doi:10.1016/0743-7315(92)90027-K
[17] Schreiber, R., and Dongarra, J. 1990. Automatic blocking of nested loops. Tech. rept., Univ. of Tenm., Knoxville (Aug.).
[18] Schrijver, A. 1986.Theory of Linear and Integer Programming. Wiley, New York. · Zbl 0665.90063
[19] Wolf, M., and Lam, M. 1991. A loop transformation theory and an algorithm to maximize parallelism.IEEE Trans. Parallel and Distr. Syst., 2, 4 (Oct.): 452-471. · Zbl 05106980 · doi:10.1109/71.97902
[20] Wolfe, M. 1989a. More iteration space tiling. InProc., Supercomputing ’89, pp. 655-664.
[21] Wolfe, M. 1989b.Optimizing Supercompilers for Supercomputers. MIT Press, Cambridge, Mass. · Zbl 0699.68007
[22] Wolfe, M. 1990. Massive parallelism through program restructuring. Tech. Rept. CS/E90-009, Oregon Graduate Institute (June).
[23] Wolfe, M., and Tseng, C. 1992. The power test for data dependence.IEEE Trans. Parallel and Distr. Syst., 3, 5 (Sept.): 591-601. · Zbl 05107117 · doi:10.1109/71.159042
[24] Zima, H., and Chapman, B. 1990.Supercompilers for Parallel and Vector Supercomputers, ACM Press Frontier Series.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.