×

TSFC: a structure-preserving form compiler. (English) Zbl 1388.68020

Summary: A form compiler takes a high-level description of the weak form of partial differential equations and produces low-level code that carries out the finite element assembly. In this paper we present the Two-Stage Form Compiler (TSFC), a new form compiler with the main motivation being to maintain the structure of the input expression as long as possible. This facilitates the application of optimizations at the highest possible level of abstraction. TSFC features a novel, structure-preserving method for separating the contributions of a form to the subblocks of the local tensor in discontinuous Galerkin problems. This enables us to preserve the tensor structure of expressions longer through the compilation process than is possible with other form compilers. This is also achieved in part by a two-stage approach that cleanly separates the lowering of finite element constructs to tensor algebra in the first stage, from the scheduling of those tensor operations in the second stage. TSFC also efficiently traverses complicated expressions, and experimental evaluation demonstrates good compile-time performance even for highly complex forms.

MSC:

68N20 Theory of compilers and interpreters
65M60 Finite element, Rayleigh-Ritz and Galerkin methods for initial value and initial-boundary value problems involving PDEs
65N30 Finite element, Rayleigh-Ritz and Galerkin methods for boundary value problems involving PDEs
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng, TensorFlow: A system for large-scale machine learning, in Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, Berkeley, CA, USENIX Association, 2016, pp. 265–283, , .
[2] A. Allam, J. Ramanujam, G. Baumgartner, and P. Sadayappan, Memory minimization for tensor contractions using integer linear programming, in Proceedings of the 20th IEEE International Parallel Distributed Processing Symposium, 2006, .
[3] M. S. Aln\aes, J. Blechta, J. Hake, A. Johansson, B. Kehlet, A. Logg, C. Richardson, J. Ring, M. E. Rognes, and G. N. Wells, The FEniCS Project Version 1.5, Arch. Numer. Software, 3 (2015), pp. 9–23, .
[4] M. S. Aln\aes, A. Logg, K. B. Ø lgaard, M. E. Rognes, and G. N. Wells, Unified Form Language: A domain-specific language for weak formulations of partial differential equations, ACM Trans. Math. Software, 40 (2014), 9, , . · Zbl 1308.65175
[5] M. S. Aln\aes and K.-A. Mardal, On the efficiency of symbolic computations combined with code generation for finite element methods, ACM Trans. Math. Software, 37 (2010), 6, . · Zbl 1364.68375
[6] B. Bagheri and R. Scott, Analysa, (accessed 2016-08-05).
[7] G. Balaban, M. S. Aln\aes, J. Sundnes, and M. E. Rognes, Adjoint multi-start-based estimation of cardiac hyperelastic material parameters using shear data, Biomech. Model. Mechanobiol., 15 (2016), pp. 1509–1521, , .
[8] G. Baumgartner, A. A. Auer, D. E. Bernholdt, A. Bibireata, V. Choppella, D. Cociorva, X. Gao, R. J. Harrison, S. Hirata, S. Krishnamoorthy, S. Krishnan, C.-C. Lam, Q. Lu, M. Nooijen, R. M. Pitzer, J. Ramanujam, P. Sadayappan, and A. Sibiryakov, Synthesis of high-performance parallel programs for a class of Ab Initio quantum chemistry models, Proc. IEEE, 93 (2005), pp. 276–292, .
[9] C. Chiw, G. Kindlmann, and J. Reppy, EIN: An intermediate representation for compiling tensor calculus, in Compilers for Parallel Computing, CPC 2016, 2016, .
[10] C. Chiw, G. Kindlmann, J. Reppy, L. Samuels, and N. Seltzer, Diderot: A parallel DSL for image analysis and visualization, in Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’12, ACM, New York, 2012, pp. 111–120, .
[11] D. Cociorva, G. Baumgartner, C.-C. Lam, P. Sadayappan, J. Ramanujam, M. Nooijen, D. E. Bernholdt, and R. Harrison, Space-time trade-off optimization for a class of electronic structure calculations, in Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, PLDI ’02, ACM, New York, 2002, pp. 177–186, . · Zbl 1052.68545
[12] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2nd ed., The MIT Press, Cambridge, MA, 2001. · Zbl 1047.68161
[13] P. Dular, C. Geuzaine, F. Henrotte, and W. Legros, A general environment for the treatment of discrete problems and its application to the finite element method, IEEE Trans. Magnetics, 34 (1998), pp. 3395–3398, .
[14] E. Epifanovsky, M. Wormit, T. Kuś, A. Landau, D. Zuev, K. Khistyaev, P. Manohar, I. Kaliman, A. Dreuw, and A. I. Krylov, New implementation of high-level correlated methods using a general block tensor library for high-performance electronic structure calculations, J. Comput. Chem., 34 (2013), pp. 2293–2309, .
[15] F. Franchetti, Y. Voronenko, and M. Püschel, Formal loop merging for signal transforms, in Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’05, ACM, New York, 2005, pp. 315–326, .
[16] G. Guennebaud, B. Jacob, et al., Eigen v3, 2010, (accessed 2017-12-14).
[17] A. Hartono, Q. Lu, X. Gao, S. Krishnamoorthy, M. Nooijen, G. Baumgartner, D. E. Bernholdt, V. Choppella, R. M. Pitzer, J. Ramanujam, A. Rountev, and P. Sadayappan, Identifying cost-effective common subexpressions to reduce operation count in tensor contraction evaluations, in Computational Science – ICCS 2006, Lecture Notes in Comput. Sci. 3991, V. N. Alexandrov, G. D. van Albada, P. M. A. Sloot, and J. Dongarra, eds., Springer, New York, 2006, pp. 267–275, .
[18] A. Hartono, A. Sibiryakov, M. Nooijen, G. Baumgartner, D. E. Bernholdt, S. Hirata, C.-C. Lam, R. M. Pitzer, J. Ramanujam, and P. Sadayappan, Automated operation minimization of tensor contraction expressions in electronic structure calculations, in Computational Science – ICCS 2005, Lecture Notes in Comput. Sci. 3514, V. S. Sunderam, G. D. van Albada, P. M. A. Sloot, and J. J. Dongarra, eds., Springer, New York, 2005, pp. 155–164, . · Zbl 1129.68590
[19] F. Hecht, New development in freefem++, J. Numer. Math., 20 (2012), pp. 251–266, . · Zbl 1266.68090
[20] A. Heinecke, G. Henry, M. Hutchinson, and H. Pabst, Libxsmm: Accelerating small matrix multiplications by runtime code generation, in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, IEEE Press, Piscataway, NJ, 2016, pp. 981–991, .
[21] M. Homolya, Experimentation framework for manuscript “TSFC: A structure-preserving form compiler,” version 1, May 2017, .
[22] M. Homolya, R. C. Kirby, and D. A. Ham, Exposing and Exploiting Structure: Optimal Code Generation for High-Order Finite Element Methods, preprint, , 2017; submitted to ACM Trans. Math. Software.
[23] A. B. Kahn, Topological sorting of large networks, Comm. ACM, 5 (1962), pp. 558–562, . · Zbl 0106.32602
[24] G. Kindlmann, C. Chiw, N. Seltzer, L. Samuels, and J. Reppy, Diderot: A domain-specific language for portable parallel scientific visualization and image analysis, IEEE Trans. Vis. Comput. Graphics, 22 (2016), pp. 867–876, .
[25] R. C. Kirby, Algorithm 839: FIAT, a new paradigm for computing finite element basis functions, ACM Trans. Math. Software, 30 (2004), pp. 502–516, . · Zbl 1070.65571
[26] R. C. Kirby, A General Approach to Transforming Finite Elements, preprint, , 2017.
[27] R. C. Kirby and A. Logg, A compiler for variational forms, ACM Trans. Math. Software, 32 (2006), pp. 417–444, , .
[28] R. C. Kirby and A. Logg, Efficient compilation of a class of variational forms, ACM Trans. Math. Software, 33 (2007), , . · Zbl 1365.65257
[29] F. Kjolstad, S. Kamil, S. Chou, D. Lugato, and S. Amarasinghe, The tensor algebra compiler, Proc. ACM Programming Languages, 1 (2017), 77, .
[30] D. E. Knuth, The Art of Computer Programming. Vol. 1: Fundamental Algorithms, Addison-Wesley, Reading, MA, 1968. · Zbl 0191.17903
[31] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, Factor graphs and the sum-product algorithm, IEEE Trans. Inform. Theory, 47 (2001), pp. 498–519, . · Zbl 0998.68234
[32] C.-C. Lam, D. Cociorva, G. Baumgartner, and P. Sadayappan, Memory-optimal evaluation of expression trees involving large objects, in High Performance Computing – HiPC’99, Lecture Notes in Comput. Sci. 1745, P. Banerjee, V. K. Prasanna, and B. P. Sinha, eds., Springer, New York, 1999, pp. 103–110, . · Zbl 1218.68072
[33] C.-C. Lam, D. Cociorva, G. Baumgartner, and P. Sadayappan, Optimization of memory usage and communication requirements for a class of loops implementing multi-dimensional integrals, in Languages and Compilers for Parallel Computing, Lecture Notes in Comput. Sci. 1863, L. Carter and J. Ferrante, eds., Springer, New York, 1999, pp. 350–364, .
[34] C.-C. Lam, T. Rauber, G. Baumgartner, D. Cociorva, and P. Sadayappan, Memory-optimal evaluation of expression trees involving large objects, Comput. Languages Syst. Structures, 37 (2011), pp. 63–75, . · Zbl 1218.68072
[35] C.-C. Lam, P. Sadayappan, and R. Wenger, Optimal reordering and mapping of a class of nested-loops for parallel execution, in 9th International Workshop on Languages and Compilers for Parallel Computing, Springer, Berlin, Heidelberg, 1996, pp. 315–329, .
[36] C.-C. Lam, P. Sadayappan, and R. Wenger, On optimizing a class of multi-dimensional loops with reduction for parallel execution, Parallel Process. Lett., 07 (1997), pp. 157–168, .
[37] A. Logg, K.-A. Mardal, and G. Wells, eds., Automated Solution of Differential Equations by the Finite Element Method: The FEniCS Book, Lect. Notes Comput. Sci. Eng. 84, Springer, New York, 2012, . · Zbl 1247.65105
[38] A. Logg, K. B. Ø lgaard, M. E. Rognes, and G. N. Wells, FFC: The FEniCS form compiler, in Automated Solution of Differential Equations by the Finite Element Method: The FEniCS Book, A. Logg, K.-A. Mardal, and G. Wells, eds., Lect. Notes Comput. Sci. Eng. 84, Springer, New York, 2012, pp. 227–238, .
[39] K. Long, R. Kirby, and B. van Bloemen Waanders, Unified embedded parallel finite element computations via software-based Fréchet differentiation, SIAM J. Sci. Comput., 32 (2010), pp. 3323–3351, . · Zbl 1221.65306
[40] F. Luporini, D. A. Ham, and P. H. J. Kelly, An algorithm for the optimization of finite element integration loops, ACM Trans. Math. Software, 44 (2017), 3, , . · Zbl 1380.65381
[41] F. Luporini, A. L. Varbanescu, F. Rathgeber, G.-T. Bercea, J. Ramanujam, D. A. Ham, and P. H. J. Kelly, Cross-loop optimization of arithmetic intensity for finite element local assembly, ACM Trans. Architect. Code Optim., 11 (2015), 57, , .
[42] S. Müthing, M. Piatkowski, and P. Bastian, High-Performance Implementation of Matrix-Free High-Order Discontinuous Galerkin Methods, preprint, , 2017.
[43] T. Nelson, A. Rivera, P. Balaprakash, M. Hall, P. D. Hovland, E. Jessup, and B. Norris, Generating efficient tensor contractions for GPUs, in 44th International Conference on Parallel Processing (ICPC), 2015, pp. 969–978, .
[44] K. B. Ølgaard, A. Logg, and G. N. Wells, Automated code generation for discontinuous Galerkin methods, SIAM J. Sci. Comput., 31 (2008), pp. 849–864, . · Zbl 1189.65283
[45] K. B. Ølgaard and G. N. Wells, Optimisations for quadrature representations of finite element tensors through automated code generation, ACM Trans. Math. Software, 37 (2010), 8, , . · Zbl 1364.65063
[46] M. Puschel, J. M. F. Moura, J. R. Johnson, D. Padua, M. M. Veloso, B. W. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo, SPIRAL: Code generation for DSP transforms, Proc. IEEE, 93 (2005), pp. 232–275, .
[47] F. Rathgeber, D. A. Ham, L. Mitchell, M. Lange, F. Luporini, A. T. T. McRae, G.-T. Bercea, G. R. Markall, and P. H. J. Kelly, Firedrake: Automating the finite element method by composing abstractions, ACM Trans. Math. Software, 43 (2016), 24, , . · Zbl 1396.65144
[48] B. A. Sanders, R. Bartlett, E. Deumens, V. Lotrich, and M. Ponton, A block-oriented language and runtime system for tensor algebra with very large arrays, in 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2010, pp. 1–11, .
[49] R. Sethi and J. D. Ullman, The generation of optimal code for arithmetic expressions, J. Assoc. Comput. Mach., 17 (1970), pp. 715–728, . · Zbl 0212.18802
[50] E. Solomonik, D. Matthews, J. Hammond, and J. Demmel, Cyclops Tensor Framework: Reducing communication and eliminating load imbalance in massively parallel contractions, in 2013 IEEE 27th International Symposium on Parallel Distributed Processing (IPDPS), 2013, pp. 813–824, .
[51] P. Springer and P. Bientinesi, Design of a high-performance GEMM-like tensor–tensor multiplication, ACM Trans. Math. Softw., 44 (2018), 28, . · Zbl 1484.65092
[52] G. Truter, Structure-Preserving Automatic Differentiation and Pull-backs in a Language for Variational Forms, master’s thesis, Imperial College London, 2017.
[53] J. Xiong, J. Johnson, R. Johnson, and D. Padua, SPL: A language and compiler for DSP algorithms, in Proceedings of the ACM SIGPLAN 2001 Conference on Programming Language Design and Implementation, PLDI ’01, ACM, New York, 2001, pp. 298–308, .
[54] Zenodo/COFFEE, COFFEE: A COmpiler for Fast Expression Evaluation, Feb. 2016, .
[55] Zenodo/COFFEE, COFFEE: A COmpiler for Fast Expression Evaluation, May 2017, .
[56] Zenodo/dijitso, A Python Module for Distributed Just-in-Time Shared Library Building, May 2017, .
[57] Zenodo/FFC, FFC: The FEniCS Form Compiler, Apr. 2015, .
[58] Zenodo/FFC, FFC: The FEniCS Form Compiler, May 2017, .
[59] Zenodo/FIAT, FIAT: FInite Element Automated Tabulator, Feb. 2016, .
[60] Zenodo/FIAT, FIAT: FInite Element Automated Tabulator, May 2017, .
[61] Zenodo/FInAT, FInAT: A Smarter Library of Finite Elements, May 2017, .
[62] Zenodo/Instant, Instant Is a Python Module that Allows for Instant Inlining of C and C++ Code in Python, May 2017, .
[63] Zenodo/TSFC, TSFC: The Two-Stage Form Compiler, May 2017, .
[64] Zenodo/UFL, UFL: Unified Form Language, Apr. 2015, .
[65] Zenodo/UFL, UFL: Unified Form Language, May 2017, .
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.