Benefits from using mixed precision computations in the ELPA-AEO and ESSEX-II eigensolver projects. (English) Zbl 1418.65044

Summary: We first briefly report on the status and recent achievements of the ELPA-AEO (Eigenvalue Solvers for Petaflop Applications: Algorithmic Extensions and Optimizations) and ESSEX-II (Equipping Sparse Solvers for Exascale) projects. In both collaborative efforts, application scientists, mathematicians, and computer scientists work together to develop and make available efficient, highly parallel methods for solving eigenvalue problems. We then focus on a topic addressed in both projects: the use of mixed precision computations to enhance efficiency. We describe in more detail our approaches for benefiting from either lower or higher precision in three selected contexts, and the results obtained.

MSC:

65F15 Numerical computation of eigenvalues and eigenvectors of matrices
65F25 Orthogonalization in numerical linear algebra
65Y05 Parallel numerical computation
65Y99 Computer aspects of numerical algorithms
