×

A parallel sparse direct solver via hierarchical DAG scheduling. (English) Zbl 1369.65046


MSC:

65F05 Direct numerical methods for linear systems and matrix inversion
65F50 Computational methods for sparse matrices
65Y05 Parallel numerical computation
65Y10 Numerical algorithms for specific classes of architectures

Software:

UHM; QUARK; MUMPS; Cilk; PARDISO; LAPACK
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Amestoy, P. R., Duff, I. S., L’Excellent, J. Y., and Koster, J. 2002. A fully asynchronous multifrontal solver using distributed dynamic scheduling.SIAM J. Matrix Anal. Appl. 23, 1, 15–41. · Zbl 0992.65018
[2] Amestoy, P. R., Guermouche, A., L’Excellent, J.-Y., and Pralet, S. 2006. Hybrid scheduling for the parallel solution of linear systems.Parall. Comput. 32, 2, 136–156.
[3] Ashcraft, C. and Grimes, R. 1989. The influence of relaxed supernode partitions on the multifrontal method.ACM Trans. Math. Softw. 15, 4, 291–309. · Zbl 0900.65061
[4] Babuška, I., Griebel, M., and Pitkäranta, J. 1989. The problem of selecting the shape functions for a p-type finite element.Int. J. Numer. Meth. Eng. 28, 8, 1891–1908. · Zbl 0705.73246
[5] Babuška, I. and Suri, M. 1994. The p and hp versions of the finite element method, basic principles and properties.SIAM Review 36, 4, 578–632. · Zbl 0813.65118
[6] Babuška, I., Szabó, B. A., and Katz, I. N. 1981. The p-version of the finite element method.SIAM J. Numer. Anal. 18, 3, 515–545.
[7] Bientinesi, P., Eijkhout, V., Kim, K., Kurtz, J., and van de Geijn, R. 2010. Sparse direct factorizations through unassembled hyper-matrices.Comput. Meth. Appl. Mech. Eng. 199, 430–438. · Zbl 1227.65111
[8] Blumofe, R. D., Joerg, C. F., Kuszmaul, B. C., Leiserson, C. E., Randall, K. H., and Zhou, Y. 1995. Cilk: An efficient multithreaded runtime system. InProceedings of the 5th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Vol. 37, 207–216.
[9] Bosilca, G., Faverge, M., Lacoste, X., Yamazaki, I., and Ramet, P. 2012. Toward a supernodal sparse direct solver over DAG runtimes. InProceedings of PMAA’2012.
[10] Buttari, A. 2013. Fine-grained multithreading for the multifrontal QR factorization of sparse matrices.SIAM J. Sci. Comput. 35, 4, 323–345. · Zbl 1362.65031
[11] Buttari, A., Langou, J., Kurzak, J., and Dongarra, J. 2009. A class of parallel tiled linear algebra algorithms for multicore architectures.Parall. Comput. 35, 1, 38–53.
[12] Carnevali, P., Morris, R. B., Tsuji, Y., and Taylor, G. 1993. New basis functions and computational procedures for p-version finite element analysis.Int. J. Numer. Meth. Eng. 36, 22, 3759–3779. · Zbl 0795.73067
[13] Chan, E., van de Geijn, R. A., and Chapman, A. 2010. Managing the complexity of lookahead for LU factorization with pivoting. InProceedings of the 22nd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’10). ACM Press, New York, 200–208.
[14] Chan, E., van Zee, F. G., Quintana-Orti, E. S., Quintana-Orti, G., and van de Geijn, R. A. 2007. Satisfying your dependencies with Supermatrix. InProceedings of the IEEE International Conference on Cluster Computing. IEEE, 91–99.
[15] Damhaug, A. C. and Reid, J. K. 1996. MA46, a Fortran code for direct solution of sparse unsymmetric linear systems of equations from finite-element applications. Tech. Rep. RAL-TR-96-010, Computing and Information Systems Department, Rutherford Appleton Laboratory.
[16] Davis, T. A. 2006.Direct Methods for Sparse Linear Systems (Fundamentals of Algorithms 2). Number 0898716136. SIAM, Philadelphia, PA. · Zbl 1119.65021
[17] Davis, T. A. and Hager, W. W. 2009. Dynamic supernodes in sparse Cholesky update / downdate and triangular solves.ACM Trans. Math. Softw. (TOMS) 35, 4, 1–23. · Zbl 05517420
[18] Demkowicz, L., Kurtz, J., Pardo, D., Paszynski, M., Rachowicz, W., and Zdunek, A. 2007.Computing with hp-Adaptive Finite Elements, Vol. 2: Frontiers Three Dimensional Elliptic and Maxwell Problems with Applications. Chapman & Hall CRC. · Zbl 1148.65001
[19] Demmel, J. W., Eisenstat, S. C., Gilbert, J. R., Li, X. S., and Liu, J. W. H. 1999. A supernodal approach to sparse partial pivoting.SIAM J. Matrix Anal. Appl. 20, 3, 720–755. · Zbl 0931.65022
[20] Dongarra, J., Faverge, M., Ltaief, H., and Luszczek, P. 2011. Achieving numerical accuracy and high performance using recursive tile LU factorization. Tech. Rep., LAPACK Working Note 259.
[21] Dongarra, J. J., Du Croz, J., Hammarling, S., and Duff, I. 1990. A set of level 3 basic linear algebra subprograms.ACM Trans. Math. Softw. 16, 1, 1–17. · Zbl 0900.65115
[22] Duff, I. S. 1986. Parallel implementation of multifrontal schemes.Parall. Comput. 3, 3, 193–204. · Zbl 0628.65018
[23] Duff, I. S., Erisman, A. M., and Reid, J. K. 1976. On George’s nested dissection method.SIAM Numer. Anal. 13, 5, 686–695. · Zbl 0345.65015
[24] Duff, I. S. and Reid, J. K. 1983. The multifrontal solution of indefinite sparse symmetric linear equations.ACM Trans. Math. Softw. (TOMS) 9, 3, 302–325. · Zbl 0515.65022
[25] Duran, A., Corbalán, J., and Ayguadé, E. 2008. Evaluation of OpenMP task scheduling strategies. InProceedings of the 4th International Conference on OpenMP in a New Era of Parallelism. Springer-Verlag, 100–110.
[26] Erisman, A. M. and Reid, J. K. 1974. Monitoring the stability of the triangular factorization of a sparse matrix.Numer. Math. 22, 3, 183–186. · Zbl 0271.65021
[27] Geist, G. A. and Ng, E. 1989. Task scheduling for parallel sparse Cholesky factorization.Int. J. Parall. Prog. 18, 4, 291–314. · Zbl 0702.68031
[28] Gilbert, J. R. and Tarjan, R. E. 1987. The analysis of a nested dissection algorithm.Numer. Math. 50, 4, 377–404. · Zbl 0645.65012
[29] Gould, N. I. M., Scott, J. A., and Hu, Y. 2007. A numerical evaluation of sparse direct solvers for the solution of large sparse symmetric linear systems of equations.ACM Trans. Math. Softw. 33, 2, 10:1–32. · Zbl 1365.65129
[30] Guermouche, A., L’Excellent, J.-Y., and Utard, G. 2003. Impact of reordering on the memory of a multifrontal solver.Parall. Comput. 29, 9, 1191–1218.
[31] Gupta, A. and Kumar, V. 1994. A scalable parallel algorithm for sparse Cholesky factorization. InProceedings of the ACM/IEEE Conference on Supercomputing. ACM, 793–802.
[32] Gustavson, F. G., Jonsson, I., Kågström, B., and Ling, P. 1999. Towards peak performance on hierarchical SMP memory architectures - new recursive blocked data formats and BLAS. InParallel Processing for Scientific Computing, 1–4.
[33] Hogg, J. D. 2008. A DAG-based parallel Cholesky factorization for multicore systems. Tech. Rep. RAL-TR-2008-029, Harwell Science and Innovation Campus, SFTC Rutherford Appleton Laboratory.
[34] Hogg, J. D., Reid, J. K., and Scott, J. A. 2010. Design of a multicore sparse Cholesky factorization using DAGs.SIAM J. Sci. Comput. 32, 6, 3627–3649. · Zbl 1221.65088
[35] Irony, D., Shklarski, G., and Toledo, S. 2004. Parallel and fully recursive multifrontal sparse Cholesky.Future Gen. Comput. Syst. 20, 3, 425–440. · Zbl 1062.65034
[36] Lacoste, X., Ramet, P., Faverge, M., Yamazaki, I., and Dongarra, J. 2012. Sparse direct solvers with accelerators over DAG runtimes. Tech. rep.
[37] Liu, J. W. H. 1992. The multifrontal method for sparse matrix solution: theory and practice.SIAM Rev. 34, 1, 82–109. · Zbl 0919.65019
[38] Liu, J. W. H., Ng, E. G., and Peyton, B. W. 1993. On finding supernodes for sparse matrix computations.SIAM J. Matrix Anal. Appl. 14, 1, 242–252. · Zbl 0765.65034
[39] Low, T. M. and van de Geijn, R. A. 2004. An API for manipulating matrices stored by blocks. Tech. Rep., FLAME Working Note 12, TR-2004-15, The University of Texas at Austin.
[40] McCalpin, J. D. 1995. Memory bandwidth and machine balance in current high performance computers.IEEE Comput. Soc. Tech. Comm. Comput. Arch. (TCCA) Newsletter, 19–25.
[41] Olivier, S. L. and Prins, J. F. 2010. Comparison of OpenMP 3.0 and other task parallel frameworks on unbalanced task graphs.Int. J. Parall. Prog. 38, 5–6, 341–360. · Zbl 1211.68076
[42] OpenMP Architecture Review Board. 2008.OpenMP application program interface, Version 3.0. http://www.openmp.org.
[43] Pothen, A. and Sun, C. 1993. A mapping algorithm for parallel sparse Cholesky factorization.SIAM J. Sci. Comput. 14, 5, 1253–1257. · Zbl 0785.65016
[44] Quintana-Ortí, E. S. and van de Geijn, R. A. 2008. Updating an LU factorization with pivoting.ACM Trans. Math. Softw. 35, 2, 1–16.
[45] Quintana-Ortí, G., Quintana-Ortí, E. S., van de Geijn, R. A., van Zee, F. G., and Chan, E. 2009. Programming matrix algorithms-by-blocks for thread-level Parallelism.ACM Trans. Math. Softw. 36, 3, 1–26. · Zbl 1364.65105
[46] Schenk, O. and Gärtner, K. 2002. Solving unsymmetric sparse systems of linear equations with PARDISO. InProceedings of the International Conference on Computational Science-Part II. Springer-Verlag, 335–363. · Zbl 1062.65035
[47] Schenk, O. and Gärtner, K. 2006. On fast factorization pivoting methods for sparse symmetric indefinite systems.Electron. Trans. Numer. Anal. 23, 158–179. · Zbl 1112.65022
[48] Shah, S., Haab, G., Petersen, P., and Throop, J. 2000. Flexible control structures for parallelism in OpenMP.Concurrency: Pract. Exp. 12, 12, 1219–1239. · Zbl 1008.68558
[49] Szabó, B. A. 1990. The p and hp versions of the finite element method in solid mechanics.Comput. Meth. Appl. Mech. Eng. 80, 1–3, 185–195.
[50] Terboven, C., an Mey, D., Schmidl, D., Jin, H., and Reichstein, T. 2008. Data and thread affinity in OpenMP programs. InProceedings of the Workshop on Memory Access on Future Processors. 377–384.
[51] Yarkhan, A., Kurzak, J., and Dongarra, J. 2011. QUARK Users’ Guide. Tech. Rep., Electrical Engineering and Computer Science, Innovative Computing Laboratory, University of Tenessee.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.