
Performance evaluation of mixed-mode OpenMP/MPI implementations. (English) Zbl 1213.68143

Summary: With the current prevalence of multi-core processors in HPC architectures, mixed-mode programming, using both MPI and OpenMP in the same application, is seen as an important technique for achieving high levels of scalability. As there are few standard benchmarks written in this paradigm, it is difficult to assess the likely performance of such programs. To help address this, we examine the performance of mixed-mode OpenMP/MPI on a number of popular HPC architectures, using a synthetic benchmark suite and two large-scale applications. We find performance characteristics which differ significantly between implementations and which highlight possible areas for improvement, especially when multiple OpenMP threads communicate simultaneously via MPI.
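
A minimal C sketch of the communication pattern the summary refers to, in which several OpenMP threads issue MPI calls simultaneously from inside a parallel region: this is not code from the paper or its benchmark suite, only an illustration of the standard MPI_THREAD_MULTIPLE idiom; the ring exchange and the use of the thread id as a message tag are choices made purely for this example.

/* Illustrative only: each OpenMP thread performs its own MPI point-to-point
 * exchange, so MPI_THREAD_MULTIPLE support must be requested at start-up. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, size;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Every thread exchanges one integer with the matching thread on the
     * neighbouring ranks; the thread id serves as the message tag so that
     * the concurrent exchanges stay paired correctly. */
    #pragma omp parallel
    {
        int tid  = omp_get_thread_num();
        int next = (rank + 1) % size;
        int prev = (rank + size - 1) % size;
        int sendbuf = rank * 100 + tid, recvbuf;

        MPI_Sendrecv(&sendbuf, 1, MPI_INT, next, tid,
                     &recvbuf, 1, MPI_INT, prev, tid,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}

Compiled with an MPI wrapper and OpenMP enabled (for example, mpicc -fopenmp) and run with the thread count set via OMP_NUM_THREADS, this is essentially the multi-threaded message-passing scenario in which the summary reports the largest differences between implementations.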

MSC:

68M20 Performance evaluation, queueing, and scheduling in the context of computer systems
68M99 Computer system organization
