
Parallel computing on heterogeneous networks. (English) Zbl 1049.68020

Wiley Series on Parallel and Distributed Computing. Hoboken, NJ: Wiley (ISBN 0-471-22982-2/hbk; 978-0-471-65416-2/ebook). xiii, 423 p. (2003).
Traditional software for parallel computing typically spreads computations evenly over a set of linked processors. This, however, may not always be the best way of maximizing the performance of a given network or cluster of computers. By taking account of the actual performance of individual processors and the links between them, parallel computing on heterogeneous networks offers significant improvements in parallel computation. The monograph provides a timely resource on this innovative technology.
Part I, Evolution of Parallel Computing, presents an overview of the evolution of parallel computer architectures in relation to the evolution of parallel programming models and tools. In Ch. 1, Serial Scalar Processor, the programming model and basic program properties are considered. Ch. 2, Vector and Superscalar Processors, analyzes optimizing compilers, array libraries, Basic Linear Algebra Subprogram (BLAS) routines, sparse BLAS, parallel languages, Fortran 90, the C[ ] language, the memory hierarchy, and parallel programming tools. Ch. 3, Shared Memory Multiprocessors, introduces the SMP architecture, which provides a higher level of parallelism than the vector and superscalar architectures via multiple parallel streams of instructions. Nevertheless, the SMP architecture is not scalable: the speedup it provides is limited by the bandwidth of the memory bus. Multithreading is the primary programming model for the SMP architecture. Serial languages, such as C and Fortran 77, may be used in concert with optimizing compilers to write efficient programs for SMP computers. Unfortunately, only a limited and simple class of multithreaded algorithms can be implemented in an efficient and portable way by this approach. Thread libraries directly implement the multithreading paradigm and allow programmers to explicitly write efficient multithreaded programs independently of optimizing compilers. Pthreads are the standard for Unix platforms, thus supporting efficient and portable parallel programming of Unix SMP computers. Thread libraries are powerful tools supporting both parallel and distributed computing.
The general programming model underlying the thread libraries is universal, and is widely considered too powerful, complicated, and error-prone for parallel programming. OpenMP is a high-level parallel extension of Fortran, C, and C++, providing a simplified multithreaded programming model based on the master/slave design strategy, and aimed specifically at parallel computing on SMP architectures. OpenMP significantly facilitates writing parallel multithreaded applications. In Ch. 4, Distributed Memory Multiprocessors, the distributed memory multiprocessor architecture, also known as the MPP architecture, is introduced. This provides much more parallelism than the SMP architecture. Moreover, unlike all other parallel architectures, the MPP architecture is scalable: the speedup it provides is potentially unbounded, owing to the absence of principal bottlenecks that might limit the number of efficiently interacting processors. Message passing is the dominant programming model for the MPP architecture. As the MPP architecture is farther away from the serial scalar architecture than the vector, superscalar, and even SMP architectures, it is very difficult to automatically generate efficient message-passing code from serial source code written in C or Fortran 77. In fact, optimizing C or Fortran 77 compilers for MPPs would involve solving the problem of automatic synthesis of an efficient message-passing program using the serial source code as a specification of its functional semantics. Ch. 5, Networks of Computers: Architecture and Programming Challenges, analyzes challenges associated with parallel programming for common networks of computers (NoCs) that are, unlike dedicated parallel computer systems, inherently heterogeneous and unreliable. This analysis results in a description of the main features of an ideal parallel program running on a NoC.
Part II, Parallel Programming for Networks of Computers with mpC and HMPI, presents the mpC parallel programming language. Ch. 6, Introduction to mpC, addresses some primary challenges of heterogeneous parallel computing, focusing on the uneven distribution of computations in heterogeneous parallel algorithms and the heterogeneity of the physical processors of NoCs. Ch. 7, Advanced Heterogeneous Parallel Programming in mpC, presents some advanced features of the mpC language. In general, the mpC language allows the user not only to program the computations and communications of a heterogeneous parallel algorithm but also to specify the performance model of the algorithm. This performance model takes into account all the main features of the algorithm that affect its execution time, including the number of parallel processes executing the algorithm, the absolute volume of computations performed by each of the processes, the absolute volume of data transferred between each pair of processes, and the interactions between the parallel processes during the execution of the algorithm. This information is used at runtime to map the algorithm to the physical processors of the network of computers. The mpC programming system performs the mapping so as to minimize the execution time of the algorithm. Ch. 8, Toward a Message-Passing Library for Heterogeneous Networks of Computers, very briefly considers how the mpC language approach to optimal heterogeneous distribution of computations and communications can be implemented in the form of a message-passing library. In so doing, the author provides a small extension of the standard MPI for heterogeneous NoCs.
Part III, Applications of Heterogeneous Parallel Computing, presents a number of advanced mpC applications solving different problems on heterogeneous clusters. Ch. 9, Scientific Applications, demonstrates that a wide range of scientific problems can be efficiently solved on heterogeneous networks of computers. The design of the parallel block cyclic algorithm of matrix multiplication on heterogeneous NoCs and its portable implementation in the mpC language are considered in detail. Also considered are parallel algorithms solving a more demanding linear algebra problem on heterogeneous NoCs: Cholesky factorization of a symmetric, positive-definite matrix. A relatively simple approach to assessing a heterogeneous parallel algorithm, by comparing its efficiency with that of its homogeneous prototype, is presented. Two approaches to the design of parallel algorithms solving regular problems on heterogeneous NoCs are given. The first assumes a one-process-per-processor configuration of the parallel program, with the workload unevenly distributed over the processes. The second assumes a multiple-processes-per-processor configuration, in which the workload is evenly distributed over the processes while the number of processes on each processor is proportional to its speed. The author experimentally compares the two approaches and describes their portable mpC implementations. Also presented is an \(N\)-body mpC application, an example of an inherently irregular problem; a heterogeneous parallel algorithm solving such a problem is naturally deduced from the problem itself rather than from the parallel environment executing the algorithm. Finally, the design of a parallel adaptive quadrature routine for the numerical approximation of definite integrals on heterogeneous NoCs, and its portable mpC implementation, are considered in detail.
Ch. 10, Business and Software Engineering Applications, demonstrates that heterogeneous parallel computing can be used not only to solve scientific problems but also to improve the performance of distributed business applications. Heterogeneous parallel computing also has applications in software engineering practice, for optimizing the maintenance process. Experiences with the integration of the mpC-based technology of heterogeneous parallel computing and the CORBA-based technology of distributed computing are described.
The book does not pretend to be an encyclopedia of parallel computing. It presents a subjective view of parallel programming technologies, but Part I can nevertheless serve as a good basis for an introductory university course on parallel programming systems. Practically oriented, the book includes illustrative algorithms in the mpC programming language, a unique high-level software tool designed by the author specifically for programming heterogeneous parallel algorithms. All concepts and algorithms are illustrated with working programs that can be compiled and executed on any cluster. Practical applications of these algorithms include the \(N\)-body problem, the parallel testing of distributed software, and the modeling of oil extraction. Appendices provide both the complete source code and a user’s guide for the principal applications used to illustrate the book’s material.

MSC:

68M10 Network design and communication in computer systems
68Q10 Modes of computation (nondeterministic, parallel, interactive, probabilistic, etc.)
68-02 Research exposition (monographs, survey articles) pertaining to computer science
68N19 Other programming paradigms (object-oriented, sequential, concurrent, automatic, etc.)
65Y05 Parallel numerical computation