×

Locality-aware task scheduling for homogeneous parallel computing systems. (English) Zbl 1398.68062

Summary: In systems with complex many-core cache hierarchy, exploiting data locality can significantly reduce execution time and energy consumption of parallel applications. Locality can be exploited at various hardware and software layers. For instance, by implementing private and shared caches in a multi-level fashion, recent hardware designs are already optimised for locality. However, this would all be useless if the software scheduling does not cast the execution in a manner that promotes locality available in the programs themselves. Since programs for parallel systems consist of tasks executed simultaneously, task scheduling becomes crucial for the performance in multi-level cache architectures. This paper presents a heuristic algorithm for homogeneous multi-core systems called locality-aware task scheduling (LeTS). The LeTS heuristic is a work-conserving algorithm that takes into account both locality and load balancing in order to reduce the execution time of target applications. The working principle of LeTS is based on two distinctive phases, namely; working task group formation phase (WTG-FP) and working task group ordering phase (WTG-OP). The WTG-FP forms groups of tasks in order to capture data reuse across tasks while the WTG-OP determines an optimal order of execution for task groups that minimizes the reuse distance of shared data between tasks. We have performed experiments using randomly generated task graphs by varying three major performance parameters, namely: (1) communication to computation ratio (CCR) between 0.1 and 1.0, (2) application size, i.e., task graphs comprising of 50-, 100-, and 300-tasks per graph, and (3) number of cores with 2-, 4-, 8-, and 16-cores execution scenarios. We have also performed experiments using selected real-world applications. The LeTS heuristic reduces overall execution time of applications by exploiting inter-task data locality. Results show that LeTS outperforms state-of-the-art algorithms in amortizing inter-task communication cost.

MSC:

68M20 Performance evaluation, queueing, and scheduling in the context of computer systems
PDFBibTeX XMLCite
Full Text: DOI Link

References:

[1] Wolf, W; Jerraya, AA; Martin, G, Multiprocessor system-on-chip (mpsoc) technology, IEEE Trans CAD ICs Syst, 27, 1701-1713, (2008)
[2] Bhatti, MK; Oz, I; Popov, K; Brorsson, M; Farooq, U, Scheduling of parallel tasks with proportionate priorities, Arab J Sci Eng, 41, 3279-3295, (2016)
[3] Yoo RM, Hughes CJ, Kim C, Chen Y-K, Kozyrakis C (2013) Locality-aware task management for unstructured parallelism: a quantitative limit study. In: Proceedings of the twenty-fifth annual ACM symposium on parallelism in algorithms and architectures, ser. SPAA ’13. ACM, New York, NY, pp 315-325. https://doi.org/10.1145/2486159.2486175
[4] Grama A, Gupta A, Karypis G, Kumar V (2003) Introduction to parallel computing, 2nd edn. Pearson A. Wesley, Reading · Zbl 0861.68040
[5] Sinnen, O; Sousa, L, List scheduling: extension for contention awareness and evaluation of node priorities for heterogeneous cluster architectures, Parallel Comput, 30, 81-101, (2004)
[6] Sinnen, O, Reducing the solution space of optimal task scheduling, Comput OR, 43, 201-214, (2014) · Zbl 1348.90312
[7] Bhatti, MK; Belleudy, C; Auguin, M, Hybrid power management in real time embedded systems: an interplay of DVFs and DPM techniques, Real-Time Syst, 47, 143-162, (2011)
[8] Shahul, AS; Sinnen, O, Scheduling task graphs optimally with a*, J Supercomput, 51, 310-332, (2010)
[9] Sinnen, O; Sousa, LA, Communication contention in task scheduling, IEEE Trans Parallel Distrib Syst, 16, 503-515, (2005)
[10] Dally W (2009) The future of GPU computing. In: The 22nd annual supercomputing conference
[11] Hill M, Kozyrakis C (2012) Advancing computer systems without technology progress. In: DARPA/ISAT workshop
[12] Consortium CC (2012) 21st century computer architecture. A community white paper
[13] Set STG http://www.kasahara.elec.waseda.ac.jp/schedule
[14] Sinnen O (2007) Task scheduling for parallel systems. Wiley, New York. ISBN 978-0-471-73576-2
[15] Yang, T; Gerasoulis, A, Dsc: scheduling parallel tasks on an unbounded number of processors, IEEE Trans Parallel Distrib Syst, 5, 951-967, (1994)
[16] Kasahara, H; Narita, S, Practical multiprocessor scheduling algorithms for efficient parallel processing, IEEE Trans Comput, C-33, 1023-1029, (1984)
[17] Khan, MA, Scheduling for heterogeneous systems using constrained critical paths, Parallel Comput, 38, 175-193, (2012)
[18] Topcuouglu, H; Hariri, S; you Wu, M, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Trans Parallel Distrib Syst, 13, 260-274, (2002)
[19] Kwok, Y-K; Ahmad, I, Link contention-constrained scheduling and mapping of tasks and messages to a network of heterogeneous processors, Cluster Comput, 3, 113-124, (2000)
[20] Ahmad, I; Kwok, Y-K, On exploiting task duplication in parallel program scheduling, IEEE Trans Parallel Distrib Syst, 9, 872-892, (1998)
[21] Kwok, Y-K; Ahmad, I, Dynamic critical-path scheduling: an effective technique for allocating task graphs to multiprocessors, IEEE Trans Parallel Distrib Syst, 7, 506-521, (1996)
[22] Wu, M-Y; Gajski, D, Hypertool: a programming aid for message-passing systems, IEEE Trans Parallel Distrib Syst, 1, 330-343, (1990)
[23] Fard HM, Prodan R, Barrionuevo JJD, Fahringer T (2012) A multi-objective approach for workflow scheduling in heterogeneous environments. In: 2012 12th IEEE/ACM international symposium on cluster, cloud and grid computing (ccgrid 2012), pp 300-309
[24] Arabnejad, H; Barbosa, J, List scheduling algorithm for heterogeneous systems by an optimistic cost table, IEEE Trans Parallel Distrib Syst, 25, 682-694, (2014)
[25] Iverson MA, Ozguner F, Follen GJ (1995) Parallelizing existing applications in a distributed heterogeneous environment. In: HCW ’95, pp 93-100
[26] Bertrand Cirou EJ (2001) Triplet: a clustering scheduling algorithm for heterogeneous systems. New York. https://doi.org/10.1109/ICPPW.2001.951956
[27] Kim, S; Browne, J, General approach to mapping of parallel computations upon multiprocessor architectures, Unknown J, 3, 1-8, (1988)
[28] Sarkar V (1989) Partitioning and scheduling parallel programs for multiprocessors. MIT Press, Cambridge, MA · Zbl 0679.68056
[29] Kanemitsu, H; Hanada, M; Nakazato, H, Clustering-based task scheduling in a large number of heterogeneous processors, IEEE Trans Parallel Distrib Syst, 27, 3144-3157, (2016)
[30] Shahul, AZ; Sinnen, O, Scheduling task graphs optimally with a*, J Supercomput, 51, 310-332, (2010)
[31] Deelman, E; Singh, G; Su, M-H; Blythe, J; Gil, Y; Kesselman, C; Mehta, G; Vahi, K; Berriman, GB; Good, J; Laity, A; Jacob, JC; Katz, DS, Pegasus: a framework for mapping complex scientific workflows onto distributed systems, Sci Program, 13, 219-237, (2005)
[32] Darte A, Robert Y, Vivien F (2002) Scheduling and automatic parallelization. BirkhŁuser, New York. ISBN 0-8176-4149-1 · Zbl 0956.68009
[33] Suter F, Desprez F, Casanova H (2004) From heterogeneous task scheduling to heterogeneous mixed parallel scheduling. In: Euro-Par 2004 parallel processing, pp 230-237 · Zbl 1096.68576
[34] Orsila, H; Kangas, T; Salminen, E; Hamalainen, TD; Hannikainen, M, Automated memory-aware application distribution for multi-processor system-on-chips, JSA, 53, 795-815, (2007)
[35] Langen, P; Juurlink, B, Leakage-aware multiprocessor scheduling, J Signal Process Syst, 57, 73-88, (2009)
[36] Bhatti MK, Oz I, Popov K, Muddukrishna A, Brorsson M (2014) Noodle: a heuristic algorithm for task scheduling in MPSoC architectures. In: 2014 17th Euromicro conference on digital system design (DSD). IEEE, pp 667-670
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.