×

Dense Gaussian networks: Suitable topologies for on-chip multiprocessors. (English) Zbl 1103.68423

Summary: This paper explores the suitability of dense circulant graphs of degree four for the design of on-chip interconnection networks. Networks based on these graphs reduce the Torus diameter in a factor \(\sqrt{2}\), which translates into significant performance gains for unicast traffic. In addition, they are clearly superior to Tori when managing collective communications. This paper introduces a new two-dimensional node’s labeling of the networks explored which simplifies their analysis and exploitation. In particular, it provides simple and optimal solutions to two important architectural issues: routing and broadcasting. Other implementation issues such as network folding and scalability by using hierarchical networks are also explored in this work.

MSC:

68M99 Computer system organization
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] L. A. Barroso et al., Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing, in 27th Annual International Symposium on Computer Architecture, pp. 282–293 (2000).
[2] Tendler et al. J.M. (2002). Power4 System Microarchitecture. IBM J Res Devel 46(1): 5–25 · Zbl 05420524 · doi:10.1147/rd.461.0005
[3] R. Mullins, A. West, and S. Moore, Low-Latency Virtual-Channel Routers for On-Chip Networks, in 31th International Sysmposium on Computer Architecture, pp. 188–197 (2004).
[4] Boesch F.T., Wang J. (1985). Reliable Circulant Networks with Minimum Transmission Delay. IEEE Trans Circ. Syste. 32: 1286–1291 · Zbl 0583.94018 · doi:10.1109/TCS.1985.1085667
[5] C. Martínez, R. Beivide, J. Gutierrez, and E. Gabidulin, On the Perfect t-Dominating Set Problem in Circulant Graphs and Codes over Gaussian Integers, in Proceedings of the 2005 IEEE International Symposium on Information Theory (ISIT’05), Adelaide, Australia, (2005).
[6] Z. Cvetanovic, Performance Analysis of the Alpha 21364-based HP GS1280 Multiprocessor, in 30th Annual International Symposium on Computer Architecture, pp. 218–228 (2003).
[7] V. Puente, C. Izu, J. A. Gregorio, R. Beivide, J. M. Prellezo, and F. Vallejo, Improving Parallel System Performance by Changing the Arrangement of the Networks Links, Conference Proceedings of the 2000 International Conference on Supercomputing. Santa Fe, New Mexico. pp. 44–53 (2000).
[8] Yang Y., Funashi A., Jouraku A., Nishi H., Amano H., Sueyoshi T. (2001). Recursive Diagonal Torus: An Interconnection Network for Massively Parallel Computers. IEEE Trans. Parallel Distrib. Syst. 12(7): 701–715 · Zbl 05107351 · doi:10.1109/71.940745
[9] V. Puente, J. A. Gregorio, F. Vallejo, and R. Beivide, Immunet: a Cheap and Robust Fault-Tolerant Packet Routing Mechanism, in 31th Annual International Symposium on Computer Architecture, pp. 198–209 (2004).
[10] M. Blumrich et al., Design and Analysis of the BlueGene/L Torus Interconnection Network, IBM Research Report, RC23025 (W0312-022), December 3, Computer Science, (2003).
[11] M. M. K. Martin, M. D. Hill, and D. A. Wood, Token Coherence: Decoupling Performance and Correctness, in 30th Annual International Symposium on Computer Architecture, pp. 182–193 (2003).
[12] M. R. Mullins, J. D. Bingham, M. D. Hill, A. J. Hu, M. M. Martin and D. A. Wood, Improving Multiple-CMP Systems Using Token Coherence, in 11th International Sysmposium on High Performance Computer Architecture, pp. 328–339 (2005).
[13] Y. Yang and J. Wang, Efficient All-to-All Broadcast in All-Port Mesh and Torus Networks, in 5th International Symposium on High Performance Computer Architecture, Florida (1999).
[14] M. A. Fiol, J. L. Yebra, I. Alegre, and M. Valero, A Discrete Optimization Problem in Local Networks and Data Alignment, IEEE Trans Comput, 36(6):702–713 (1987).
[15] Wong C.K., Coppersmith D. (1974). A Combinatorial Problem Related to Multimodule Memory Organizations. J. ACM 21(3): 392–402 · Zbl 0353.68039 · doi:10.1145/321832.321838
[16] R. Beivide, E. Herrada, J. L. Balcázar, and J. Labarta, Optimized Mesh-Connected Networks for SIMD and MIMD Architectures, in 14th Annual International Symposium on Computer Architecture, pp. 163–169 (1987).
[17] J.-C. Bermond, G. Illiades, and C. Peyrat, An Optimization Problem in Distributed Loop Computer Networks, in 3rd Interantional Conference on Combinatorials Mathematics, New York Academy of Sciences, pp. 1–13 (1985). · Zbl 0727.68088
[18] Beivide R., Herrada E., Balcázar J.L., Arruabarrena A. (1991). Optimal Distance Networks of Low Degree for Parallel Computers. IEEE Trans. Comput. C-40(10): 1109–1124 · doi:10.1109/12.93744
[19] C. Martínez, E. Vallejo, M. Moretó, R. Beivide and M. Valero, Hierarchical Topologies for Large-scale Two-level Networks, in XVI Jornadas de Paralelismo. Granada, Spain (2005).
[20] Mellor-Crummey J., Scott M. (1991). Algorithms for Scalable Synchronization on Shared- Memory Multiprocessors. ACM Trans. Comput. Syst. 9(1):2165 · doi:10.1145/103727.103729
[21] E. Vallejo, R. Beivide, and C. Martínez, Practicable Layouts for Optimal Circulant Graphs, in Euromicro Conference on Parallel, Distributed and Network-based Processing, Switzerland, (2005).
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.