LIGERO: A Light but Efficient Router Conceived for Cache Coherent Chip Multi Processors

ACM Transactions on Architecture and Code Optimization


Although abstraction is the best approach to dealing with the complexity of computing systems, implementation details sometimes have to be considered. For on-chip interconnection networks in particular, underestimating the specifics of the underlying system can have a non-negligible impact on performance, cost, or correctness. This paper presents a very efficient router devised to handle the particularities of cache-coherent chip multiprocessors in a balanced way. Employing the same packet rotation principles as the Rotary Router, we present a router configuration with the following novel features: (1) reduced buffering requirements, (2) an optimized pipeline under contention-free conditions, (3) a more efficient deadlock avoidance mechanism, and (4) an optimized in-order delivery guarantee. Taken together, our proposal provides a set of features that, to the best of our knowledge, no other router has previously achieved: (1’) low implementation cost, (2’) low pass-through latency under low load, (3’) improved resource utilization through adaptive routing and a buffering scheme free of head-of-line blocking, (4’) guaranteed coherence protocol correctness via end-to-end deadlock avoidance and in-order delivery, and (5’) improved coherence protocol responsiveness through adaptive in-network multicast support. We conduct a thorough evaluation that includes hardware cost estimation and performance evaluation under a wide spectrum of realistic workloads and coherence protocols. Compared with VCTM, an optimized state-of-the-art wormhole router, our proposal requires 50% less area, reduces the energy-delay product of the on-chip cache hierarchy by 20% on average, and improves the performance of cache-coherent chip multiprocessors under realistic working conditions by up to 20%.