A First Glance at Kilo-instruction Based Multiprocessors

Computing Frontiers


The ever-increasing gap between processor and memory speed, sometimes referred to as the Memory Wall problem [42], has a very negative impact on performance. This mismatch will be more severe in future processor generations. Modern cache organizations and prefetching techniques will not be able to solve this problem. A novel and promising technique to deal with the Memory Wall consists of designing processors able to maintain thousands of in-flight instructions. Processors of this kind have been termed Kilo-instruction processors [8]. When running numerical applications, Kilo-instruction processors have demonstrated their ability to maintain high IPC values as memory latencies increase.

In this paper, we study for the first time the influence of Kilo-instruction processors on the performance of small-scale CC-NUMA multiprocessors. Our first results, using an ideal network, show the enormous potential of Kilo-instruction processors as computing nodes, not only for hiding local DRAM latencies but also for hiding remote ones. A deeper analysis, using realistic networks, reveals heavy demands on the packet throughput required by each node, since larger reorder buffers translate into a higher density of remote accesses. Next, we show that current interconnection networks cannot cope with these high traffic levels, so newer and faster networks have to be designed. In short, our results show dramatic performance gains over multiprocessors based on current microprocessors and suggest a possible way to build future shared-memory multiprocessor systems.