SICOSYS: An Integrated Framework for studying Interconnection Network Performance in Multiprocessor Systems

Euromicro Workshop on Parallel and Distributed Processing


The evaluation of network performance under real application loads is carried out by detailed time-intensive and resource- intensive simulations. Moreover, the use of ILP processors in cc-NUMA architectures introduces non-deterministic memory accesses; the resulting parallel system must be modeled by a detailed execution-driven simulation, further increasing the evaluation cost. This work introduces a simulation methodology, based on network traces, to estimate the impact that a given network has on the execution time of parallel applications. This methodology allows the study of the network design space with a level of accuracy close to that of execution-driven simulations but with much shorter simulation times. The network trace, extracted from an execution-driven simulation, is processed to substitute the temporal dependencies produced by the simulated network with an estimation of the message dependencies caused by both the application and the applied cache-coherent protocol. This methodology has been tested on two direct networks, with 16 and 64 nodes respectively, running the FFT and Radix applications of the SPLASH2 suite. The trace-driven simulation is 3 to 4 times faster than the execution-driven one with an average error of 4% in total execution time.