Improving Last Level Shared Cache Performance through Mobile Insertion Policies (MIP)

Publication
Parallel Computing Journal, Elsevier.
Date

Abstract

For those Cache Hierarchy levels where program locality is not as evident as in L1, LRU replacement does not seem to be the optimal solution to determine which blocks will be requested soon. The literature is prolific on alternative reuse-distance estimations at last on-chip cache level, proving the difficulty of achieving an optimal hit rate. One of the key aspects for performance is knowing inter and intra application reuse-distance variability. Many solutions already do this, but most of them rely on a simple choice among a few alternative policies. The experiments performed to motivate the proposal confirm application variability, but also show that the behavior of applications is much more than bimodal. This means that there is a performance gap that current hybrid policies are not able to cover. In this paper we propose a Mobile Insertion Position replacement policy (MIP), which combines well known LRU ordering and promotion policies with a completely adaptive insertion mechanism. The dynamic behavior of insertion is able to capture hit-rate variability in a more accurate way. Making use of set dueling and dynamic set sampling for prediction, our mechanism continuously estimates the insertion position that maximizes the cache hit rate. The hardware overhead compared to a LRU replacement algorithm is merely three 3-bit saturating counters per LLC bank. Our experiments show that for a wide range of applications, MIP is able to improve the hit rate of LRU by 30% on average. MIP outperforms current state-of-the-art replacement policies with a similar implementation cost by 10% on average and in single-thread or multi-thread workloads by 20%.