Accelerating Quicksilver – a Monte Carlo Proxy App on Multicores
Time: Tuesday, June 23rd, 2:55pm - 3pm
Description: Quicksilver is a proxy app that represents some elements of the Mercury workload used at Lawrence Livermore National Laboratory. It solves a simplified dynamic Monte Carlo particle transport problem using multigroup cross-sections. Quicksilver attempts to replicate the memory access patterns, communication patterns, and branching or divergence of Mercury. Because it captures key features of Mercury, lessons learned in optimizing Quicksilver have direct implications for Mercury; as a result, Quicksilver is regularly used in novel-architecture co-design efforts and influences hardware procurements at DOE.
Achieving high performance in Quicksilver is challenging because its run time is dominated by latency-bound table look-ups and branch divergence. The code is C++ STL heavy: it is implemented using complex classes containing vectors and arrays, which require multiple levels of indirection for data lookups and cause frequent non-unit-stride and random memory accesses. Additionally, the control flow is dominated by branch divergence, which leaves few opportunities for vectorization. It is therefore not surprising that state-of-the-art GPUs have been reported to perform only marginally better than general-purpose CPUs. In this research, we propose algorithmic and engineering optimizations to Quicksilver that provide a 1.8X speedup on state-of-the-art multicore CPUs compared to the original version. The major speedup comes from asymptotic cost improvement, automatic vectorization, better data reuse, and improved atomics. We are working on optimizing Quicksilver on various special-purpose accelerators.
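To make the "multiple levels of indirection" point concrete, the following is a minimal C++ sketch of one common remedy: flattening a nested STL table into contiguous storage. It is not taken from Quicksilver or from the optimizations described in this talk. The names NestedTable, FlatTable, flatten, totalNested, and totalFlat, and the [material][reaction][group] layout, are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical nested layout: table[material][reaction][group].
// Every lookup follows one pointer per vector level.
using NestedTable = std::vector<std::vector<std::vector<double>>>;

// Hypothetical flattened layout: one contiguous array with arithmetic indexing.
struct FlatTable {
    std::size_t numReactions = 0;
    std::size_t numGroups = 0;
    std::vector<double> data;  // size = materials * reactions * groups

    double at(std::size_t mat, std::size_t rxn, std::size_t grp) const {
        assert(rxn < numReactions && grp < numGroups);
        return data[(mat * numReactions + rxn) * numGroups + grp];
    }
};

// Copy a rectangular nested table into contiguous storage.
// Assumes every material has the same number of reactions and groups.
FlatTable flatten(const NestedTable& nested) {
    FlatTable flat;
    if (nested.empty() || nested[0].empty()) return flat;
    flat.numReactions = nested[0].size();
    flat.numGroups = nested[0][0].size();
    flat.data.reserve(nested.size() * flat.numReactions * flat.numGroups);
    for (const auto& mat : nested) {
        assert(mat.size() == flat.numReactions);
        for (const auto& rxn : mat) {
            assert(rxn.size() == flat.numGroups);
            flat.data.insert(flat.data.end(), rxn.begin(), rxn.end());
        }
    }
    return flat;
}

// Total cross-section for one (material, group): sum over all reactions.
double totalNested(const NestedTable& t, std::size_t mat, std::size_t grp) {
    double sum = 0.0;
    for (const auto& rxn : t[mat]) sum += rxn[grp];
    return sum;
}

// Same sum over the flat table: a fixed-stride loop with no pointer chasing.
double totalFlat(const FlatTable& t, std::size_t mat, std::size_t grp) {
    double sum = 0.0;
    for (std::size_t r = 0; r < t.numReactions; ++r) sum += t.at(mat, r, grp);
    return sum;
}
```

Both functions return the same totals. The flat version replaces per-level pointer dereferences with a single index computation and a fixed stride, the kind of access pattern that compilers can usually vectorize more readily. Whether this helps in a given code depends on the table shapes and access order.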