

Talks and Poster Presentations (without Proceedings-Entry):

N. Happenhofer, M. Page:
"CPU vs. GPU - a fair comparison";
Talk: Parallele Architekturen und Programmiermodelle, Universität Wien; 2012-01-31.



English abstract:
Recent advances in computing have led to an explosion in the amount of data being generated. Processing this ever-growing data in a timely manner has made throughput computing an important aspect of emerging applications. Fortunately, many modern applications allow their kernels to be parallelized, which makes them suitable for today's multi-core CPUs and GPUs. In the past few years, many studies have claimed that GPUs deliver substantial speedups over multi-core CPUs for important application kernels. It is, however, not obvious how these two architectures should be compared with each other in the first place, which leads to inconsistent and even implausible speedup numbers between 10x and 1000x. When calculating speedup numbers on a multi-core CPU, it is easy to run the same code on different numbers of cores and compare the resulting runtimes. For GPUs, this is no longer straightforward. First of all, the same comparison can yield very different results on different hardware, and it is thus unfair to compare a brand-new GPU to a CPU that might be half a year old. Likewise, comparing GPU performance against mobile versions of certain CPUs is misleading, since that hardware is designed for a very different set of tasks. Sometimes GPU performance is even compared against sequential single-core code, which is outdated and therefore not a relevant comparison.
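The multi-core measurement described above, running the same code on different numbers of cores and comparing runtimes, can be sketched in Python. The toy kernel, worker counts, and chunk sizes here are illustrative assumptions, not taken from the talk:

```python
import time
from concurrent.futures import ProcessPoolExecutor

def busy_work(n):
    # CPU-bound toy kernel: sum of squares up to n.
    return sum(i * i for i in range(n))

def measure_speedup(n=200_000, workers=4, chunks=8):
    """Run the same total work serially and with `workers`
    processes, and return the ratio T_serial / T_parallel."""
    t0 = time.perf_counter()
    for _ in range(chunks):
        busy_work(n)
    t_serial = time.perf_counter() - t0

    t0 = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        list(pool.map(busy_work, [n] * chunks))
    t_parallel = time.perf_counter() - t0

    return t_serial / t_parallel
```

The measured ratio depends heavily on the host machine and OS scheduling, which is exactly the abstract's point: even this simple CPU-only comparison already needs care to be reproducible.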

Even without focusing on these hardware-related issues, it is not clear which quantities should be compared in this setting. One thing that is often forgotten is that a GPU program needs more time to complete than the kernel runtime alone: each kernel call requires a certain amount of CPU time to issue the kernel, as well as the time it takes to copy the application data onto the graphics device, both of which should be taken into account. Another problem is that one, in fact, compares two different codes on GPUs and CPUs, where the GPU code is usually highly optimized and the CPU code is not.
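The difference between the kernel-only speedup often reported and the end-to-end speedup once launch overhead and data transfers are counted can be illustrated with simple arithmetic. All figures below are hypothetical, chosen only to show how large the gap can be:

```python
def effective_gpu_speedup(t_cpu, t_kernel, t_launch, t_h2d, t_d2h):
    """Compare the kernel-only speedup with the speedup once
    kernel-launch overhead and host<->device copies are included.
    All times are in seconds."""
    kernel_only = t_cpu / t_kernel
    end_to_end = t_cpu / (t_launch + t_h2d + t_kernel + t_d2h)
    return kernel_only, end_to_end

# Hypothetical figures: a 100 ms CPU run versus a 1 ms GPU kernel,
# plus 20 us of launch overhead and 14 ms of PCIe transfers.
kernel_only, end_to_end = effective_gpu_speedup(
    t_cpu=0.100, t_kernel=0.001, t_launch=2e-5, t_h2d=0.010, t_d2h=0.004)
# kernel-only: 100x; end-to-end: about 6.7x
```

With these numbers the advertised "100x" shrinks to under 7x once the full cost of getting data to and from the device is charged to the GPU.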

Additionally, there are other topics of considerable relevance beyond the mere runtime of each individual application. In recent years, power consumption has become an increasingly important factor in the cost analysis of code generation and execution. For industrial software engineering, it might therefore be more important to have code and hardware with low power consumption than to achieve fast execution time at all costs. These considerations led to the introduction of the Green500 list, which ranks the Top500 supercomputers by their performance in FLOPS/Watt, thus taking energy efficiency into account. Since the architectures of CPUs and GPUs differ considerably, differences in energy consumption are to be expected.
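The Green500-style metric is simply sustained performance divided by average power draw. A minimal sketch, with entirely hypothetical machines, shows how the efficiency ranking can disagree with the raw-performance ranking:

```python
def flops_per_watt(flops, watts):
    """Green500-style efficiency metric: sustained floating-point
    operations per second divided by average power draw in watts."""
    return flops / watts

# Hypothetical systems (not real Top500 entries):
# "fast" wins on absolute performance, "frugal" wins on efficiency.
fast = flops_per_watt(2.0e15, 8.0e6)    # 2 PFLOPS at 8 MW -> 0.25 GFLOPS/W
frugal = flops_per_watt(5.0e14, 1.0e6)  # 0.5 PFLOPS at 1 MW -> 0.5 GFLOPS/W
```

Under this metric the smaller machine is twice as efficient, which is the kind of trade-off the abstract argues should enter a fair CPU-versus-GPU comparison.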

Since GPU computing requires changes to the existing CPU code and, in order to deliver the desired results, many optimizations, it is not easy to generate good GPU code in the first place. Working in this area requires additional knowledge, which entails additional costs for training employees and additional development time before the code can be executed.

All in all, CPUs and GPUs provide two very different hardware models that were created for entirely different purposes and are thus not easily comparable. In this work, we aim to shed light on different aspects of the comparison between GPU and CPU computing and try to define what a fair comparison should look like.

Created from the Publication Database of the Vienna University of Technology.