H. Karner, M. Auer, C. Überhuber:
"Multiply-Add Optimized FFT Kernels";
Mathematical Models & Methods in Applied Sciences,
Modern computer architecture provides a special instruction, the fused multiply-add (FMA) instruction, to perform both a multiplication and an addition operation at the same time. In this paper newly developed radix-2, radix-3, and radix-5 FFT kernels that efficiently take advantage of this powerful instruction are presented. If a processor is provided with FMA instructions, the radix-2 FFT algorithm introduced has the lowest complexity of all Cooley-Tukey radix-2 algorithms. All floating-point operations are executed as FMA instructions. Compared to conventional radix-3 and radix-5 kernels, the new radix-3 and radix-5 kernels greatly improve the utilization of FMA instructions, which results in a significant reduction in complexity. In general, the advantages of the FFT algorithms presented in this paper are their low arithmetic complexity, their high efficiency, and their striking simplicity. Numerical experiments show that FFT programs using the new kernels clearly outperform even the best conventional FFT routines.
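To make the idea of an all-FMA radix-2 kernel concrete, the following is a minimal sketch of one well-known way to write a radix-2 butterfly so that every floating-point operation has the form a*b + c. It is not taken from the paper and may differ from the kernel the authors present. The names `fma` and `butterfly_fma` are hypothetical, the `fma` helper is only a plain-Python stand-in for a hardware instruction, and the sketch assumes cos(theta) is nonzero (twiddle factors of +i or -i would need separate handling).

```python
import cmath
import math


def fma(a, b, c):
    """Stand-in for a hardware FMA (assumption: plain Python rounds twice,
    whereas a real fused multiply-add rounds once)."""
    return a * b + c


def butterfly_fma(a, b, theta):
    """Radix-2 butterfly (a + w*b, a - w*b) for w = exp(-i*theta).

    The twiddle factor is factored as w = c * (1 + i*t), with c = cos(theta)
    and t = -tan(theta), so that all six operations are multiply-adds.
    Assumes cos(theta) != 0.
    """
    c = math.cos(theta)
    t = -math.tan(theta)  # imag(w) / real(w)

    # u = (1 + i*t) * b, two multiply-adds
    u_re = fma(-t, b.imag, b.real)
    u_im = fma(t, b.real, b.imag)

    # y0 = a + c*u and y1 = a - c*u, four multiply-adds
    y0 = complex(fma(c, u_re, a.real), fma(c, u_im, a.imag))
    y1 = complex(fma(-c, u_re, a.real), fma(-c, u_im, a.imag))
    return y0, y1
```

In a real FFT the values `c` and `t` would typically be precomputed per twiddle factor rather than evaluated inside the butterfly. Written this way, the butterfly uses six multiply-add operations, compared with the four multiplications and six additions of the textbook formulation.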
Created from the publication database of the Technische Universität Wien.