H. Karner, M. Auer, C. Überhuber:
"Multiply-Add Optimized FFT Kernels";
Mathematical Models & Methods in Applied Sciences, 11 (2001), pp. 105-117.

English abstract:
Modern computer architecture provides a special instruction, the fused multiply-add (FMA) instruction, to perform both a multiplication and an addition operation at the same time. In this paper newly developed radix-2, radix-3, and radix-5 FFT kernels that efficiently take advantage of this powerful instruction are presented. If a processor is provided with FMA instructions, the radix-2 FFT algorithm introduced has the lowest complexity of all Cooley-Tukey radix-2 algorithms. All floating-point operations are executed as FMA instructions. Compared to conventional radix-3 and radix-5 kernels, the new radix-3 and radix-5 kernels greatly improve the utilization of FMA instructions, which results in a significant reduction in complexity. In general, the advantages of the FFT algorithms presented in this paper are their low arithmetic complexity, their high efficiency, and their striking simplicity. Numerical experiments show that FFT programs using the new kernels clearly outperform even the best conventional FFT routines.
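
Illustrative sketch (not taken from the paper): the abstract does not give the kernels themselves, so the following only shows one commonly used way to write a radix-2 butterfly so that every floating-point operation is a multiply-add. It factors cos(theta) out of the twiddle factor w = exp(i*theta), which assumes cos(theta) != 0. The names `fma`, `butterfly_fma`, and `butterfly_reference` are hypothetical, and `fma` here is an unfused stand-in for the hardware instruction.

```python
import cmath
import math


def fma(a, b, c):
    """Stand-in for a hardware fused multiply-add: returns a*b + c.

    This version is NOT fused (it rounds twice); it only marks where one
    FMA instruction would be issued.
    """
    return a * b + c


def butterfly_fma(u, v, theta):
    """Radix-2 butterfly (u + w*v, u - w*v), w = exp(i*theta), as six FMAs.

    Assumes cos(theta) != 0. In an FFT, c and t would be precomputed
    once per twiddle factor rather than per butterfly.
    """
    c = math.cos(theta)
    t = math.tan(theta)          # sin(theta) / cos(theta)
    a = fma(-t, v.imag, v.real)  # Re(w*v) / c
    b = fma(t, v.real, v.imag)   # Im(w*v) / c
    x = complex(fma(c, a, u.real), fma(c, b, u.imag))    # u + w*v
    y = complex(fma(-c, a, u.real), fma(-c, b, u.imag))  # u - w*v
    return x, y


def butterfly_reference(u, v, theta):
    """Conventional butterfly using complex arithmetic, for comparison."""
    wv = cmath.exp(1j * theta) * v
    return u + wv, u - wv
```

The conventional form needs four multiplications and six additions per butterfly; the sketch above needs six multiply-add operations, which is the kind of saving the abstract refers to. Whether this matches the paper's radix-2 kernel in detail is not stated in the abstract.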

Created from the publication database of the Technische Universität Wien.