kissfft

There are many great FFT libraries already around. Kiss FFT is not trying
to be better than any of them. It only attempts to be a reasonably efficient,
moderately useful FFT that can use fixed or floating data types and can be
incorporated into someone's C program in a few minutes with trivial licensing.

USAGE:

The basic usage is:

    void * cfg = kiss_fft_alloc( nfft ,inverse_fft );
    while ...
        ... // put kth sample in cx_buf_in_out[k].r and cx_buf_in_out[k].i
        kiss_fft( cfg , cx_buf_in_out );
        ... // transformed
    free(cfg);

Note: frequency-domain data is stored from DC up to (but not including) 2pi,
so cx_buf_in_out[0] is the DC bin of the FFT
and, for even nfft, cx_buf_in_out[nfft/2] is the Nyquist bin.
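
The bin layout described in the note can be turned into physical frequencies
once a sample rate is known. The following is a minimal sketch, not part of
kiss_fft.h: the helper name bin_to_hz and the sample rate parameter fs are
assumptions made for illustration only.

```c
#include <assert.h>

/* Hypothetical helper (NOT part of kiss_fft.h): frequency in Hz represented
 * by bin k of an nfft-point transform of data sampled at fs Hz.
 * Bins 0..nfft/2 run from DC up to Nyquist; bins above nfft/2 correspond
 * to negative frequencies. */
static double bin_to_hz(int k, int nfft, double fs)
{
    assert(nfft > 0 && k >= 0 && k < nfft);
    if (k > nfft / 2)
        k -= nfft;                      /* upper half wraps to negative */
    return (double)k * fs / (double)nfft;
}
```

For example, with nfft = 8 and fs = 8000, bin 1 maps to 1000 Hz, bin 4 (the
Nyquist bin) to 4000 Hz, and bin 5 to -3000 Hz.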

Declarations are in "kiss_fft.h", along with a brief description of the
functions you'll need to use. Code definitions are in kiss_fft.c, along
with sample usage code.

The code can be compiled to use float, double or 16-bit short samples.
The default is float.

BACKGROUND:

I started coding this because I couldn't find a fixed-point FFT that didn't
use assembly code. I started with floating-point numbers so I could get the
theory straight before working on fixed-point issues. In the end, I had a
little bit of code that could be recompiled easily to do FFTs with short, float
or double (other types should be easy too).

Once I got my FFT working, I wanted to get some performance numbers against
a well-respected and highly optimized FFT library. I don't want to criticize
this great library, so let's call it FFT_BRANDX.
During this process, I learned:

1. FFT_BRANDX has 500 times as many lines of code as Kiss
(and that's just the C code).
2. It took me an embarrassingly long time to get FFT_BRANDX working.
3. A simple program using FFT_BRANDX is 500K. A similar program using kiss_fft is 18K.
4. FFT_BRANDX is about 3-4 times faster than Kiss.

It is wonderful that free, highly optimized libraries like FFT_BRANDX exist.
But such libraries carry a huge burden of complexity necessary to extract every
last bit of performance.

Sometimes simpler is better, even if it's not better.

PERFORMANCE:
(on Athlon XP 2100+, with gcc 2.96, optimization -O3, float data type)

Kiss performed 1000 1024-pt FFTs in 110 ms of CPU time (132 ms real time).
For comparison, it took md5sum 160 ms of CPU time to process the same amount of data.

DO NOT:
... use Kiss if you need the Fastest Fft in The World
... ask me to add features that will bloat the code

UNDER THE HOOD:

Kiss FFT uses a complex-only, decimation-in-time, mixed-radix, out-of-place FFT.
No scaling is done. Optimized butterflies are used for factors 2 and 4.
Experiments with a radix-3 optimization showed no real gain over the generic
butterfly currently used for non-power-of-2 factors.
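
To illustrate what "mixed-radix" means here, the sketch below splits a
transform length into radix factors, preferring 4, then 2, then odd factors,
so that the optimized butterflies get used wherever possible. This is only an
assumption about one reasonable way to do it; the helper name factor_radices
is hypothetical and the code is not taken from kiss_fft.c.

```c
#include <assert.h>

/* Hypothetical helper (NOT the library's routine): split n into radix
 * factors, pulling out 4s first, then 2s, then odd factors 3, 5, 7, ...
 * Writes the factors to out[] and returns how many were written. */
static int factor_radices(int n, int *out, int max_out)
{
    int count = 0;
    int p = 4;                          /* try the radix-4 butterfly first */
    assert(n > 0);
    while (n > 1) {
        while (n % p != 0) {
            if (p == 4)      p = 2;     /* no more 4s: fall back to 2 */
            else if (p == 2) p = 3;     /* no more 2s: start on odd factors */
            else             p += 2;
            if (p > n / p)   p = n;     /* p*p > n, so what is left is prime */
        }
        assert(count < max_out);
        out[count++] = p;
        n /= p;
    }
    return count;
}
```

With this sketch, a length of 1024 factors as 4*4*4*4*4, and a length of 360
as 4*2*3*3*5, so only the 3 and 5 stages would need the generic butterfly.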

LICENSE:
BSD, see COPYING for details. Basically, "free to use, give credit where due, no guarantees"

TODO:
*) Add sample code for parallel FFTs (stereo) packed into re,im components of time sequence.
*) Add simple windowing function, e.g. Hamming : w(i)=.54-.46*cos(2pi*i/(n-1))
*) Make the fixed-point scaling and bit shifts more easily configurable.
*) Document/revisit the input/output FFT scaling.
*) See if the fixed-point code can be optimized a little without adding complexity.

AUTHOR:
Mark Borgerding
Mark@Borgerding.net
