# The Programmer’s Guide To FFT – Part 2: FFT

This will be a 2 part series on fast fourier transform (FFT). My aim for these posts is to provide a more hands-on and layman friendly approach to this algorithm, contrast to a lot of the theoretically heavy material available on the internet. In short: less math, no proofs, examples provided, and working source code (C++11).

In my previous post, I wrote about DFT as the basis to understand FFT. In this final part I will write about the FFT algorithm as outlined in Cooley and Tukey’s seminal work published in 1964. I assume you understood the material in the prior post before coming here, since everything here builds on top of it.

## Key Ideas

Before we begin, let us assume that the length of the input (N) is always in powers of 2 (2, 4, 8, 16, 32…). I will explain why is this important later on.

There are 2 key ideas that enable the FFT algorithm to work. First is to understand that DFT can be separated as a sum of odd and even parts:

$\begin{array}{lcl}F_k & = & \sum\limits_{n=0}^{N-1}x_n\cdot e^{-\frac{i2\pi k n}{N}} \\ & = & \sum\limits_{m=0}^{N/2-1}x_{2m}\cdot e^{-\frac{i2\pi k (2m)}{N}} + \sum\limits_{m=0}^{N/2-1}x_{2m+1}\cdot e^{-\frac{i2\pi k (2m+1)}{N}} \\ & = & \sum\limits_{m=0}^{N/2-1}x_{2m}\cdot e^{-\frac{i2\pi k (m)}{N/2}} + \sum\limits_{m=0}^{N/2-1}x_{2m+1}\cdot e^{-\frac{i2\pi k (m+1/2)}{N/2}} \\ & = & \sum\limits_{m=0}^{N/2-1}x_{2m}\cdot e^{-\frac{i2\pi k (m)}{N/2}} + \sum\limits_{m=0}^{N/2-1}x_{2m+1}\cdot e^{-\frac{i2\pi k (m)}{N/2} - \frac{i\pi k}{N/2}} \\ & = & \sum\limits_{m=0}^{N/2-1}x_{2m}\cdot e^{-\frac{i2\pi k (m)}{N/2}} + e^{-\frac{i2\pi k}{N}} \sum\limits_{m=0}^{N/2-1}x_{2m+1}\cdot e^{-\frac{i2\pi k (m)}{N/2}}\end{array}$

Let us define a function $\omega$ (read as omega):

$\omega(p, q) = e^{\frac{i2\pi q}{p}}$

Now we simplify the DFT formulation to:

$F_k = \sum\limits_{m=0}^{N/2-1}x_{2m}\cdot \omega(km, \frac{N}{2}) + \omega(N, k) \sum\limits_{m=0}^{N/2-1}x_{2m+1}\cdot \omega(km, \frac{N}{2})$

Let’s generalize further to:

$F_k = F_k^{\text{even}} + \omega(N, k) \cdot F_k^{\text{odd}}$

The second key idea is to take advantage of the periodic nature of DFT:

$\begin{array}{lcl}F_k & = & F_k^{\text{even}} + \omega(N, k) \cdot F_k^{\text{odd}} \\ F_{k+\frac{N}{2}} & = & F_k^{\text{even}} - \omega(N, k) \cdot F_k^{\text{odd}}\end{array}$

What this means is that in the process of calculating the resulting sequence $F$ you only need to compute $\omega(N, k) \cdot F_k^{\text{odd}}$ a total of $\frac{N}{2}$ times; we can essentially half the number of computations using this technique. But why stop there? We can also take either $F_k^{\text{even}}$ or $F_k^{\text{odd}}$ and split them to odd and even parts, and repeat the same procedure. If we compute this recursively, the base case for this is when $N = 1$. In this manner we compute $\omega(N, k) \cdot F_k^{\text{odd}}$ for as many times as we can divide it by 2, or $\log N$. Therefore, for sequence of size N, FFT computes the DFT in $N \log N$ time.

## Example

Ok. That was probably hard to grasp, so let us break it down. Take an example where N = 2: a sequence in the coefficient representation s is (1, 9), and we want to convert it to point-value representation. The even and odd sequence is simply 1 and 9 respectively. We can use an auxiliary variable h to store $\omega(N, k) \cdot F_k^{\text{odd}}$:

$h = \omega(2, 0) \cdot 9 = 9$

$F_0 = 1 + h = 1 + 9 = 10$

$F_1 = 1 - h = 1 - 9 = -8$

Notice how we only need to compute $\omega(N, k) \cdot F_k^{\text{odd}}$ once and reused it for $F_{0 + \frac{N}{2}}$.

Now we go to a more complex example, where N = 8: s = (1, 6, 3, 8, 9, 5, 4, 2). Here we can show that it is possible that by using the fact that DFT can be expressed as sum of even and odd parts, that we can recursively divide s to smaller subproblems, up until N = 1:

I arranged such that the subsequence to the left contains even parts, and the sequence to the right contains odd parts. Now that we separate it nicely we can systemically work on the smaller parts and work our way up until the final answer. I’ve made a nice diagram that illustrates the computational flow of the FFT algorithm:

As before, the sequence to the left are the even parts and the sequence to the right are the odd parts. The cells show the type: yellow for real numbers, and green for complex numbers. Blue arrows branch out from even sequences, and red arrows branch out from odd sequences. Red arrows also denote that the cell it came from will be multiplied by $\omega(N, k)$, though not visually depicted. The whole computation flow shows a sort of “butterfly pattern”, as how most engineers like to describe it.

IFFT works roughly the same way as FFT in that it uses the same technique to save computation, so if you understand FFT you should get IFFT as well.

## Implementation

The implementation here includes FFT and IFFT. As with the DFT and IDFT implementation in the previous post, it takes a sequence in the coefficient representation and spits out a sequence of the same size in the point-value representation using FFT, and takes that sequence puts it through IFFT to get back the original sequence. As with before, my emphasis is on readability not optimization.

#include <iostream>
#include <complex>
#include <cmath>
#include <iomanip>
#include <vector>
#include <algorithm>

using namespace std;

double PI = acos(0) * 2;
typedef complex<double> xd;
typedef vector<double> dvec;
typedef vector<xd> xvec;
const xd J(0, 1); // sqrt(-1)

inline xd omega(const double &p, const double &q)
{
return exp((2. * PI * J * q) / p);
}

xvec _fft(xvec &f)
{
double N = f.size();

if (N == 1) return f;

xvec fe, fo;
fe.reserve(N / 2);
fo.reserve(N / 2);

for (int i = 0; i < N; i += 2) {
fe.push_back(f[i]);     // even
fo.push_back(f[i + 1]); // odd
}

fe = _fft(fe);
fo = _fft(fo);

for (int m = 0; m < N / 2; ++m) {
xd omfo = omega(N, -m) * fo[m];
f[m]         = fe[m] + omfo;
f[m + N / 2] = fe[m] - omfo;
}

return f;
}

xvec fft(const dvec &x)
{
xvec f(x.size());

for (size_t i = 0; i < x.size(); ++i) {
f[i] = xd(x[i], 0);
}

return _fft(f);
}

xvec _ifft(xvec &x)
{
double N = x.size();

if (N == 1) return x;

xvec xe, xo;
xe.reserve(N / 2);
xo.reserve(N / 2);

for (int i = 0; i < N; i += 2) {
xe.push_back(x[i]);     // even
xo.push_back(x[i + 1]); // odd
}

xe = _ifft(xe);
xo = _ifft(xo);

for (int m = 0; m < N / 2; ++m) {
xd iomxo = omega(N, m) * xo[m];
x[m]         = xe[m] + iomxo;
x[m + N / 2] = xe[m] - iomxo;
}

return x;
}

dvec ifft(xvec f)
{
double N = f.size();

xvec xcomplex = _ifft(f);
dvec x(N);

for (int i = 0; i < N; ++i) {
x[i] = xcomplex[i].real() / N;
}

return x;
}

int main()
{
cout << fixed << setprecision(2);

dvec input = { 1,6,3,8,9,5,4,2 };

// convert from time to frequency domain
xvec freqdom = fft(input);

for (const auto &f : freqdom) {
cout << f << endl;
}
cout << endl;

// convert from frequency to time domain
auto timedom = ifft(freqdom);

for (const auto &t : timedom) {
cout << t << ' ';
}
cout << endl;
}


It is a good idea to set breakpoints to see how the recursive implementation of FFT systematically solves DFT from the smaller subproblems.

Because of how similar FFT and IFFT is, it is not hard to merge them into a function and pass a boolean parameter to determine whether it will be FFT and IFFT (most implementations online will call this multi-purpose function “transform”), but for the sake of clarity I refrain from doing so.

Convolution is practically the same as before – it’s just that we replace DFT with FFT and IDFT with IFFT:

// vector convolution
dvec convolve(const dvec &a, const dvec &b)
{
// calculate degree of resulting polynomial
size_t N = 2 * a.size() - 1;

// extend size to match result
dvec acof(N), bcof(N);
copy(a.begin(), a.end(), acof.begin());
copy(b.begin(), b.end(), bcof.begin());

xvec apv, bpv, cpv(N);

// evaluation
apv = fft(acof);
bpv = fft(bcof);

// point-wise multiplcation
for (size_t i = 0; i < N; ++i) {
cpv[i] = apv[i] * bpv[i];
}

for (const auto &t : cpv)  cout << t << ' ';
cout << endl;

// interpolation
return ifft(cpv);
}


Now we estimate the time complexity of vector convolution using FFT: evaluation ($N \log N$), pointwise multiplication (N) and interpolation ($N \log N$) now costs a total of $2 \times N \log N + N \approx N \log N$.

## Must N Be In Powers of 2?

You could play around with different input sizes and compare the answer with DFT. What you will see is that FFT will return some funny values if N is not a power of 2. Why is this so? Well, you can see from the visual depictions of how FFT works: at the simplest subproblem (N = 2), you need to have one even value and one odd value. FFT must be able to divide in such a way that at some point splitting the input, all subsequences are of size 2, and the only way that is possible is if N is a power of 2.

If it doesn’t make sense, you could just play around with the code to convince yourself I guess.

Is this a bad thing? Well, if all you use FFT for is convolution then no. You could first calculate the resulting polynomial degree of the convolution, then pad the input with 0 until N is a power of 2, evaluate it, do pointwise multiplication, interpolate, and resize it to match the resulting degree.

If you use bit shifts to multiply by 2, you can compute N very quickly (see full implementation here):

// degree of resulting polynomial = size of resulting array
size_t deg = a.size() + b.size() - 1;

// transform array size must be in power of 2 for FFT
size_t N = 1;
while (N < deg) N <<= 1;

// set size for arrays in point-wise representation -
// extended space is padded with 0:
xvec a(N), b(N), c(N);

// ** do convolution... **

// Resize to actual size
c.resize(deg);


But, wouldn’t resizing the input array be slow? Well, we can prove with simple example. Say N = 129: In naive DFT, computing the DFT will take $129^2 = 16641$. In FFT we resize N to the closest power of 2, which is 256. $256 \log 256 \approx 617$. That is 2697% less computations! Of course, it shouldn’t be hard to convince yourself that this is still true as N gets larger.

There are variations of FFT where N does not need to be in powers of 2, but it won’t be the focus of this post.

## Optimizations

If you print the N and m used by the $\omega$ (omega) function, you will notice that there are some repetitions. Here is N and m where N = 8:

2 0
2 0
4 0
4 1
2 0
2 0
4 0
4 1
8 0
8 1
8 2
8 3

So this means it is possible to precompute results from $\omega$ and reuse them. In the case where N = 8, we only need to calculate $\omega$ 7 times as oppose to 12. I have an implementaion of FFT that does this simple optimization.

What else can you do? In-place calculation – here’s my implementation of in-place fft. Aside knowing how to meddle with indices, one key idea is to understand is this: the indices after dividing the sequence is the bit reverse, where the length of bits is $\log N$. For example if N = 16 ($\log N = 4$), the index 7 (0111) with be swapped with 14 (1110).

## Conclusion

We have went through DFT, and now FFT. I hope this helped you understand FFT a little better. It’s ok if you’re still a little foggy with the details; play around the source code long enough and it will be clearer to you. I intend to work on hands-on applications of FFT in future posts, so keep a look out for more!