The Programmer’s Guide To FFT – Part 1: DFT

This will be a 2 part series on fast fourier transform (FFT). My aim for these posts is to provide a more hands-on and layman friendly approach to this algorithm, contrast to a lot of the theoretically heavy material available on the internet. In short: less math, no proofs, examples provided, and working source code (C++11).

In this first part I will write about discrete fourier transform (DFT) as the basis to understand FFT.

Sorting as a Metaphor

DFT and FFT are similar as insertion sort is to merge sort; they both take the same type of inputs and spits out the same output, it’s just that FFT runs much faster than DFT by utilizing a technique called divide and conquer (mergesort also uses this). The computational difference is also the same as merge sort is to insertion sort: O(N \log N) vs O(N^2). How much difference is this? Well, if N is 10 billion and each computation takes 1 nanosecond, it is the difference between finishing in 30 seconds as opposed to 30 years.

What Does DFT Do?

DFT takes a discrete series of input in the time domain and converts it to the frequency domain. Computer scientists describe it as converting from the coefficient representation to the point-value representation (more on that later). At either case, the semantics of the input is retained; it is like translating from one language to another.

Also, the conversation from one domain to another needs to be reversible. To convert from the frequency domain back to the time domain, you use the inverse of the DFT, also called IDFT. In computer science, the process to convert from coefficient representation to the point-value representation is called evaluation, and the reverse process is interpolation.

Applications and Motivations

The applications of DFT is diverse: how Shazam matches songs, mp3 compression, remove noise from audio, jpeg compression, edge detection, radar, sonar, optics, seismic activity, satellite transmissions, genome matching (I will put up a post on genome matching in the future as an application of DFT)… Put it this way: as long as it involves a sequence (could be in multiple dimensions) that changes – whether it be over time or space, it can benefit from DFT.

However, this post focuses on using DFT for vector convolution (I profess that as of this writing this is the only thing I know how to use DFT for).

Vector Convolution

We can understand what convolution does as a polynomial multiplication:

Example taken from CLRS

Put formally, we are given a sequence of real numbers a and b (in this example is (6, 7, -10, 9) and -2, 0, 4, -5) respectively) of size n – each sequence maps to the coefficients of the input (thus the term coefficient representation), and we return a sequence of real numbers c (-12, -14, 44, -20, -75, 86, -45) of size 2n – 1. The degree of each polynomial can be mapped from the indices (start from 0). We also describe convolution in mathemical notation as a \otimes b = c.

Each element c_j (0 \leq j < 2n-1) can be calculated like so (formula also from CLRS):

c_j = \sum\limits_{k=0}^{j}a_kb_{j-k}

I have an implementation of this formulation below (click here to see a sample using this function):

typedef vector<int> ivec;

ivec polynomialMultiply(const ivec &a, const ivec &b) 
    int N = 2 * a.size() - 1;
    ivec c(N, 0);
    for (int j = 0; j < N; ++j) { 
        int p = j >= a.size() ? j - a.size() + 1 : 0;
        for (int k = p; k <= j - p; ++k) {
            c[j] += a[k] * b[j - k];
    return c;

Perhaps you might be wondering why is there an auxiliary variable p which is not in the formulation: I don’t know; it doesn’t work without it. BUT, that’s not the point. The point is that this technique takes O(N^2) to run (we can infer this from the nested for loop). The same result, however, can be computed much faster in point-value representation:

In this illustration, yellow indicates the coefficient representation (real numbers) and green indicates the point-value representation (complex numbers). In the point-value form, convolution is simply a pointwise multiplication which can be done in O(N). However, the actual complexity of this method also needs to also factor the computation needed for evaluation and interpolation.

The Math Behind DFT

We now describe the transformation process with more formal mathematical notation. Let x be the sequence of N real values in the time domain, where the n-th element of x is denoted as x_n. DFT then takes x and transforms it to a sequence F of N complex values in the frequency domain, where the k-th element of F is denoted as F_k. We can also write DFT as a function dft(x) = F, and IDFT as idft(F) = x.

The DFT is then given by:

F_k=\sum\limits_{n=0}^{N-1}x_n\cdot e^{-\frac{i2\pi k n}{N}}

And IDFT given by:

x_n=\frac{1}{N}\sum\limits_{k=0}^{N-1}F_k\cdot e^{\frac{i2\pi k n}{N}}

Where i is the imaginary number \sqrt{-1}.

Since each element takes N computations for all N elements, it is not hard to see that both DFT and IDFT takes O(N^2).

Calculate An Example by Hand

It always helps to compute the DFT by hand first to get a feel for it. To do this we will need Euler’s formula to break down the exponential to its sine cosine equivalents:

e^{iy}=\cos y + i \sin y

In this context, y is \frac{2\pi n k}{N}.

Example: Let x be a sequence (1, 2, 3, 4) and N = 4, and we wish to convert it to the frequency domain.

\begin{array}{lcl}F_0 & = & 1\cdot e^{-\frac{i2\pi 0 \cdot 0}{4}} + 2\cdot e^{-\frac{i2\pi 0 \cdot 1}{4}} + 3\cdot e^{-\frac{i2\pi 0 \cdot 2}{4}} + 4\cdot e^{-\frac{i2\pi 0 \cdot 3}{4}} \\ & = & 1\cdot e^0 + 2\cdot e^0 + 3\cdot e^0 + 4\cdot e^0 \\ & = & 10\end{array}

\begin{array}{lcl}F_1 & = & 1\cdot e^{-\frac{i2\pi 1 \cdot 0}{4}} + 2\cdot e^{-\frac{i2\pi 1 \cdot 1}{4}} + 3\cdot e^{-\frac{i2\pi 1 \cdot 2}{4}} + 4\cdot e^{-\frac{i2\pi 1 \cdot 3}{4}} \\ & = & 1\cdot e^0 + 2\cdot e^{-\frac{i\pi}{2}} + 3\cdot e^{-i\pi} + 4\cdot e^{-\frac{i3\pi}{2}} \\ & = & 1 + 2\left(\cos \frac{-\pi}{2} + i\sin \frac{-\pi}{2} \right) + 3 \left( \cos(-\pi) + i \sin(-\pi) \right) + 4 \left( \cos\frac{-3\pi}{2} + i\sin\frac{-3\pi}{2}\right) \\ & = & 1 + 2\left(0 - i\right) + 3 \left( -1 + 0 \right) + 4 \left( 0 + i\right) \\ & = & -2+2i\end{array}

F_2 = -2

F_3 = -2-2i

Now, if you take the values of F and pass through the IDFT, it should return you back the sequence x, though I will not show it here.


The C++ program below simply runs the same sequence in the example through DFT, and pass the result through IDFT to get back the original sequence. I placed more emphasis on readability of the code, so it is not designed to be optimized but rather to prove the correctness of the mathematical formulations of DFT and IDFT.

#include <iostream>     
#include <complex>      
#include <cmath>
#include <iomanip>
#include <vector>

using namespace std;

double PI = acos(0) * 2;
typedef complex<double> xd;
typedef vector<double> dvec;
typedef vector<xd> xvec;
const xd J(0, 1); // sqrt(-1)

xvec dft(const dvec &input)
    double N = input.size();
    xvec X(N);

    for (double k = 0; k < N; ++k) {
        for (double n = 0; n < N; ++n) {
            X[k] += (double)input[n] * exp(-2. * J * PI * n * k / N);

    return X;

dvec idft(const xvec &input)
    double N = input.size();
    xvec x(N);
    dvec out(N);

    for (double k = 0; k < N; ++k) {
        for (double n = 0; n < N; ++n) {
            x[k] += input[n] * exp(2. * J * PI * n * k / N);
        out[k] = x[k].real() / N;

    return out;

int main()
    cout << fixed << setprecision(2);

    dvec input = { 1, 2, 3, 4 };
    // convert from time to frequency domain
    xvec freqdom = dft(input);

    for (const auto &f : freqdom) {
        cout << f << endl;
    cout << endl;
    // convert from frequency to time domain
    dvec timedom = idft(freqdom);
    for (const auto &t : timedom) {
        cout << t << ' ';
    cout << endl;

The output of this program will then be:


1.00 2.00 3.00 4.00

With DFT and IDFT working properly we now use it to implement convolution:

dvec convolve(const dvec &a, const dvec &b) 
    // calculate degree of resulting polynomial
    size_t N = 2 * a.size() - 1;
    // extend size and pad with 0
    dvec acof(N, 0), bcof(N, 0);
    copy(a.begin(), a.end(), acof.begin());
    copy(b.begin(), b.end(), bcof.begin());
    xvec apv, bpv, cpv(N);
    // evaluation
    apv = dft(acof);
    bpv = dft(bcof);
    // point-wise multiplcation
    for (size_t i = 0; i < N; ++i) {
        cpv[i] = apv[i] * bpv[i];
    // interpolation
    return idft(cpv);

See the full working DFT convolution program from this gist.


So, we covered DFT and vector convolution, and applied it to solve polynomial multiplication. What’s the problem now? Pointwise multiplication, in addition with evaluation and interpolation using DFT and IDFT, uses a time complexity of O(N) + 2 \times O(N^2) \approx O(N^2). That’s actually no different, if not worst than using the naive method. To circumvent this, I will introduce FFT in the next post, which will compute the evaluation and interpolation in O(N \log N) time. Stay tuned!

Update: Part 2 is up!

5 thoughts on “The Programmer’s Guide To FFT – Part 1: DFT

  1. Pingback: The Programmer’s Guide To FFT – Part 2: FFT | Bruceoutdoors Blog of Blots

  2. How did you get those values in the green boxes under ‘a’ and ‘b’ in the second image? The first row is obviously obtained by evaluating the polynomials at x = 1, but what about the other rows?


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.