Matrix Chain Multiplication with C++ Code – PART 1: Analysis and Design

Ah. The matrix chain multiplication problem. A classic dynamic programming example in the undergraduate computer science world. I will be doing a series consisting of 3 parts:

1. Analysis and Design – analyze the problem and deduce a solution.
2. Implementation – implement the algorithm to find the minimum number of operations.
3. Extracting the Sequence – extract the sequence in which the chain is multiplied.

There’s practically a ton of articles out there that already writes about this problem. I intend to distinguish myself by expressing it in the simplest terms possible with examples, and season it with working C++11 source code. In other words, if you had problem understanding everywhere else, my hopes is that your search ends with this series.

In this first part, there will be no code. At all. Not even pseudocode. I promise some C++11 code in the coming 2 parts, but understanding this section will be important for implementation later.

Prerequisites

I will assume you should have a bit of background in matrix math. It helps that you do a bit of refreshing if you have forgotten.

I will also assume you have known a bit of what dynamic programming is, as matrix chain multiplication is a slightly more complex example; you keep track of a 2D array instead of 1D array. I’ve written a beginner level hands on introduction to dynamic programming in a form of solving a problem which you can check out: 445A – Boredom – CodeForces Tutorial.

Understanding the Problem

Given a chain of matrices, we want to multiply them. We cannot change the order of the matrices as that would change the result or make it incompatible, therefore we say matrix multiplication is not commutative. In addition, regardless of how you parenthesize (e.g. whether the sequence as multiplied as $A\cdot \left ( B\cdot C \right )$ or $\left ( A\cdot B \right )\cdot C$ ) the chain it will give you the same answer, thus we say it is associative.

How we parenthesize the matrices also determine how many operations will be performed. For a multiplication chain of integers, parenthesizing them does not affect the number of operations performed, but for matrices the change is significant. For example, given the matrices with the following dimensions:

If we define a row of a matrix as $r_A$ and the column as $c_A$, then the total number of multiplications required to multiply A and B is $r_A \cdot c_A \cdot c_{B}$.

• $A\cdot \left ( B\cdot C \right )$ -> 2*20*1 + 2*1*10 = 60 multiplications
• $\left ( A\cdot B \right )\cdot C$ -> 20*1*10 + 2*20*10 = 600 multiplications

The 1st case is 10 times faster!

Therefore, we want to figure out how to parenthesize the matrix chain such that we minimize the required number of operations.

However, to keep things simple, we will focus on the minimum number of operations. We will eventually find the sequence itself by building on top on this.

Formulation

To put in formal terms, given a sequence of n matrices $A_1, A_2, A_3 \ldots A_n$ we wish to parenthesize the product of this sequence so as to minimize the number of operations required.

Finding the Minimum Number of Operations

Suppose we have a function B(i, j) that computes the minimum number of required operations for multiplying a chain of matrices from matrix i to matrix j. The solution to the main problem would be B(1, n).

A standard technique to in solving a dynamic programming problem is usually to ask, “What is the last operation?”. Here, we ask what is the minimum total cost of the last operation? Regardless how the chain is parenthesized, there will always be 2 matrices remaining. The cost, is the minimum cost of some prefix + minimum cost of some suffix + the number of operations to multiply them. The cost of prefix and suffix in turn, is derived from the minimum number of operations that deduces them.

Because each matrix can only multiply with its adjacent matrix, a prefix can only start from $A_1$ to some matrix $A_k$, and a suffix can only start from $A_{k+1}$ to $A_n$, split at some index k. The cost in the last operation for the prefix is then B(1, k) and the cost of the suffix is B(k+1, n)k therefore ranges from 1 until n-1 in the last operation, and within a range i to j, k ranges from i to j-1. We choose k by selecting the minimum cost of all the possible costs within a range i to j.

The resultant dimensions from multiplying 2 matrices are important to finding the cost. In a sequence of matrices $A_i \ldots A_j$ If we split at a point k, the resultant dimensions of the prefix is $r_i \times c_k$ and the suffix is $r_{k+1} \times c_j$. The cost of multiplying these 2 matrices are therefore $r_i \cdot c_k \cdot c_j$.

Running Through Possibilities

Let us consider the base case: When there is only 1 matrix. The prefix = suffix, and there are no operations performed, so the cost = 0.

Now what if there are only 2 matrices? There are no other possible combinations to compare with, so the minimum number of operations is the number of operations to multiply them. There is no cost to the prefix or suffix, because the cost for 1 matrix alone is 0.

3 matrices onwards is gets a little more challenging. When the chain has only 3 matrices, you have to choose between $\left ( A_1\cdot A_2 \right )\cdot A_3$ and $A_1\cdot \left ( A_2\cdot A_3\right )$. How we choose between the 2 is determined by cost of either multiplying $A_1$ and $A_2$ first, then take the resulting matrix (prefix) and multiply by $A_3$ (suffix), OR, multiplying $A_2$ and $A_3$ first, then multiplying that (suffix) by $A_1$ (prefix). Where we decide to make that split, either at $A_1$ or $A_2$, we determined by their cost. The matrix in which we choose we call it $A_k$, at a point k.

The minimum cost with 3 matrices, B(1, 3) is therefore $B(1, k) + B(k+1, 3) + r_1 \cdot c_k \cdot c_3$, and we choose between $B(1, 1) + B(2, 3) + r_1 \cdot c_1 \cdot c_3$ and $B(1, 2) + B(3, 3) + r_1 \cdot c_2 \cdot c_3$. You can continue doing this for 4 or 5 or 6 matrices until you start to see a pattern. Notice you can apply the same procedure to find B(1, 3) to find B(1, 2) – there is only 1 possible value for k for B(1, 2).

The General Solution

So in a range i to j, we select from j – i possibilities, from i until j – 1. Those possibilities in turn call B recursively until it hits the base case where i = j. The general solution is therefore:

$B(i,j) = \begin{cases} 0 \text{ if } i = j \\ \displaystyle\min_{k=i}^{j-1} \begin{cases} B(i, k) + B(k+1, j) + r_i \cdot c_k \cdot c_j \end{cases} \end{cases}$

And the solution to the main problem would be B(1, n).

Conclusion

This is all for the first part. Next post I will cover implementation: Matrix Chain Multiplication with C++ code – PART 2: Implementation.