Matrix Chain Multiplication with C++ Code – PART 1: Analysis and Design

Ah. The matrix chain multiplication problem. A classic dynamic programming example in the undergraduate computer science world. I will be doing a series consisting of 3 parts:

  1. Analysis and Design – analyze the problem and deduce a solution.
  2. Implementation – implement the algorithm to find the minimum number of operations.
  3. Extracting the Sequence – extract the sequence in which the chain is multiplied.

There’s practically a ton of articles out there that already writes about this problem. I intend to distinguish myself by expressing it in the simplest terms possible with examples, and season it with working C++11 source code. In other words, if you had problem understanding everywhere else, my hopes is that your search ends with this series.

In this first part, there will be no code. At all. Not even pseudocode. I promise some C++11 code in the coming 2 parts, but understanding this section will be important for implementation later.

Prerequisites

I will assume you should have a bit of background in matrix math. It helps that you do a bit of refreshing if you have forgotten.

I will also assume you have known a bit of what dynamic programming is, as matrix chain multiplication is a slightly more complex example; you keep track of a 2D array instead of 1D array. I’ve written a beginner level hands on introduction to dynamic programming in a form of solving a problem which you can check out: 445A – Boredom – CodeForces Tutorial.

Understanding the Problem

Given a chain of matrices, we want to multiply them. We cannot change the order of the matrices as that would change the result or make it incompatible, therefore we say matrix multiplication is not commutative. In addition, regardless of how you parenthesize (e.g. whether the sequence as multiplied as A\cdot \left ( B\cdot C \right ) or \left ( A\cdot B \right )\cdot C ) the chain it will give you the same answer, thus we say it is associative.

How we parenthesize the matrices also determine how many operations will be performed. For a multiplication chain of integers, parenthesizing them does not affect the number of operations performed, but for matrices the change is significant. For example, given the matrices with the following dimensions:

matrices-abc

If we define a row of a matrix as r_A and the column as c_A, then the total number of multiplications required to multiply A and B is r_A \cdot c_A \cdot c_{B}.

  • A\cdot \left ( B\cdot C \right ) -> 2*20*1 + 2*1*10 = 60 multiplications
  • \left ( A\cdot B \right )\cdot C -> 20*1*10 + 2*20*10 = 600 multiplications

The 1st case is 10 times faster!

NOTE: There is a neat coding exercise you can try out on this total multiplications thing on UVa.

Therefore, we want to figure out how to parenthesize the matrix chain such that we minimize the required number of operations.

However, to keep things simple, we will focus on the minimum number of operations. We will eventually find the sequence itself by building on top on this.

Formulation

To put in formal terms, given a sequence of n matrices A_1, A_2, A_3 \ldots A_n we wish to parenthesize the product of this sequence so as to minimize the number of operations required.

Finding the Minimum Number of Operations

Suppose we have a function B(i, j) that computes the minimum number of required operations for multiplying a chain of matrices from matrix i to matrix j. The solution to the main problem would be B(1, n).

A standard technique to in solving a dynamic programming problem is usually to ask, “What is the last operation?”. Here, we ask what is the minimum total cost of the last operation? Regardless how the chain is parenthesized, there will always be 2 matrices remaining. The cost, is the minimum cost of some prefix + minimum cost of some suffix + the number of operations to multiply them. The cost of prefix and suffix in turn, is derived from the minimum number of operations that deduces them.

Because each matrix can only multiply with its adjacent matrix, a prefix can only start from A_1 to some matrix A_k, and a suffix can only start from A_{k+1} to A_n, split at some index k. The cost in the last operation for the prefix is then B(1, k) and the cost of the suffix is B(k+1, n)k therefore ranges from 1 until n-1 in the last operation, and within a range i to j, k ranges from i to j-1. We choose k by selecting the minimum cost of all the possible costs within a range i to j.

The resultant dimensions from multiplying 2 matrices are important to finding the cost. In a sequence of matrices A_i \ldots A_j If we split at a point k, the resultant dimensions of the prefix is r_i \times c_k and the suffix is r_{k+1} \times c_j. The cost of multiplying these 2 matrices are therefore r_i \cdot c_k \cdot c_j.

matrix-prefix-suffix

Running Through Possibilities

Let us consider the base case: When there is only 1 matrix. The prefix = suffix, and there are no operations performed, so the cost = 0.

Now what if there are only 2 matrices? There are no other possible combinations to compare with, so the minimum number of operations is the number of operations to multiply them. There is no cost to the prefix or suffix, because the cost for 1 matrix alone is 0.

3 matrices onwards is gets a little more challenging. When the chain has only 3 matrices, you have to choose between \left ( A_1\cdot A_2 \right )\cdot A_3 and A_1\cdot \left ( A_2\cdot A_3\right ). How we choose between the 2 is determined by cost of either multiplying A_1 and A_2 first, then take the resulting matrix (prefix) and multiply by A_3 (suffix), OR, multiplying A_2 and A_3 first, then multiplying that (suffix) by A_1 (prefix). Where we decide to make that split, either at A_1 or A_2, we determined by their cost. The matrix in which we choose we call it A_k, at a point k.

The minimum cost with 3 matrices, B(1, 3) is therefore B(1, k) + B(k+1, 3) + r_1 \cdot c_k \cdot c_3, and we choose between B(1, 1) + B(2, 3) + r_1 \cdot c_1 \cdot c_3 and B(1, 2) + B(3, 3) + r_1 \cdot c_2 \cdot c_3. You can continue doing this for 4 or 5 or 6 matrices until you start to see a pattern. Notice you can apply the same procedure to find B(1, 3) to find B(1, 2) – there is only 1 possible value for k for B(1, 2).

The General Solution

So in a range i to j, we select from j – i possibilities, from i until j – 1. Those possibilities in turn call B recursively until it hits the base case where i = j. The general solution is therefore:

B(i,j) = \begin{cases} 0 \text{ if } i = j \\ \displaystyle\min_{k=i}^{j-1} \begin{cases} B(i, k) + B(k+1, j) + r_i \cdot c_k \cdot c_j \end{cases} \end{cases}

And the solution to the main problem would be B(1, n).

Conclusion

This is all for the first part. Next post I will cover implementation: Matrix Chain Multiplication with C++ code – PART 2: Implementation.

Reference

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s