Ah. The matrix chain multiplication problem. A classic dynamic programming example in the undergraduate computer science world. I will be doing a series consisting of 3 parts:
- Analysis and Design – analyze the problem and deduce a solution.
- Implementation – implement the algorithm to find the minimum number of operations.
- Extracting the Sequence – extract the sequence in which the chain is multiplied.
There’s practically a ton of articles out there that already writes about this problem. I intend to distinguish myself by expressing it in the simplest terms possible with examples, and season it with working C++11 source code. In other words, if you had problem understanding everywhere else, my hopes is that your search ends with this series.
In this first part, there will be no code. At all. Not even pseudocode. I promise some C++11 code in the coming 2 parts, but understanding this section will be important for implementation later.
I will assume you should have a bit of background in matrix math. It helps that you do a bit of refreshing if you have forgotten.
I will also assume you have known a bit of what dynamic programming is, as matrix chain multiplication is a slightly more complex example; you keep track of a 2D array instead of 1D array. I’ve written a beginner level hands on introduction to dynamic programming in a form of solving a problem which you can check out: 445A – Boredom – CodeForces Tutorial.
Understanding the Problem
Given a chain of matrices, we want to multiply them. We cannot change the order of the matrices as that would change the result or make it incompatible, therefore we say matrix multiplication is not commutative. In addition, regardless of how you parenthesize (e.g. whether the sequence as multiplied as or ) the chain it will give you the same answer, thus we say it is associative.
How we parenthesize the matrices also determine how many operations will be performed. For a multiplication chain of integers, parenthesizing them does not affect the number of operations performed, but for matrices the change is significant. For example, given the matrices with the following dimensions:
If we define a row of a matrix as and the column as , then the total number of multiplications required to multiply A and B is .
- -> 2*20*1 + 2*1*10 = 60 multiplications
- -> 20*1*10 + 2*20*10 = 600 multiplications
The 1st case is 10 times faster!
Therefore, we want to figure out how to parenthesize the matrix chain such that we minimize the required number of operations.
However, to keep things simple, we will focus on the minimum number of operations. We will eventually find the sequence itself by building on top on this.
To put in formal terms, given a sequence of n matrices we wish to parenthesize the product of this sequence so as to minimize the number of operations required.
Finding the Minimum Number of Operations
Suppose we have a function B(i, j) that computes the minimum number of required operations for multiplying a chain of matrices from matrix i to matrix j. The solution to the main problem would be B(1, n).
A standard technique to in solving a dynamic programming problem is usually to ask, “What is the last operation?”. Here, we ask what is the minimum total cost of the last operation? Regardless how the chain is parenthesized, there will always be 2 matrices remaining. The cost, is the minimum cost of some prefix + minimum cost of some suffix + the number of operations to multiply them. The cost of prefix and suffix in turn, is derived from the minimum number of operations that deduces them.
Because each matrix can only multiply with its adjacent matrix, a prefix can only start from to some matrix , and a suffix can only start from to , split at some index k. The cost in the last operation for the prefix is then B(1, k) and the cost of the suffix is B(k+1, n). k therefore ranges from 1 until n-1 in the last operation, and within a range i to j, k ranges from i to j-1. We choose k by selecting the minimum cost of all the possible costs within a range i to j.
The resultant dimensions from multiplying 2 matrices are important to finding the cost. In a sequence of matrices If we split at a point k, the resultant dimensions of the prefix is and the suffix is . The cost of multiplying these 2 matrices are therefore .
Running Through Possibilities
Let us consider the base case: When there is only 1 matrix. The prefix = suffix, and there are no operations performed, so the cost = 0.
Now what if there are only 2 matrices? There are no other possible combinations to compare with, so the minimum number of operations is the number of operations to multiply them. There is no cost to the prefix or suffix, because the cost for 1 matrix alone is 0.
3 matrices onwards is gets a little more challenging. When the chain has only 3 matrices, you have to choose between and . How we choose between the 2 is determined by cost of either multiplying and first, then take the resulting matrix (prefix) and multiply by (suffix), OR, multiplying and first, then multiplying that (suffix) by (prefix). Where we decide to make that split, either at or , we determined by their cost. The matrix in which we choose we call it , at a point k.
The minimum cost with 3 matrices, B(1, 3) is therefore , and we choose between and . You can continue doing this for 4 or 5 or 6 matrices until you start to see a pattern. Notice you can apply the same procedure to find B(1, 3) to find B(1, 2) – there is only 1 possible value for k for B(1, 2).
The General Solution
So in a range i to j, we select from j – i possibilities, from i until j – 1. Those possibilities in turn call B recursively until it hits the base case where i = j. The general solution is therefore:
And the solution to the main problem would be B(1, n).
This is all for the first part. Next post I will cover implementation: Matrix Chain Multiplication with C++ code – PART 2: Implementation.
- Algorithms – Lecture 10: Dynamic Programming, Matrix Chain Multiplication and Typesetting (video lecture by Abhi Shelat of the University of Virginia)
- Matrix-chain Multiplication Problem (www.personal.kent.edu/)