The Montgomery reduction algorithm finds the remainder of a division. Many cryptographic schemes work with numbers modulo a prime. When you have to multiply two numbers with e.g. 128 bits each, first you multiply them the usual way (there are many techniques for this) to obtain a 256-bit (“double precision”) number. Then you need to reduce this result modulo the prime you’re working with, that is, compute the remainder of the division of this number over the prime.
You can compute the remainder of a division with the “schoolbook” technique everyone learns in school, but that is expensive and requires divisions, with are costly in many platforms (some microcontrollers don’t even have a division instruction). Montgomery reduction only needs a division by a powers of the integer size, which are cheap for computers.
When working modulo a prime (or modulo anything), you can add or subtract the prime (call it p) from the number you’re working with (call it x) and you’ll always get the same number. For example, 3 = 3 + 7 = 10 = 3 (mod 7), or 3 = 3 – 7 = -4 = 3 (mod 7). This gives the basic approach for finding remainders: just subtract the p from x until you reach a number smaller than p.
For example, you want to compute 6 * 5 modulo 7. You have 6 * 5 = 30, now you just need the remainder of the division of 30 over 7, that is, 30 modulo 7. Now it’s just subtractions:
30 - 7 = 23 (mod 7) 23 - 7 = 16 (mod 7) 16 - 7 = 9 (mod 7) 9 - 7 = 2 (mod 7)
And you’re done: 6 * 5 = 2 (mod 7). But if the number you’re working with is big, the number of subtractions you’ll need will be too large. You can speed up this process by dividing the number by the modulus and computing r = x – (x/p * p). But as I said, division is expensive.
Let’s add, not subtract
The first insight of the Montgomery reduction is this: what if instead of subtracting, you add the prime modulus many times? And more, what if you add the modulus in a way that the number being reduced is filled with zeros to the right? For example, let’s work modulo 97. You want to compute 43 * 56 (mod 97). You multiply the operands obtaining 2408, and you’ll need to reduce it modulo 97. You can add the 97 to this number any times you want, that is, you can add any multiple of 97. Which multiple of 97 that, when added to 2408, leads to a number with last digit 0?
2408 + 6 * 97 = 2408 + 582 = 2990
It’s 6 times 97. It’s easy to find this “magic number”, but I won’t go into this right now. OK, you have a rightmost zero. Let’s try to add again in order to have two zeros:
2990 + 30 * 97 = 2990 + 2910 = 5900
Great! But… so what? Well, suppose these numbers live in a world where they must be divided by 100 after you multiply them and use the above procedure. Then you divide 5900 by 100, which is cheap, and you have 59.
Sadly, these numbers don’t live in such a world, and 59 is not the right answer. If the division is exact, you can divide by any multiple of 97, but you can’t simply divide by 100.
The Montgomery World
The second insight of the Montgomery reduction is: if you need such a world were you must divide by 100, then just create it!
The Montgomery reduction requires transforming the numbers into the “Montgomery domain”, which is exactly the world we need. First let’s look at our modulus: 97. It has two digits in base 10. Then let R be smaller power of the base that is greater than the modulus. For 97, we have R = 10^2 = 100.
To transform a number x into the Montgomery domain, compute x * R (mod p). In our example, suppose x = 35. Then 35 * 100 (mod 97) = 8 (mod 97).
(You may notice that this transformation simply begs the question – you need to reduce modulo p in order to convert the numbers to a form where it’s easier to reduce modulo p! I’ll talk about this later on.)
Now, the beauty of the Montgomery domain: suppose that you want to compute x * y (mod p). Converting the operands, you’ll actually compute (x * R) * (y * R) (mod p). But you want the result to still live in the Montgomery domain. That’s because you may need to do many other multiplications. Since converting the number back to the real world is somewhat expensive, it’s best to keep the computations in the Montgomery domain. Therefore, you want to compute x * y * R (mod p), i.e., you want x * y, but in the Montgomery domain.
Let’s see… if you just multiply the operands the usual way, you get (x * R) * (y * R) (mod p) = x * y * R * R (mod p). But you want x * y * R (mod p). Therefore, every time you multiply two numbers, you’ll need to reduce the result modulo p, but you’ll also need to divide by R. Recall our example, where R = 100, and we wanted to be required to divide by 100. That’s exactly what we got: the Montgomery domain requires the division by 100!
Let x = 43, y = 56, p = 97, R = 100. You want to compute x * y (mod p). First you convert x and y to the Montgomery domain. For x, compute x’ = x * R (mod p) = 43 * 100 (mod 97) = 32, and for y, compute y’ = y * R (mod p) = 56 * 100 (mod 97) = 71.
Compute a := x’ * y’ = 32 * 71 = 2272.
In order to zero the first digit, compute a := a + (4p) = 2272 + 388 = 2660.
In order to zero the second digit, compute a := a + (20p) = 2660 + 1940 = 4600.
Compute a := a / R = 4600 / 100 = 46.
We have that 46 is the Montgomery representation of x * y (mod p), that is, x * y * R (mod p). In order to convert it back, compute a * (1/R) (mod p) = 46 * 65 (mod 97) = 80. You can check that 43 * 56 (mod 97) is indeed 80.
Converting the numbers
Let Montgomery(x’,y’) = (x’y’)/R (mod p) be the Montgomery reduction of x’ and y’.
Like I mentioned, in order to compute the Montgomery representation of a number or in order to convert it back, you need to multiply by R modulo p or divide by R modulo p. You can compute this remainders using a classic, expensive algorithm, since for practical applications you’ll rarely need to convert the numbers (for example, in a cryptographic application: since no one will need to actually see the real form of the numbers, there is no need for conversions and it’s possible to work entirely in the Montgomery domain). But there is also another approach. You can precompute R’ = R^2 (mod p) and then convert a number x to xR (mod p) by simple computing the Montgomery reduction of x and R’, since Montgomery(x, R’) = Montgomery(x, R^2) = (xRR)/R (mod p) = xR (mod p). To convert back a number x’ = xR (mod p), simply compute the Montgomery reduction of x’ and 1, since Montgomery(x’, 1) = Montgomery(xR, 1) = (xR)/R (mod p) = x (mod p).
The magic numbers
Let B be the base you’re working with. The main step of the Montgomery reduction is “add a multiple k of p that, when added to a, makes its i-th digit zero”. It’s not hard to find k. First, notice that the only relevant digits are the first digit of p (call it p) and the i-th digit of a (call it a[i]). Then:
a[i] + k * p = 0 (mod B) k = a[i] * -(1/p) (mod B)
You can precompute -(1/p) (mod B) and then multiply it by a[i] in order to find k. In our example, with p = 97 and base 10, this value is -(1/7) (mod 10) = -3 (mod 10) = 7.
In the real world
For example, in elliptic curve cryptography in the 128-bit level of security, the operands have 256 bits. Suppose the target platform is a desktop PC with 32 bits. Then the operands are represented by eight 32-bit digits (i.e. B=2^32 as opposed to B=10 in the examples); R is 2^256; and the Montgomery reduction requires eight steps (since the numbers have eight integers).
If you look carefully, you can notice that the Montgomery reduction is nothing more than a multiplication of p with an operand which is computed on the fly (the “magic numbers”). This allows the use of many optimizations from multiplication algorithms.