Breaking RSA Using Shor's Algorithm

Our construction

Windowed arithmetic

The next optimization we use is windowed arithmetic. Specifically, we follow the modular exponentiation construction and use windowing at two levels.

First, at the level of a multiplication, we window over the controlled additions. We fuse groups of controlled additions into single lookup additions. A lookup addition is an addition where the value to add into a register is the result of a table lookup. Small windows of the qubits that would have been used as controls for the additions are instead treated as addresses into classically precomputed tables of values to unconditionally add into the target. 

Second, at the level of the exponentiation, we window over the controlled multiplications. This is done by including exponent qubits in the addresses of all lookups being performed within the multiplications. 

The cost of windowed arithmetic depends on the size of the windows. Let c_{exp} be the size of the window over exponent qubits that are being used to control multiplications. Increasing c_{exp} proportionally decreases the number of multiplications needed for the modular exponentiation, since more exponent qubits are handled by each multiplication. Let c_{mul} be the size of the window over factor qubits being used to control additions. Increasing c_{mul} proportionally decreases the number of additions needed within each multiplication. By using windowed arithmetic, the n_e controlled multiplications we needed to perform become n_e/c_{exp} uncontrolled multiplications while the 2n controlled additions we needed to perform within each multiplication become 2n/c_{mul} uncontrolled additions. The cost of increasing c_{exp} and c_{mul} is that each addition is now accompanied by a lookup, and the Toffoli count of the lookup operation is 2^{c_{exp}+c_{mul}}.

Figure 2

Figure 2: Working through the qubits representing an exponent in Shor's algorithm with a window size c_{exp} of four, while using a semi-classical Fourier transform with recycling. The notation e[a : b] = [e/2^a] \text{mod } 2^{b−a} refers to a slice of (qu)bits in e from (qu)bit index a (inclusive) to (qu)bit index b (exclusive). Each R_z (. . . ) gate rotates by an amount determined by previous measurements, e.g. using the technique. All 16 possible values of the expression  g^{e[4:8]2^4} \text{(mod N)} (and similar expressions) can be precomputed classically, and looked up on demand within the multiplication circuit. This reduces the number of multiplications by a factor of the window size, at the cost of some additional lookup work within the multiplication.

Using Cuccaro et al.'s adder, each n-bit addition has a Toffoli count and measurement depth of 2n. Using Babbush et al.'s QROM read, each table lookup has a Toffoli count and measurement depth of 2^{c_{mul}+c_{exp}} and uses a negligible O(c_{mul} + c_{exp}) ancillae. Using measurement-based uncomputation, uncomputing each table lookup has a Toffoli count and measurement depth of 2 \sqrt {2c_{mul}+c_{exp}} which is negligible compared to the lookup for the parameters we will be using. 

The measurement-based uncomputation uses \text{max}(0, \sqrt{2c_{mul}+c_{exp}} −n) ancillae (it starts by measuring away n logical qubits of the lookup output register). This ancillae count is always zero for the reasonable values of c_{mul}, c_{exp}, and classically intractable values of n, that we consider in this paper. The overhead due to the logarithmic padding introduced by the coset representation of modular integers is logarithmic in n

Thus, by using windowed arithmetic, the leading term of the Toffoli count of the reference implementation has been reduced to \dfrac{2n_en}{
c_{mul}c_{exp}} (2n + 2^{c_{mul}+c_{exp}}). If we set c_{mul} = c_{exp} = \dfrac{1}{2} lg \; n then we achieve a Toffoli count with leading term \dfrac{24n_en^2}{lg^2 \; n}. In terms of space, we are using 3n + O(lg \; n)
logical qubits. The qubits are distributed as follows: n + O(lg \; n) for the accumulation register storing the product, n + O(lg \; n) for a workspace register during the multiplication, n + O(lg \; n) for the lookup output register, and O(lg \; n) qubits to hold the part of the exponent needed for the current stage of the semi-classical Fourier transform.