Breaking RSA Using Shor's Algorithm
Our construction
Windowed arithmetic
The next optimization we use is windowed arithmetic. Specifically, we follow the modular exponentiation construction and use windowing at two levels.
First, at the level of a multiplication, we window over the controlled additions. We fuse groups of controlled additions into single lookup additions. A lookup addition is an addition where the value to add into a register is the result of a table lookup. Small windows of the qubits that would have been used as controls for the additions are instead treated as addresses into classically precomputed tables of values to unconditionally add into the target.
Second, at the level of the exponentiation, we window over the controlled multiplications. This is done by including exponent qubits in the addresses of all lookups being performed within the multiplications.
The cost of windowed arithmetic depends on the size of the windows. Let be the size of the
window over exponent qubits that are being used to control multiplications. Increasing
proportionally decreases the number of multiplications needed for the modular exponentiation, since
more exponent qubits are handled by each multiplication. Let
be the size of the window over factor qubits being used to control additions. Increasing
proportionally decreases the number
of additions needed within each multiplication. By using windowed arithmetic, the
controlled
multiplications we needed to perform become
uncontrolled multiplications while the
controlled additions we needed to perform within each multiplication become
uncontrolled
additions. The cost of increasing
and
is that each addition is now accompanied by a
lookup, and the Toffoli count of the lookup operation is
.
Figure 2: Working through the qubits representing an exponent in Shor's algorithm with a window size
of four, while using a semi-classical Fourier transform with recycling. The notation
refers to a slice of (qu)bits in e from (qu)bit index a (inclusive) to (qu)bit index b (exclusive).
Each
gate rotates by an amount determined by previous measurements, e.g. using the technique. All 16 possible values of the expression
(and similar expressions) can be precomputed
classically, and looked up on demand within the multiplication circuit. This reduces the number of multiplications
by a factor of the window size, at the cost of some additional lookup work within the multiplication.
Using Cuccaro et al.'s adder, each n-bit addition has a Toffoli count and measurement
depth of . Using Babbush et al.'s QROM read, each table lookup has a
Toffoli count and measurement depth of
and uses a negligible
ancillae.
Using measurement-based uncomputation, uncomputing each table lookup has
a Toffoli count and measurement depth of
which is negligible compared to the lookup
for the parameters we will be using.
The measurement-based uncomputation uses ancillae (it starts by measuring away n logical qubits of the lookup output register). This ancillae count is always zero for
the reasonable values of
,
, and classically intractable values of
, that we consider in this
paper. The overhead due to the logarithmic padding introduced by the coset representation of
modular integers is logarithmic in
.
Thus, by using windowed arithmetic, the leading term of the Toffoli count of the reference
implementation has been reduced to . If we set
then
we achieve a Toffoli count with leading term
. In terms of space, we are using
logical qubits. The qubits are distributed as follows:
for the accumulation register
storing the product,
for a workspace register during the multiplication,
for the lookup output register, and
qubits to hold the part of the exponent needed for the
current stage of the semi-classical Fourier transform.