Breaking RSA Using Shor's Algorithm
Our construction
Construction summary
We now review the basic flow of our implementation of Shor's algorithm, including some details we would have otherwise left unstated.
The quantum part of the algorithm starts by preparing two registers, storing zero and one respectively, into the coset representation of modular integers with oblivious carry runways. The cost of this preparation is negligible compared to the cost of the rest of the algorithm. The register storing one is the accumulation register that will ultimately store the modular exponentiation, while the zero register is a workspace register.

Figure 8: Stabilizer configuration diagrams for lattice surgery qubits in the rotated surface code with code
distances d of (from left to right) 3, 5, and 11. Each small yellow circle is a data qubit. Each filled shape is
a stabilizer that is measured each surface code cycle. Black squares are four body Z stabilizers, gray squares
are four body X stabilizers, black semi-circles are two body Z stabilizers, and gray semi-circles are two body
X stabilizers. Not all physical qubits are shown; in particular there is a measurement qubit present for each
stabilizer. There are physical qubits per logical qubit, including the spacing between logical qubits.
We assume that each increase of the code distance by 2 suppresses logical errors per surface code cycle by a
factor of 10 and that this is achieved with a physical error rate of 10−3.
The bulk of the execution time is then spent performing the modular exponentiation(s). Ex- ponent qubits are iteratively introduced in groups of size cexp. We are performing a semi-classical Fourier transform, so the exponent qubits must be phased according to measurements on previous exponent qubits. Performing the phasing operations requires using T factories instead of CCZ factories, but there is so little phasing work compared to CCZ work that we ignore this cost in our estimates. The phased exponent qubits are used as address qubits during a windowed modular multiplication, then measured in the frequency basis and discarded. See Figure 2. Within this step, almost all of the computation time is spent on lookup additions.
After all of the modular multiplications have been completed, the accumulation register is storing the result of the modular exponentiation. Note that this register still has oblivious carry runways and is still using the coset representation of modular integers. We could decode the register before measuring it but, because the decoding operations are all classical, it is more efficient to simply measure the entire register and perform the decoding classically. As with the cost of initialization of the registers, this cost is completely negligible.
This completes the quantum part of the algorithm. The exponent qubit measurement results, and the decoded accumulator measurement result, are fed into the classical post-processing code which uses them to derive the solution. If this fails, which can occur e.g. due to a distillation error during quantum execution, the algorithm is restarted.
Note that there are many minor implementation details that we have not discussed. For ex-
ample, during a windowed multiplication, one of the registers is essentially sitting idle except that
its qubits are being used (cmul at a time) as address qubits for the lookup additions. We have not
discussed how these qubits are routed into and out of the lookup computation as they are needed.
It is simply clear that there is enough leeway, in the packing of the computation into spacetime,
for this routing task to be feasible and contribute negligible cost.