Breaking RSA Using Shor's Algorithm

Introduction

Our contributions and a summary of our results

In this work, we combine several novel and existing optimizations to reduce the cost of implementing Shor's algorithms. The main hurdle to overcome is to implement one or more modular exponentiations efficiently, as these exponentiations dominate the overall cost of Shor's algorithms. 

We use the standard square-and-multiply approach to reduce the exponentiations into a sequence of modular multiplications. We apply optimizations to reduce the number of multiplications and the cost of each multiplication.

The number of multiplications is reduced by using windowed arithmetic, which uses small table lookups to fuse several multiplications together. It is also reduced by using Ekerå  and Håstad's derivatives  of Shor's algorithms, that require fewer multiplications to be performed compared to Shor's original algorithms.

The cost of each multiplication is reduced by combining several optimizations. We use Zalka's coset representation of modular integers, which allows the use of cheaper non-modular arithmetic circuits to perform modular additions. We use oblivious carry runways to split registers into independent pieces that can be worked on in parallel when performing additions. We bound the approximation error introduced by using oblivious carry runways and the coset representation of modular integers by analyzing them as approximate encoded permutations. We use windowed arithmetic (again) to fuse multiple additions into individual lookup additions. We use a layout of the core lookup addition operation where carry propagation is limited by the reaction time of the classical control system, and where the lookups are nearly reaction limited. Finally, we optimize the overall computation by trying many possible parameter sets (e.g. window sizes and code distances) for each problem size and selecting the best parameter set.

We estimate the approximate cost of our construction, both in the abstract circuit model, and in terms of its runtime and physical qubit usage in an error corrected implementation under plausible physical assumptions for large-scale superconducting qubit platforms with nearest neighbor connectivity (see Figure 1). Our physical estimates are based on the surface code. We assume distance-d logical qubits are stored using square patches of 2(d + 1)^2 physical qubits (see Figure 8) and are operated on via lattice surgery. We provide concrete cost estimates for several cryptographically relevant problems, such as the RSA integer factoring problem, and various parameterizations of the DLP in finite fields. These cost estimates may be used to inform decisions on when to mandate migration from currently deployed vulnerable cryptosystems to post-quantum secure systems or hybrid systems.

We caution the reader that, although we report two significant figures in our tables, there are large systemic uncertainties in our estimates. For example, doubling (or halving) the physical gate error rate would increase (or decrease) the number of qubits required by more than 10%. The estimates presented are intended to be ballpark figures that the cryptographic community can use to understand the potential impact of quantum computers; not exacting predictions of the future.

Compared to previous works, we reduce the Toffoli count when factoring RSA integers by over 10x (see Table 1). To only compare the Toffoli counts as in Table 1 may prove misleading, however, as it ignores the cost of routing, the benefits of parallelization, etc. Ideally, we would like to compare our runtime and physical qubit usage to previous works in the literature. However, this is only possible when such estimates are actually reported and use physical assumptions similar to our own. The number of works for which this requirement is met is limited.

The works by Jones et al., Fowler et al., and Gheorgiu et al. stand out in that they use the same basic cost model as we use in this paper, enabling reasonable comparisons to be made. We improve on their estimates by over 100x (see Table 2) when accounting for slight remaining differences in the cost model. We also include other historical cost estimates, although in these cases the differences in physical assumptions are substantial.

Figure 1: Log-log plot of estimated space and expected-time costs, using our parallel construction, for various problems and

Figure 1: Log-log plot of estimated space and expected-time costs, using our parallel construction, for various problems and problem sizes. See Section 3 for additional details. The jumps in space around n = 32786 occur as the algorithm exceeds the error budget of the CCZ factory and switches to the T factory. Generated by ancillary file "estimate costs.py".

Abstract Qubits Measurement Depth Toffoli + T /2 Count Toffoli + T /2 Count (billions) Min Volume (megaqubitdays)
Factoring RSA integers Asympototic n = 1024 n = 2048 n = 3072 n = 1024 n = 2048 n = 3072
Vedral et al. 1996 7n+1 80n3 + O(n2) 80n3 + O(n2) 86 690 2300 240 4100 23000
Zalka 1998 (basic) 3n + O(1) 12n3 + O(n) 12n3 + O(n2) 13 100 350 16 250 1400
Zalka 1998 (log add) 5n + O(1) 600n2 + O(n) 52n3 + O(n2) 56 450 1500 16 160 540
Zalka 1998 (fft mult) ≈96n ≈217n1.2 ≈217n2 140 550 1200 62 260 710
Beauregard 2002 2n+3 144n3 lg n + O(n2 lg n) 576n3 lg2n+O(n3 lg n) 62000 600000 2200000 32000 380000 1700000
Fowler et al. 2012 3n + O(1) 40n3 + O(n2) 40n3 + O(n2) 43 340 1200 53 850 4600
Haner et al. 2016 2n+2 52n3 + O(n2) 64n3 lg n + O(n3) 580 5200 19000 230 2800 13000
(ours) 2019  3n+0.002n lg n 500n2 + n2 lg n 0.3n3 + 0.0005n3 lg n 0.4 2.7 9.9 0.5 5.9 21
Solving elliptic curve DLPs Asympototic n = 160 n = 224 n = 256 n = 160 n = 224 n = 256
Roetteler et al. 2017 9n+ O(lg n) 448 n3 lg n + 4090 n3 448n3 lg n + 4090 n3 30 84 130 13 52 83

Table 1: Expected costs of factoring n bit RSA integers using various constructions proposed in the literature. For comparison, we include a single construction for solving the DLP in n bit prime order elliptic curve groups with comparable classical security levels. The estimated minimum spacetime volumes assume modern surface code constructions, even for older papers.


Physical assumptions Approach Estimated costs
Historical cost estimate at n = 2048 Physical gate error rate Cycle time (microseconds) Reaction time (microseconds) Physical connectivity Distillation strategy Execution strategy Physical qubits (millions) Expected runtime (days) Excepected vloume (megaqubitdays)
Van Meter et al. 2009 0.20% 49 N/A planar 1000 +  T and S distillation limited carry lookahead 6500 410 2600000
Joned et al. 2010 0.10% 0.25 N/A planar 7000 T distillation limited carry lookahead 620 10 6200
Fowler et al. 2012 0.10% 1 0.1 planar 1200 T reaction limited ripple carry 1000 1.1 1100
O'Gorman et al. 2017 0.10% 10 1 arbitary block CCZ reaction limited ripple carry 230 3.7 850
Gheoghiu et al. 2019 0.10% 0.2 0.1 planar 1100 T reaction limited ripple carry 170 1 170
(ours) 2019 (1 factory) 0.10% 1 10 planar 1 CCZ distillation limited ripple carry 16 6 90
(ours) 2019 (1 thread) 0.10% 1 10 planar 14 CCZ reaction limited ripple carry 19 0.36 6.6
(ours) 2019 (1 parallel) 0.10% 1 10 planar 28 CCZ reaction limited carry runways 20 0.31 5.9

Table 2: Historical estimates of the expected costs of factoring n = 2048 bit RSA integers, and the assumptions they used. We caution the reader to carefully consider the physical assumptions of each paper when comparing their expected volumes. For example, assuming a cycle time that is 5x more optimistic will reduce the expected volume by a corresponding factor of 5. See Appendix B for details on each entry in this table.

As most previous works focus either on factoring, or on the DLP in elliptic curve groups, it is hard to find references against which to compare our cost estimates for solving the DLP in multiplicative groups of finite fields. In general, the improvements we achieve for factoring RSA integers are comparable to the improvements we achieve for solving the general DLP in finite fields. As may be seen in Figure 1, the choice of parameterization has a significant impact on the costs. For the short DLP, and the DLP in Schnorr groups, we achieve significant improvements. These are primarily due to our use of derivatives of Shor's algorithm that are optimized for these parameterizations.