
Introduction
Our contributions and a summary of our results
In this work, we combine several novel and existing optimizations to reduce the cost of implementing Shor's algorithms. The main hurdle to overcome is to implement one or more modular exponentiations efficiently, as these exponentiations dominate the overall cost of Shor's algorithms.
We use the standard square-and-multiply approach to reduce the exponentiations into a sequence of modular multiplications. We apply optimizations to reduce the number of multiplications and the cost of each multiplication.
The number of multiplications is reduced by using windowed arithmetic, which uses small table lookups to fuse several multiplications together. It is also reduced by using Ekerå and Håstad's derivatives of Shor's algorithms, that require fewer multiplications to be performed compared to Shor's original algorithms.
The cost of each multiplication is reduced by combining several optimizations. We use Zalka's coset representation of modular integers, which allows the use of cheaper non-modular arithmetic circuits to perform modular additions. We use oblivious carry runways to split registers into independent pieces that can be worked on in parallel when performing additions. We bound the approximation error introduced by using oblivious carry runways and the coset representation of modular integers by analyzing them as approximate encoded permutations. We use windowed arithmetic (again) to fuse multiple additions into individual lookup additions. We use a layout of the core lookup addition operation where carry propagation is limited by the reaction time of the classical control system, and where the lookups are nearly reaction limited. Finally, we optimize the overall computation by trying many possible parameter sets (e.g. window sizes and code distances) for each problem size and selecting the best parameter set.
We estimate the approximate cost of our construction, both in the abstract circuit model, and in terms of its runtime and physical qubit usage in an error corrected implementation under plausible physical assumptions for large-scale superconducting qubit platforms with nearest neighbor connectivity (see Figure 1). Our physical estimates are based on the surface code. We assume distance-d logical qubits are stored using square patches of physical qubits (see Figure 8) and are operated on via lattice surgery. We provide concrete cost estimates for several cryptographically relevant problems, such as the RSA integer factoring problem, and various parameterizations of the DLP in finite fields. These cost estimates may be used to inform decisions on when to mandate migration from currently deployed vulnerable cryptosystems to post-quantum secure systems or hybrid systems.
We caution the reader that, although we report two significant figures in our tables, there are large systemic uncertainties in our estimates. For example, doubling (or halving) the physical gate error rate would increase (or decrease) the number of qubits required by more than 10%. The estimates presented are intended to be ballpark figures that the cryptographic community can use to understand the potential impact of quantum computers; not exacting predictions of the future.
Compared to previous works, we reduce the Toffoli count when factoring RSA integers by over 10x (see Table 1). To only compare the Toffoli counts as in Table 1 may prove misleading, however, as it ignores the cost of routing, the benefits of parallelization, etc. Ideally, we would like to compare our runtime and physical qubit usage to previous works in the literature. However, this is only possible when such estimates are actually reported and use physical assumptions similar to our own. The number of works for which this requirement is met is limited.
The works by Jones et al., Fowler et al., and Gheorgiu et al. stand out in that they use the same basic cost model as we use in this paper, enabling reasonable comparisons to be made. We improve on their estimates by over 100x (see Table 2) when accounting for slight remaining differences in the cost model. We also include other historical cost estimates, although in these cases the differences in physical assumptions are substantial.
Figure 1: Log-log plot of estimated space and expected-time costs, using our parallel construction, for various problems and problem sizes. See Section 3 for additional details. The jumps in space around occur as the algorithm exceeds the error budget of the CCZ factory and switches to the T factory. Generated by ancillary file "estimate costs.py".
Abstract Qubits | Measurement Depth | Toffoli + T /2 Count | Toffoli + T /2 Count (billions) | Min Volume (megaqubitdays) | |||||
---|---|---|---|---|---|---|---|---|---|
Factoring RSA integers | Asympototic | n = 1024 | n = 2048 | n = 3072 | n = 1024 | n = 2048 | n = 3072 | ||
Vedral et al. 1996 | 7n+1 | 80n3 + O(n2) | 80n3 + O(n2) | 86 | 690 | 2300 | 240 | 4100 | 23000 |
Zalka 1998 (basic) | 3n + O(1) | 12n3 + O(n) | 12n3 + O(n2) | 13 | 100 | 350 | 16 | 250 | 1400 |
Zalka 1998 (log add) | 5n + O(1) | 600n2 + O(n) | 52n3 + O(n2) | 56 | 450 | 1500 | 16 | 160 | 540 |
Zalka 1998 (fft mult) | ≈96n | ≈217n1.2 | ≈217n2 | 140 | 550 | 1200 | 62 | 260 | 710 |
Beauregard 2002 | 2n+3 | 144n3 lg n + O(n2 lg n) | 576n3 lg2n+O(n3 lg n) | 62000 | 600000 | 2200000 | 32000 | 380000 | 1700000 |
Fowler et al. 2012 | 3n + O(1) | 40n3 + O(n2) | 40n3 + O(n2) | 43 | 340 | 1200 | 53 | 850 | 4600 |
Haner et al. 2016 | 2n+2 | 52n3 + O(n2) | 64n3 lg n + O(n3) | 580 | 5200 | 19000 | 230 | 2800 | 13000 |
(ours) 2019 | 3n+0.002n lg n | 500n2 + n2 lg n | 0.3n3 + 0.0005n3 lg n | 0.4 | 2.7 | 9.9 | 0.5 | 5.9 | 21 |
Solving elliptic curve DLPs | Asympototic | n = 160 | n = 224 | n = 256 | n = 160 | n = 224 | n = 256 | ||
Roetteler et al. 2017 | 9n+ O(lg n) | 448 n3 lg n + 4090 n3 | 448n3 lg n + 4090 n3 | 30 | 84 | 130 | 13 | 52 | 83 |
Table 1: Expected costs of factoring n bit RSA integers using various constructions proposed in the literature. For comparison, we include a single construction for solving the DLP in n bit prime order elliptic curve groups with comparable classical security levels. The estimated minimum spacetime volumes assume modern surface code constructions, even for older papers.
Physical assumptions | Approach | Estimated costs | |||||||
---|---|---|---|---|---|---|---|---|---|
Historical cost estimate at n = 2048 | Physical gate error rate | Cycle time (microseconds) | Reaction time (microseconds) | Physical connectivity | Distillation strategy | Execution strategy | Physical qubits (millions) | Expected runtime (days) | Excepected vloume (megaqubitdays) |
Van Meter et al. 2009 | 0.20% | 49 | N/A | planar | 1000 + T and S | distillation limited carry lookahead | 6500 | 410 | 2600000 |
Joned et al. 2010 | 0.10% | 0.25 | N/A | planar | 7000 T | distillation limited carry lookahead | 620 | 10 | 6200 |
Fowler et al. 2012 | 0.10% | 1 | 0.1 | planar | 1200 T | reaction limited ripple carry | 1000 | 1.1 | 1100 |
O'Gorman et al. 2017 | 0.10% | 10 | 1 | arbitary | block CCZ | reaction limited ripple carry | 230 | 3.7 | 850 |
Gheoghiu et al. 2019 | 0.10% | 0.2 | 0.1 | planar | 1100 T | reaction limited ripple carry | 170 | 1 | 170 |
(ours) 2019 (1 factory) | 0.10% | 1 | 10 | planar | 1 CCZ | distillation limited ripple carry | 16 | 6 | 90 |
(ours) 2019 (1 thread) | 0.10% | 1 | 10 | planar | 14 CCZ | reaction limited ripple carry | 19 | 0.36 | 6.6 |
(ours) 2019 (1 parallel) | 0.10% | 1 | 10 | planar | 28 CCZ | reaction limited carry runways | 20 | 0.31 | 5.9 |
Table 2: Historical estimates of the expected costs of factoring bit RSA integers, and the assumptions they used. We caution the reader to carefully consider the physical assumptions of each paper when comparing their expected volumes. For example, assuming a cycle time that is 5x more optimistic will reduce the expected volume by a corresponding factor of 5. See Appendix B for details on each entry in this table.
As most previous works focus either on factoring, or on the DLP in elliptic curve groups, it is hard to find references against which to compare our cost estimates for solving the DLP in multiplicative groups of finite fields. In general, the improvements we achieve for factoring RSA integers are comparable to the improvements we achieve for solving the general DLP in finite fields. As may be seen in Figure 1, the choice of parameterization has a significant impact on the costs. For the short DLP, and the DLP in Schnorr groups, we achieve significant improvements. These are primarily due to our use of derivatives of Shor's algorithm that are optimized for these parameterizations.