Breaking RSA Using Shor's Algorithm

Introduction

Our contributions and a summary of our results

In this work, we combine several novel and existing optimizations to reduce the cost of implementing Shor's algorithms. The main hurdle to overcome is to implement one or more modular exponentiations efficiently, as these exponentiations dominate the overall cost of Shor's algorithms.

We use the standard square-and-multiply approach to reduce the exponentiations into a sequence of modular multiplications. We apply optimizations to reduce the number of multiplications and the cost of each multiplication.

The number of multiplications is reduced by using windowed arithmetic, which uses small table lookups to fuse several multiplications together. It is also reduced by using Ekerå and Håstad's derivatives of Shor's algorithms, that require fewer multiplications to be performed compared to Shor's original algorithms.

The cost of each multiplication is reduced by combining several optimizations. We use Zalka's coset representation of modular integers, which allows the use of cheaper non-modular arithmetic circuits to perform modular additions. We use oblivious carry runways to split registers into independent pieces that can be worked on in parallel when performing additions. We bound the approximation error introduced by using oblivious carry runways and the coset representation of modular integers by analyzing them as approximate encoded permutations. We use windowed arithmetic (again) to fuse multiple additions into individual lookup additions. We use a layout of the core lookup addition operation where carry propagation is limited by the reaction time of the classical control system, and where the lookups are nearly reaction limited. Finally, we optimize the overall computation by trying many possible parameter sets (e.g. window sizes and code distances) for each problem size and selecting the best parameter set.

We estimate the approximate cost of our construction, both in the abstract circuit model, and in terms of its runtime and physical qubit usage in an error corrected implementation under plausible physical assumptions for large-scale superconducting qubit platforms with nearest neighbor connectivity (see Figure 1). Our physical estimates are based on the surface code. We assume distance-d logical qubits are stored using square patches of $2(d + 1)^2$ physical qubits (see Figure 8) and are operated on via lattice surgery. We provide concrete cost estimates for several cryptographically relevant problems, such as the RSA integer factoring problem, and various parameterizations of the DLP in finite fields. These cost estimates may be used to inform decisions on when to mandate migration from currently deployed vulnerable cryptosystems to post-quantum secure systems or hybrid systems.

We caution the reader that, although we report two significant figures in our tables, there are large systemic uncertainties in our estimates. For example, doubling (or halving) the physical gate error rate would increase (or decrease) the number of qubits required by more than 10%. The estimates presented are intended to be ballpark figures that the cryptographic community can use to understand the potential impact of quantum computers; not exacting predictions of the future.

Compared to previous works, we reduce the Toffoli count when factoring RSA integers by over 10x (see Table 1). To only compare the Toffoli counts as in Table 1 may prove misleading, however, as it ignores the cost of routing, the benefits of parallelization, etc. Ideally, we would like to compare our runtime and physical qubit usage to previous works in the literature. However, this is only possible when such estimates are actually reported and use physical assumptions similar to our own. The number of works for which this requirement is met is limited.

The works by Jones et al., Fowler et al., and Gheorgiu et al. stand out in that they use the same basic cost model as we use in this paper, enabling reasonable comparisons to be made. We improve on their estimates by over 100x (see Table 2) when accounting for slight remaining differences in the cost model. We also include other historical cost estimates, although in these cases the differences in physical assumptions are substantial.

Figure 1: Log-log plot of estimated space and expected-time costs, using our parallel construction, for various problems and

Figure 1: Log-log plot of estimated space and expected-time costs, using our parallel construction, for various problems and problem sizes. See Section 3 for additional details. The jumps in space around $n = 32786$ occur as the algorithm exceeds the error budget of the CCZ factory and switches to the T factory. Generated by ancillary file "estimate costs.py".

	Abstract Qubits	Measurement Depth	Toffoli + T /2 Count	Toffoli + T /2 Count (billions)			Min Volume (megaqubitdays)
Factoring RSA integers	Asympototic			n = 1024	n = 2048	n = 3072	n = 1024	n = 2048	n = 3072
Vedral et al. 1996	7n+1	80n³ + O(n²)	80n³ + O(n²)	86	690	2300	240	4100	23000
Zalka 1998 (basic)	3n + O(1)	12n³ + O(n)	12n3 + O(n²)	13	100	350	16	250	1400
Zalka 1998 (log add)	5n + O(1)	600n² + O(n)	52n³ + O(n²)	56	450	1500	16	160	540
Zalka 1998 (fft mult)	≈96n	≈2¹⁷n^1.2	≈2¹⁷n²	140	550	1200	62	260	710
Beauregard 2002	2n+3	144n³ lg n + O(n² lg n)	576n³ lg²n+O(n³ lg n)	62000	600000	2200000	32000	380000	1700000
Fowler et al. 2012	3n + O(1)	40n³ + O(n²)	40n³ + O(n²)	43	340	1200	53	850	4600
Haner et al. 2016	2n+2	52n³ + O(n²)	64n³ lg n + O(n³)	580	5200	19000	230	2800	13000
(ours) 2019	3n+0.002n lg n	500n² + n² lg n	0.3n³ + 0.0005n³ lg n	0.4	2.7	9.9	0.5	5.9	21
Solving elliptic curve DLPs	Asympototic			n = 160	n = 224	n = 256	n = 160	n = 224	n = 256
Roetteler et al. 2017	9n+ O(lg n)	448 n³ lg n + 4090 n³	448n³ lg n + 4090 n³	30	84	130	13	52	83

Table 1: Expected costs of factoring n bit RSA integers using various constructions proposed in the literature. For comparison, we include a single construction for solving the DLP in n bit prime order elliptic curve groups with comparable classical security levels. The estimated minimum spacetime volumes assume modern surface code constructions, even for older papers.

Historical cost estimate at n = 2048	Physical gate error rate	Cycle time (microseconds)	Reaction time (microseconds)	Physical connectivity	Distillation strategy	Execution strategy	Physical qubits (millions)	Expected runtime (days)	Excepected vloume (megaqubitdays)
	Physical assumptions				Approach		Estimated costs
Van Meter et al. 2009	0.20%	49	N/A	planar	1000 + T and S	distillation limited carry lookahead	6500	410	2600000
Joned et al. 2010	0.10%	0.25	N/A	planar	7000 T	distillation limited carry lookahead	620	10	6200
Fowler et al. 2012	0.10%	1	0.1	planar	1200 T	reaction limited ripple carry	1000	1.1	1100
O'Gorman et al. 2017	0.10%	10	1	arbitary	block CCZ	reaction limited ripple carry	230	3.7	850
Gheoghiu et al. 2019	0.10%	0.2	0.1	planar	1100 T	reaction limited ripple carry	170	1	170
(ours) 2019 (1 factory)	0.10%	1	10	planar	1 CCZ	distillation limited ripple carry	16	6	90
(ours) 2019 (1 thread)	0.10%	1	10	planar	14 CCZ	reaction limited ripple carry	19	0.36	6.6
(ours) 2019 (1 parallel)	0.10%	1	10	planar	28 CCZ	reaction limited carry runways	20	0.31	5.9

Table 2: Historical estimates of the expected costs of factoring $n = 2048$ bit RSA integers, and the assumptions they used. We caution the reader to carefully consider the physical assumptions of each paper when comparing their expected volumes. For example, assuming a cycle time that is 5x more optimistic will reduce the expected volume by a corresponding factor of 5. See Appendix B for details on each entry in this table.

As most previous works focus either on factoring, or on the DLP in elliptic curve groups, it is hard to find references against which to compare our cost estimates for solving the DLP in multiplicative groups of finite fields. In general, the improvements we achieve for factoring RSA integers are comparable to the improvements we achieve for solving the general DLP in finite fields. As may be seen in Figure 1, the choice of parameterization has a significant impact on the costs. For the short DLP, and the DLP in Schnorr groups, we achieve significant improvements. These are primarily due to our use of derivatives of Shor's algorithm that are optimized for these parameterizations.