
Our construction
Interactions between optimizations
A major benefit of the set of optimizations we have chosen to use in this paper is that they complement each other. They make orthogonal improvements that compound or reinforce, instead of conflicting.
For example, when using the coset representation of modular integers, it is important that additions only use offsets less than N. Larger offsets cause larger approximation error. However, because we are using windowed arithmetic, every addition we perform has an offset that is being returned by a table lookup. Since the entries in the tables are classically precomputed, we can classically ensure all offsets are canonicalized into the [0, N ) range.
There are two cases where the optimizations do not mesh perfectly.
First, using oblivious runways reduces the depth of addition but not the depth of lookups. This
changes their relative costs, which affects the optimal window sizes to use in windowed arithmetic.
When addition is suddenly four times faster than it used to be, the optimal window sizes
and
decrease by 1 (approximately).
Second, when iterating over input qubits during a multiply-add, it is necessary to also iterate
over runway qubits and padding qubits. This increases the number of lookup additions that need
to be performed in order to complete a windowed multiplication. This issue is partially solved
by temporarily adding the runways back into the main register before performing a multiply-add
where that register is used as one of the factors (instead of the target). This temporarily reduces
each runway to a single carry qubit (see Figure 3). We could reduce the runways to zero qubits,
but this would require propagating the carry qubits all the way across the register, so we do not.