Study this article to learn how addition is implemented and carried out at the gate level. Nowadays, computers are architected using larger components. For example, to perform addition and subtraction, computer architects utilize ALUs, arithmetic logic units. You can design a computer without knowing the details of an ALU or of an adder, similar to using a calculator to find the square root of a number without knowing how to manually compute the square root (or in computer science terminology, without knowing the algorithm that the calculator performs to find the square root). However, we want you to have the strongest foundation in your study of computer architecture. Knowing the underlying algorithm for larger components, you will be able to better use them in constructing larger components, for example, using half adders to construct a full adder. A half adder takes 2 bits as input and outputs a sum and a carry bit; a full adder takes 2 operand bits and a carry bit as input, and outputs a sum bit and a carry bit. To add two floating point numbers, one or both need to be put in a form such that they have the same exponent. Then, the mantissas are added. Lastly, the result is normalized. We will cover floating point addition later. Subtraction is not discussed explicitly, because it is done via addition by reversing the sign of the number to be subtracted (subtrahend) and adding the result to the number subtracted from (minuend). You have to be careful when subtracting floating point numbers, because of the possible large round off error in the result.

Addition and subtraction are similar algorithms. Taking a look at subtraction, we can see that:

$a-b=a+(-b)$
Using this simple relationship, we can see that addition and subtraction can be performed using the same hardware. Using this setup, however, care must be taken to invert the value of the second operand if we are performing subtraction. Note also that in twos-compliment arithmetic, the value of the second operand must not only be inverted, but 1 must be added to it. For this reason, when performing subtraction, the carry input into the LSB should be a 1 and not a zero.

A half adder is a circuit that performs binary addition on two bits. A half adder does not explicitly account for a carry input signal.

In verilog, a half-adder can be implemented as follows:

module half_adder(a, b, c, s);
input a, b;
output s, c;
s = a ^ b;
c = a & b;
endmodule


Full adder circuits are similar to the half-adder, except that they do account for a carry input and a carry output. Full adders can be treated as a 3-bit adder with a 2-bit result, or they can be treated as a single stage (a 3:2 compressor) in a larger adder.

As can be seen below, the number of gate delays in a full-adder circuit is 3:

We can use verilog to implement a full adder module:

module full_adder(a, b, cin, cout, s);
input a, b, cin;
output cout, s;
wire temp;
temp = a ^ b;
s = temp ^ cin;
cout = (cin & temp) | (a & b);
endmodule


A serial adder is a kind of ALU that calculates each bit of the output, one at a time, re-using one full adder (total). This image shows a 2-bit serial adder, and the associated waveforms.

Serial adders have the benefit that they require the least amount of hardware of all adders, but they suffer by being the slowest.

A parallel adder is a kind of ALU that calculates every bit of the output more or less simultaneously, using one full adder for each output bit. The 1947 Whirlwind computer was the first computer to use a parallel adder.

In many CPUs, the CPU latches the final carry-out of the parallel adder in an external "carry flag" in a "status register".

In a few CPUs, the latched value of the carry flag is always wired to the first carry-in of the parallel adder; this gives "Add with carry" with 2s' complement addition. (In a very few CPUs, an end-around carry -- the final carry-out of the parallel adder is directly connected to the first carry-in of the same parallel adder -- gives 1's complement addition).

Numbers of more than 1 bit long require more then just a single full adder to manipulate using arithmetic and bitwise logic instructions[citation needed]. A simple way of operating on larger numbers is to cascade a number of full-adder blocks together into a ripple-carry adder, seen above. Ripple Carry adders are so called because the carry value "ripples" from one block to the next, down the entire chain of full adders. The output values of the higher-order bits are not correct, and the arithmetic is not complete, until the carry signal has completely propagated down the chain of full adders.

If each full adder requires 3 gate delays for computation, then an n-bit ripple carry adder will require 3n gate delays. For 32 or 64 bit computers (or higher) this delay can be overwhelmingly large.

Ripple carry adders have the benefit that they require the least amount of hardware of all adders (except for serial adders), but they suffer by being the slowest (except for serial adders).

With the full-adder verilog module we defined above, we can define a 4-bit ripple-carry adder in Verilog. The adder can be expanded logically:

wire [4:0] c;
wire [3:0] s;
full_adder fa1(a[0], b[0], c[0], c[1], s[0]);
full_adder fa2(a[1], b[1], c[1], c[2], s[1]);
full_adder fa3(a[2], b[2], c[2], c[3], s[2]);
full_adder fa4(a[3], b[3], c[3], c[4], s[3]);


At the end of this module, s contains the 4 bit sum, and c[4] contains the final carry out.

This "ripple carry" arrangement makes "add" and "subtract" take much longer than the other operations of an ALU (AND, NAND, shift-left, divide-by-two, etc). A few CPUs use a ripple carry ALU, and require the programmer to insert NOPs to give the "add" time to settle.[1] A few other CPUs use a ripple carry adder, and simply set the clock rate slow enough that there is plenty of time for the carry bits to ripple through the adder. A few CPUs use a ripple carry adder, and make the "add" instruction take more clocks than the "XOR" instruction, in order to give the carry bits more time to ripple through the adder on an "add", but without unnecessarily slowing down the CPU during a "XOR". However, it makes pipelining much simpler if every instruction takes the same number of clocks to execute.

Carry-lookahead adders use special "look ahead" blocks to compute the carry from a group of 4 full-adders, and passes this carry signal to the next group of 4 full adders. Lookahead units can also be cascaded, to minimize the number of gate delays to completely propagate the carry signal to the end of the chain. Carry lookahead adders are some of the fastest adder circuits available, but they suffer from requiring large amounts of hardware to implement. The number of transistors needed to implement a carry-lookahead adder is proportional to the number of inputs cubed.

The addition of two 1-digit inputs A and B is said to generate if the addition will always carry, regardless of whether there is an input carry (equivalently, regardless of whether any less significant digits in the sum carry). For example, in the decimal addition 52 + 67, the addition of the tens digits 5 and 6 generates because the result carries to the hundreds digit regardless of whether the ones digit carries (in the example, the ones digit clearly does not carry).

In the case of binary addition, $A+B$ generates if and only if both A and B are 1. If we write $G(A,B)$ to represent the binary predicate that is true if and only if $A+B$ generates, we have:

$G(A,B)=A\cdot B$

The addition of two 1-digit inputs A and B is said to propagate if the addition will carry whenever there is an input carry (equivalently, when the next less significant digit in the sum carries). For example, in the decimal addition 37 + 62, the addition of the tens digits 3 and 6 propagate because the result would carry to the hundreds digit if the ones were to carry (which in this example, it does not). Note that propagate and generate are defined with respect to a single digit of addition and do not depend on any other digits in the sum.

In the case of binary addition, $A+B$ propagates if and only if at least one of A or is 1. If we write $P(A,B)$ to represent the binary predicate that is true if and only if $A+B$ propagates, we have:

$P(A,B)=A+B$

The power of carry-lookahead adders is that the bit-length of the adder can be expanded without increasing the propagation delay too much. By cascading lookahead modules, and passing "propagate" and "generate" signals to the next level of the lookahead module. For instance, once we have 4 adders combined into a simple lookahead module, we can use that to create a 16-bit and a 64-bit adder through cascading:

The 64-bit carry lookahead unit is exactly the same as the 4-bit and 16-bit units. This means that once we have designed one carry lookahead module, we can cascade it to any large size.

A generalized CLA block diagram. Each of the turquoise blocks represents a smaller CLA adder.

We can cascade the generalized CLA block above to form a larger CLA block. This larger block can then be cascaded into a larger CLA block using the same method.