CS301: Add and Subtract Blocks

Addition and Subtraction

Addition and subtraction are similar algorithms. Taking a look at subtraction, we can see that:

\( a-b=a+(-b) \)

Using this simple relationship, we can see that addition and subtraction can be performed using the same hardware. Using this setup, however, care must be taken to negate the second operand if we are performing subtraction. Note also that in two's-complement arithmetic, negating a value requires not only inverting every bit of the operand, but also adding 1 to the result. For this reason, when performing subtraction, the second operand is bitwise inverted and the carry input into the LSB should be a 1 and not a 0.
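The relationship above can be sketched in Verilog as a combined add/subtract block. This is a minimal sketch, not a definitive implementation: the module name add_sub4, the 4-bit width, and the sub control input are assumptions made for illustration, and the built-in + operator stands in for whichever adder circuit is eventually chosen.

```verilog
// Hypothetical 4-bit add/subtract block (names and width are illustrative).
// sub = 0: result = a + b
// sub = 1: result = a - b, computed as a + (~b) + 1
module add_sub4(a, b, sub, result, cout);
 input [3:0] a, b;
 input sub;
 output [3:0] result;
 output cout;

 // XOR with the replicated control bit inverts b only when subtracting.
 wire [3:0] b_eff = b ^ {4{sub}};

 // The control bit doubles as the carry into the LSB, supplying the "+1".
 assign {cout, result} = a + b_eff + sub;
endmodule
```

For example, with a = 5, b = 3 and sub = 1, the block adds 0101 + 1100 + 1, which gives a result of 0010 (2) with the carry out set.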

Our goal on this page, then, is to find suitable hardware for performing addition.

Bit Adders

Half Adder

A half adder is a circuit that performs binary addition on two bits. A half adder does not explicitly account for a carry input signal.

In Verilog, a half adder can be implemented as follows:

module half_adder(a, b, c, s);
 input a, b;
 output s, c;
 assign s = a ^ b;  // sum bit
 assign c = a & b;  // carry bit
endmodule

Full Adder

Full adder circuits are similar to the half adder, except that they do account for a carry input and a carry output. A full adder can be treated as an adder of three 1-bit inputs with a 2-bit result, or as a single stage (a 3:2 compressor) in a larger adder.

Full adder circuit

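The longest path through a full-adder circuit runs through the first XOR gate, then an AND gate, then the OR gate that produces the carry output, so the number of gate delays in a full-adder circuit is 3.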

We can use Verilog to implement a full adder module:

module full_adder(a, b, cin, cout, s);
 input a, b, cin;
 output cout, s;
 wire temp;
 assign temp = a ^ b;                    // partial sum of a and b
 assign s = temp ^ cin;                  // sum bit
 assign cout = (cin & temp) | (a & b);   // carry out
endmodule
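To see why a full adder can be read as adding three 1-bit inputs into a 2-bit result, the equations can be checked against ordinary addition for all eight input combinations. The testbench below is a sketch for that purpose; the module name full_adder_check is hypothetical, and the full-adder equations are repeated inline so the block stands alone.

```verilog
// Hypothetical self-checking testbench (simulation only, not synthesizable).
module full_adder_check;
 reg a, b, cin;
 integer i;

 // Same logic as the full adder: sum and carry out.
 wire s = a ^ b ^ cin;
 wire cout = (cin & (a ^ b)) | (a & b);

 // Reference value: the arithmetic sum of the three bits, 2 bits wide.
 wire [1:0] expected = a + b + cin;

 initial begin
  for (i = 0; i < 8; i = i + 1) begin
   {a, b, cin} = i;   // low three bits of i select the input pattern
   #1;                // let the combinational logic settle
   if ({cout, s} !== expected)
    $display("MISMATCH for a=%b b=%b cin=%b", a, b, cin);
   else
    $display("%b + %b + %b = %b", a, b, cin, {cout, s});
  end
 end
endmodule
```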

Serial Adder

A serial adder is a kind of ALU that calculates each bit of the output one at a time, re-using a single full adder.

Figure: a 2-bit serial adder and its associated waveforms.

Serial adders have the benefit that they require the least amount of hardware of all adders, but they suffer by being the slowest.
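A serial adder can be sketched as one full adder plus a flip-flop that holds the carry between clock cycles. The module name serial_adder, the port names, and the synchronous reset are assumptions made for this illustration; the operands are assumed to arrive one bit per clock, least significant bit first.

```verilog
// Hypothetical bit-serial adder: one full adder re-used every clock cycle.
module serial_adder(clk, rst, a_bit, b_bit, s_bit);
 input clk, rst;
 input a_bit, b_bit;   // current bits of the two operands, LSB first
 output s_bit;         // sum bit for the current position

 reg carry;            // carry saved from the previous bit position

 // Full-adder logic for the current bit position.
 wire temp = a_bit ^ b_bit;
 wire cout = (carry & temp) | (a_bit & b_bit);
 assign s_bit = temp ^ carry;

 // Store the carry for the next bit; clear it before a new addition starts.
 always @(posedge clk) begin
  if (rst)
   carry <= 1'b0;
  else
   carry <= cout;
 end
endmodule
```

An n-bit addition takes n clock cycles with this arrangement, which is the speed cost the text describes. The shift registers that would feed a_bit and b_bit and collect s_bit are left out of the sketch.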

Parallel Adder

A parallel adder is a kind of ALU that calculates every bit of the output more or less simultaneously, using one full adder for each output bit. The 1947 Whirlwind computer was the first computer to use a parallel adder.

In many CPUs, the CPU latches the final carry-out of the parallel adder in an external "carry flag" in a "status register".

In a few CPUs, the latched value of the carry flag is always wired to the first carry-in of the parallel adder; this gives "add with carry" with 2's complement addition. (In a very few CPUs, an end-around carry, in which the final carry-out of the parallel adder is directly connected to the first carry-in of the same parallel adder, gives 1's complement addition.)

Ripple Carry Adder

Numbers more than 1 bit long require more than a single full adder to manipulate using arithmetic and bitwise logic instructions. A simple way of operating on larger numbers is to cascade a number of full-adder blocks together into a ripple-carry adder. Ripple-carry adders are so called because the carry value "ripples" from one block to the next, down the entire chain of full adders. The output values of the higher-order bits are not correct, and the arithmetic is not complete, until the carry signal has completely propagated down the chain of full adders.

If each full adder requires 3 gate delays for computation, then an n-bit ripple-carry adder will require up to 3n gate delays: 96 for a 32-bit adder and 192 for a 64-bit adder. For 32- or 64-bit computers (or wider), this delay can be prohibitively large.

Ripple carry adders have the benefit that they require the least amount of hardware of all adders (except for serial adders), but they suffer by being the slowest (except for serial adders).

With the full-adder Verilog module we defined above, we can define a 4-bit ripple-carry adder in Verilog. The adder can be expanded logically:

module ripple_carry_adder4(a, b, cin, cout, s);
 input [3:0] a, b;
 input cin;            // carry into the LSB (1 when subtracting)
 output cout;
 output [3:0] s;
 wire [3:1] c;         // carries between neighboring stages
 full_adder fa1(a[0], b[0], cin, c[1], s[0]);
 full_adder fa2(a[1], b[1], c[1], c[2], s[1]);
 full_adder fa3(a[2], b[2], c[2], c[3], s[2]);
 full_adder fa4(a[3], b[3], c[3], cout, s[3]);
endmodule

At the end of this module, s contains the 4-bit sum, and cout contains the final carry out.
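The same chain extends to any width by repeating the stage. The sketch below uses a Verilog-2001 generate loop for this; the module name ripple_carry_adder and the parameter N are assumptions made for illustration, and the full-adder equations are written inline so the block stands alone.

```verilog
// Hypothetical N-bit ripple-carry adder (N is an illustrative parameter).
module ripple_carry_adder #(parameter N = 8) (a, b, cin, s, cout);
 input [N-1:0] a, b;
 input cin;
 output [N-1:0] s;
 output cout;

 wire [N:0] c;          // c[i] is the carry into bit i
 assign c[0] = cin;

 genvar i;
 generate
  for (i = 0; i < N; i = i + 1) begin : stage
   // One full adder per bit; its carry out feeds the next stage.
   assign s[i] = a[i] ^ b[i] ^ c[i];
   assign c[i+1] = (a[i] & b[i]) | (c[i] & (a[i] ^ b[i]));
  end
 endgenerate

 assign cout = c[N];    // carry out of the most significant stage
endmodule
```

Because c[i+1] depends on c[i], the carry still has to pass through every stage in turn, so widening the adder lengthens the worst-case delay in proportion.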

This "ripple carry" arrangement makes "add" and "subtract" take much longer than the other operations of an ALU (AND, NAND, shift-left, divide-by-two, etc.). CPU designers have dealt with this in several ways. A few CPUs use a ripple-carry ALU and require the programmer to insert NOPs to give the "add" time to settle.[1] A few other CPUs use a ripple-carry adder and simply set the clock rate slow enough that there is plenty of time for the carry bits to ripple through the adder. A few CPUs use a ripple-carry adder and make the "add" instruction take more clocks than the "XOR" instruction, which gives the carry bits more time to ripple through the adder on an "add" without unnecessarily slowing down the CPU during an "XOR". However, pipelining is much simpler if every instruction takes the same number of clocks to execute.

Carry Skip Adder

Carry Lookahead Adder

Carry-lookahead adders use special "look ahead" blocks to compute the carry from a group of 4 full adders and pass this carry signal to the next group of 4 full adders. Lookahead units can also be cascaded to minimize the number of gate delays needed to propagate the carry signal to the end of the chain. Carry-lookahead adders are some of the fastest adder circuits available, but they suffer from requiring large amounts of hardware to implement. The number of transistors needed to implement a carry-lookahead adder is proportional to the number of inputs cubed.

The addition of two 1-digit inputs A and B is said to generate if the addition will always carry, regardless of whether there is an input carry (equivalently, regardless of whether any less significant digits in the sum carry). For example, in the decimal addition 52 + 67, the addition of the tens digits 5 and 6 generates because the result carries to the hundreds digit regardless of whether the ones digit carries (in the example, the ones digit clearly does not carry).

In the case of binary addition, \( A+B \) generates if and only if both A and B are 1. If we write \( G(A,B)\) to represent the binary predicate that is true if and only if \(A+B \) generates, we have:

\( G(A,B)=A\cdot B \)

The addition of two 1-digit inputs A and B is said to propagate if the addition will carry whenever there is an input carry (equivalently, when the next less significant digit in the sum carries). For example, in the decimal addition 37 + 62, the addition of the tens digits 3 and 6 propagates because the result would carry to the hundreds digit if the ones digit were to carry (which in this example, it does not). Note that propagate and generate are defined with respect to a single digit of addition and do not depend on any other digits in the sum.

In the case of binary addition, \( A+B\) propagates if and only if at least one of A or B is 1. If we write \( P(A,B)\) to represent the binary predicate that is true if and only if \(A+B \) propagates, we have (where \(+\) on the right-hand side denotes logical OR):

\( P(A,B)=A+B \)
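
The two predicates combine into the carry equations that a lookahead block evaluates. As a sketch, write \(G_i\) and \(P_i\) for \(G(A_i,B_i)\) and \(P(A_i,B_i)\) at bit position i, and \(C_i\) for the carry into that position (this subscript notation is introduced here for illustration). Bit i produces a carry if it generates one, or if it propagates an incoming carry:

\( C_{i+1}=G_i+P_i\cdot C_i \)

Substituting the recurrence into itself removes the dependence on intermediate carries, so each carry can be computed directly from the operand bits and \(C_0\):

\( C_1=G_0+P_0\cdot C_0 \)

\( C_2=G_1+P_1\cdot G_0+P_1\cdot P_0\cdot C_0 \)

\( C_3=G_2+P_2\cdot G_1+P_2\cdot P_1\cdot G_0+P_2\cdot P_1\cdot P_0\cdot C_0 \)

\( C_4=G_3+P_3\cdot G_2+P_3\cdot P_2\cdot G_1+P_3\cdot P_2\cdot P_1\cdot G_0+P_3\cdot P_2\cdot P_1\cdot P_0\cdot C_0 \)

Here \(+\) denotes logical OR and \(\cdot\) denotes logical AND.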

Cascading Adders

The power of carry-lookahead adders is that the bit-length of the adder can be expanded without increasing the propagation delay too much. This is done by cascading lookahead modules and passing "propagate" and "generate" signals to the next level of the lookahead module. For instance, once we have 4 adders combined into a simple lookahead module, we can use that to create a 16-bit and a 64-bit adder through cascading.

The 16-bit carry lookahead unit is exactly the same as the 4-bit carry lookahead adder.

64 Bit carry lookahead

The 64-bit carry lookahead unit is exactly the same as the 4-bit and 16-bit units. This means that once we have designed one carry lookahead module, we can cascade it to any large size.
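
The reuse described above can be sketched in Verilog. In the block below, a single lookahead_unit module is instantiated four times at the bit level and once more at the group level to form a 16-bit adder. The module names, port names, and the choice of group "generate" and "propagate" outputs (gg, pg) are assumptions made for this illustration, not the source's own design.

```verilog
// Hypothetical 4-input lookahead unit: computes the carries from
// generate/propagate signals, and exports group-level generate/propagate.
module lookahead_unit(g, p, cin, c, gg, pg);
 input [3:0] g, p;
 input cin;
 output [3:0] c;     // c[i] is the carry INTO position i
 output gg, pg;      // group generate and group propagate

 assign c[0] = cin;
 assign c[1] = g[0] | (p[0] & cin);
 assign c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin);
 assign c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
             | (p[2] & p[1] & p[0] & cin);
 assign gg = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
           | (p[3] & p[2] & p[1] & g[0]);
 assign pg = &p;     // the group propagates only if every position does
endmodule

// Hypothetical 16-bit adder built from two levels of the same unit.
module cla16(a, b, cin, s, cout);
 input [15:0] a, b;
 input cin;
 output [15:0] s;
 output cout;

 wire [15:0] g = a & b;   // per-bit generate
 wire [15:0] p = a | b;   // per-bit propagate
 wire [15:0] c;           // carry into each bit
 wire [3:0] gg, pg;       // generate/propagate of each 4-bit group
 wire [3:0] gc;           // carry into each 4-bit group
 wire top_g, top_p;

 // First level: one unit per 4-bit group.
 lookahead_unit l0(g[3:0],   p[3:0],   gc[0], c[3:0],   gg[0], pg[0]);
 lookahead_unit l1(g[7:4],   p[7:4],   gc[1], c[7:4],   gg[1], pg[1]);
 lookahead_unit l2(g[11:8],  p[11:8],  gc[2], c[11:8],  gg[2], pg[2]);
 lookahead_unit l3(g[15:12], p[15:12], gc[3], c[15:12], gg[3], pg[3]);

 // Second level: the same unit, fed with the group signals.
 lookahead_unit top(gg, pg, cin, gc, top_g, top_p);

 assign s = a ^ b ^ c;
 assign cout = top_g | (top_p & cin);
endmodule
```

The group signals gg and pg depend only on the operand bits and not on any carry, so the second-level unit can produce the carry into each group without waiting for a ripple. Four cla16-style blocks and a third level of the same unit would give a 64-bit adder in the same way.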

Generalized Cascading

Figure: a generalized CLA block diagram, in which each of the turquoise blocks represents a smaller CLA adder.

We can cascade the generalized CLA block above to form a larger CLA block. This larger block can then be cascaded into a larger CLA block using the same method.

Source: Wikibooks, https://en.wikibooks.org/wiki/Microprocessor_Design/Add_and_Subtract_Blocks
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.

Last modified: Monday, 8 April 2024, 12:38 PM
