This unit will address several advanced topics in computer architecture, focusing on the reasons for and the consequences of the recent switch from sequential processing to parallel processing by hardware manufacturers. You will learn that parallel programming is not easy and that parallel processing imposes certain limits on performance gains, as captured by the well-known Amdahl's law. You will also look into the concepts of shared memory multiprocessing and cluster processing as two common means of improving performance with parallelism. The unit will conclude with a look at some of the programming techniques used in the context of parallel machines.
Completing this unit should take you approximately 3 hours.
Read section 1.3 to learn about multi-core chips. These two pages summarize processor and chip trends aimed at increasing performance while addressing the heat problem of a single core.
Read sections 2.1 and 2.2 to learn about parallel computer architectures. There are different types of parallelism: there is instruction-level parallelism, where a stream of instructions is simultaneously in partial stages of execution by a single processor; and there is processor-level parallelism, where multiple streams of instructions are executed simultaneously by multiple processors. A quote from the beginning of the chapter states the key ideas:
"In this chapter, we will analyze this more explicit type of parallelism, the hardware that supports it, the programming that enables it, and the concepts that analyze it."
This chapter begins with a simple scientific computation example, followed by a description of SISD, SIMD, MISD, and MIMD architectures.
Study the "Amdahl's Law" section. Amdahl's law explains the limits on performance gains achievable through parallelism. Over the last several decades, increases in computer performance have come largely from improvements in hardware technology and less from software technology. Now, however, the limits of these improvements may be near. For significant continued performance improvement, either new physical technology needs to be discovered and/or transitioned to practice, or software techniques will have to be developed to get significant gains in computing performance.
In the equation for Amdahl's law, P is the fraction of code that can be parallelized; S is the fraction of code that cannot be parallelized (that is, the part that must be executed serially); and n is the number of processors. P + S = 1. Work that takes time P + S on 1 processor takes time P/n + S on n processors. Thus, the ratio of the time using 1 processor to the time using n processors is 1/(P/n + S). This is the speedup in going from 1 processor to n processors.
The speedup is limited, even for large n. If n is 1, the speedup is 1. If n is very large, the speedup approaches 1/S. If P is very small, then P/n is smaller still, and P/n + S is approximately S; since S = 1 - P is then close to 1, the speedup 1/S is also close to 1. Therefore, the speed of execution of such code using n processors is about the same as using 1 processor.
Another way of writing Amdahl's law is 1/(P/n + [1 - P]). Thus, if P is close to 1, the speedup is approximately 1/(P/n) = n/P, which is approximately n.
To better understand how Amdahl's law works, substitute a variety of numeric values into the equation and sketch its graph.
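One way to do this substitution exercise is with a few lines of code. The sketch below (a minimal illustration, not part of the assigned reading) evaluates the speedup 1/(P/n + S) for several values of P and n:

```python
# Amdahl's law: speedup = 1 / (P/n + S), where P is the fraction of
# code that can be parallelized, S = 1 - P is the serial fraction,
# and n is the number of processors.

def amdahl_speedup(p, n):
    """Speedup on n processors when a fraction p of the code is parallel."""
    s = 1.0 - p                 # serial fraction
    return 1.0 / (p / n + s)

for p in (0.5, 0.9, 0.99):
    for n in (1, 4, 16, 1024):
        print(f"P={p}, n={n}: speedup = {amdahl_speedup(p, n):.2f}")

# As n grows, the speedup approaches 1/S = 1/(1 - P): for P = 0.9
# the limit is 10, no matter how many processors are used.
```

Running this for a few values of P makes the limit visible: even with 1024 processors, a program that is 90% parallelizable speeds up by less than a factor of 10.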
From section 4 of Chapter 3 of this resource, study the section titled "Amdahl's Law."
Study this material, which focuses on the problems of parallel software, discusses scaling by using an example to explain shared memory and message passing, and identifies some problems related to cache and memory consistency.
Read section 2.5, which covers two extreme approaches to parallel programming. In the first, parallelism is handled by the lower software and hardware layers; OpenMP is applicable in this case. In the second, parallelism is handled explicitly by the programmer; MPI is applicable in that case.
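The contrast between the two extremes can be sketched in Python, which is neither OpenMP nor MPI; treat this only as a rough analogy. In the first style the library decides how work is split (as OpenMP does for a parallel loop); in the second the programmer explicitly creates workers and passes messages (as in MPI):

```python
# A rough Python analogy for the two styles (not actual OpenMP or MPI).
from multiprocessing import Pool, Process, Queue

def square(x):
    return x * x

def worker(inbox, outbox):
    # Explicit style: receive a message, compute, send a reply.
    outbox.put(square(inbox.get()))

if __name__ == "__main__":
    # Implicit style: Pool.map hides the scheduling, much as an
    # OpenMP directive hides the division of loop iterations.
    with Pool(4) as pool:
        print(pool.map(square, range(8)))

    # Explicit style: the programmer routes the data, as in MPI.
    inbox, outbox = Queue(), Queue()
    p = Process(target=worker, args=(inbox, outbox))
    p.start()
    inbox.put(6)
    print(outbox.get())     # the worker's reply: 36
    p.join()
```

The implicit version is shorter and harder to get wrong; the explicit version gives the programmer control over exactly what is communicated and when, which is the trade-off section 2.5 describes.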
If you want to know more about programming on parallel machines, you may read this book.
Chapter 1 uses a matrix-times-vector example in section 1.3.1 and goes on to describe parallel approaches to computing a solution. Section 1.3.2 describes a shared-memory and threads approach; section 1.3.3 describes a message-passing approach; section 1.3.4 describes the MPI and R language approach. Study these sections to get an overview of software approaches to parallelism.
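To fix the idea before reading the book's versions, here is a small sketch (not the book's code) of the matrix-times-vector computation in Python: each row's dot product is independent, so the rows can be handed to a pool of workers:

```python
# Parallel matrix-times-vector sketch: y = A x, one worker per
# block of rows, using a process pool to divide the independent
# row computations.
from multiprocessing import Pool

A = [[1, 2], [3, 4], [5, 6], [7, 8]]    # a 4x2 matrix
x = [10, 1]                             # a length-2 vector

def row_dot(row):
    # One row's contribution to y: the dot product row . x
    return sum(a * b for a, b in zip(row, x))

if __name__ == "__main__":
    with Pool(2) as pool:
        y = pool.map(row_dot, A)        # rows computed in parallel
    print(y)                            # [12, 34, 56, 78]
```

The row decomposition is the same whether the workers are threads sharing memory (section 1.3.2) or processes exchanging messages (section 1.3.3); what changes is how each worker gets its rows and returns its results.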
Chapter 2 presents issues that slow the performance of parallel programs.
Chapter 3 discusses shared-memory parallelism. Parallel programming and parallel software are extensive topics; the intent here is to give you an overview, and the following chapters provide a more in-depth study.
Chapter 4 discusses OpenMP directives and presents a variety of examples.
Chapter 5 presents GPUs (Graphics Processing Units) and the CUDA language. This chapter also discusses parallel programming issues in the context of GPUs and CUDA and illustrates them with various examples.
Chapter 7 illustrates the message-passing approach using various examples.
Chapter 8 describes MPI (Message Passing Interface), which applies to networks of workstations (NOWs). The rest of the chapter illustrates this approach with various examples.
Chapter 9 gives an overview of cloud computing and the Hadoop platform, topics of broad current interest beyond parallel computing.
Section 10.1 in Chapter 10 explains what R is.
Take this assessment to see how well you understood this unit.