Unit 8: Code Generation
This unit is closely related to Unit 7, where the emphasis was on representation of data structures needed for run-time. While there will be some overlap, the emphasis in this unit is on instruction-level intermediate code generation and from intermediate code to target code.
The last phase (or next to the last phase if there is a code optimization phase) of the compilation process is code generation, where the output from the previous steps is finally translated into machine code, ready to execute on the target platform. In this unit, we will start with a discussion of code generation in general before moving on to a more detailed description of the code generation process. This will include an in-depth discussion of three main areas: Instruction Selection, Instruction Scheduling, and Register Allocation. By the end of this unit, you will have a firm understanding of the code generation process.
Completing this unit should take you approximately 17 hours.
Upon successful completion of this unit, you will be able to:
- Explain the use of an intermediate language.
- Identify the difficult aspects of code generation.
- Give examples of translation from simple source statements to intermediate code.
- Give examples of translation from intermediate language statements to assembler or machine code.
8.1: Introduction to Intermediate Code Generation
Read Chapter 7, which discusses generation to a relatively low-level intermediate language. Section 7.9 overlaps some of the previous readings.
Intermediate representation of a source program may be part of the front end, may be part of the back end, depending on the design of the compiler and its intended use, or may be simply called the "middle part" of the compiler between the front end and back end. Handout 23 covers IR in the context of code generation.
8.2: Detailed Example: Three Address Code (TAC)
TAC is a generic assembly language. Read this handout, and then the associated slides. Slides 1 through 149 give examples of TAC statements, function calls and encoding of objects.
8.3: Additional Intermediate Language Generation Examples
These notes discuss generation of intermediate code for a stack machine, stack machine with accumulator, and for a TAC machine.
This lecture is optional, and is listed as an aid if you have any questions on the notes.
8.4: Generation of Machine Code
Read Chapter 8. It describes the translation from a low-level intermediate language to machine code. This translation addresses the handling of restrictions on the machine language; for example, a finite number of registers is a restriction of a target machine and its machine language. The intermediate language assumes an unlimited number of registers, i.e., virtual register machine.
Read these notes on generation of machine code from intermediate code.
Read Chapters 9 and 10. Chapter 9 overlaps the above reading, and is a supplementary discussion of the problem of mapping a large number of variables used in an intermediate language translation into a smaller number of registers, plus memory locations available on the target machine. Chapter 10 covers function calls using a stack, activation records, and frame pointers. Look over these chapters and read the parts that will add to your understanding of register allocation and function calls.
Watch these videos.
Read these notes, as well as the ones on Garbage Collection. These are very well-done formal presentations and give a lot of detail. Register allocation, linear scan and Chaitin's algorithm are explained. Regarding garbage collection, reference counting, mark-and-sweep, and stop-and-copy are explained.
8.5: Verification and Validation of Code Generation
Do Exercise 7.1 on page 173 and Exercise 8.2 on page 189.
These exercises will give you some practice with code generation and are based on Chapter 7, which covers intermediate code generation, and Chapter 8, which covers machine code generation. Intermediate code can be high level, i.e. close to the input language, or low level, i.e. close to the target machine language, and, hence, though Exercise #2 jumps to Chapter 8, the process of syntax directed translation is the same. These exercises illustrate the process for syntax directed translation to intermediate code or machine code. You can check your answers against the Answer Key.