Unit 1: Introduction to Compilers
The compilation process is one of the steps in executing a program. Understanding how compilers work and what goes on "behind the scenes" will help you get better at developing software. This unit will first provide you with an introduction to the compiler, its history, compiler structure and design, and the types of compilers. By the end of this unit, you will be able to describe the steps of the compilation process.
Completing this unit should take you approximately 8 hours.
Upon successful completion of this unit, you will be able to:
- Describe the role and applications of compilers.
- Define compiler, interpreter, and translator, and recount the early history of compilers.
- Explain the process for development of a compiler in the context of a systems process.
- Explain the structure of compilers and the corresponding steps in the compilation process.
1.1: Overview of Compilers
A compiler is a complex software system, and, as such, has a system life cycle that starts with a need, transitions to design and construction/implementation, undergoes testing, transitions to use in its intended environment, undergoes maintenance over its life, and ends with "disposal" and archival storage. The activities, various work products, and documentation that comprise the processes used during the system's life must be carefully planned. The part of the system life cycle that deals with the design and construction of the compiler is called the development life cycle. This course focuses on the development life cycle, but the overall system life cycle must be kept in mind.
The term "compiler process" is related to, but different from the system (life cycle) process. The "compiler process" refers to the processing that a compiler does when it performs its function of translation of a source program to a target program. It also refers to the approach taken to design a compiler. Design, in fact, is one activity in the system (life cycle) process.
1.1.1: Compiler Definitions, Terminology, and Anatomy of a Compiler
For a definition of a compiler and some terminology, study slides 13-26. For an anatomy of a compiler see slides 27-47. For examples of optimization see slides 48-76. These slides have good examples of compiler output for a given input and a lot of examples of optimizations. A compiler translates a high-level language to a low-level language.
For a slightly different overview of compilers, study slides 8-47. These slides make an analogy between compilation and the equivalence-preserving transformations of an electrical circuit.
Read the short paragraph on Related Techniques, which defines related terms: language translator, assembler, disassembler, and decompiler.
1.1.2: History
Read the interesting history of the early compilers and computer pioneers, such as Grace Hopper, who worked on the first COBOL compiler.
Read sections 1.1, 1.2, and 1.3. Note that "compile" means to translate from a high-level language, used by humans, to a low-level language used by a computer. A high-level language uses concepts, objects, and tasks performed by a human, whereas a low-level or machine language uses concepts, objects, and tasks performed by machines.
The input language to a compiler, typically a programming language, is called the source language. The output language of a compiler is called the target language or object code, typically an assembly language or machine language.
The author describes seven phases of a compiler. The middle phase is called Intermediate Code Generation. The intermediate code is independent of a particular target machine. One of the back-end phases is called Machine Code Generation, which translates the machine-independent code to machine-dependent code for a particular computer.
The need for compilers arises from the need for high-level languages, which are more relevant for problem solving and software development by humans. Moreover, high-level languages are machine independent.
1.1.3: Other Applications
Read section 1.4, which gives reasons for studying compilers. In the discussion, the author suggests other applications for compiler techniques and tools. He specifically mentions domain-specific languages. Other applications derive from techniques and methods used in compilers. For example, if the input to a software system can be described using a formal grammar, then lexical- and syntax-phase techniques may be the most effective and efficient to use for inputting, verifying, and representing the data for processing. Front-end techniques can also be applied via software tools to the enforcement of development and programming standards and procedures. Knowledge of compiler techniques and methods helps in the design of new programming languages, new hardware architectures, generation of software applications, and the design of software tools.
1.1.4: Hardware Compilation
Read the short paragraph on Hardware Compilation. Note that hardware compilation refers to the translation of a software program to a hardware representation.
1.2: Compiler Design Overview
The need for compilers arises from the need for high-level languages, which are more relevant for problem solving and software development by humans. Moreover, high-level languages are machine independent.
Since development of a compiler is a relatively complex system-development effort, typically having many users and developers, and will be maintained over a life of many years, a formal process should be used for its development. The development process should extend from requirements through verification and validation, and should include reviews, tests, analysis and measures, quality assurance, configuration control, and key documentation. The development process is a part of the overall system life cycle process, which additionally, includes deployment, maintenance, disposal and archival storage. The compiler development process should consist of procedures for writing and documenting the needs and requirements, the architecture and design, construction, integration, and verification and validation of the compiler. Documentation should also include the formal foundations and techniques used, tradeoffs made, alternatives evaluated, and the chosen alternative for design or implementation. Full coverage of all of these in detail is beyond the scope of this course. As we proceed through this course, however, we include high-level needs, requirements, functions, performance considerations, and verification and validation issues for a compiler and its parts.
1.2.1: One-Pass vs. Multi-Pass
Read the paragraph on one-pass and multi-pass compilers. This distinction was more relevant in the early days of computing, when computer memory was limited and processing time was much slower compared to current computers.
A "pass" refers to reading and processing the input, i.e., source or source representation. A one-pass compiler is simpler to implement. However, more sophisticated processing, such as optimizations, may require multiple passes over the source representation.
The compiler is composed of components, which perform the steps of the compilation process. Some of the components may make a separate pass over the source representation--thus, the name of multi-pass. Multi-pass compilers are used for translating more complex languages (for example, high-level language to high-level language), and for performing more complete optimizations. Moreover, multi-pass compilers are easier to verify and validate, because testing and proof of correctness can be accomplished for the smaller components. Multi-pass compilers are also used for translation to an assembler or machine language for a theoretical machine, and for translation to an intermediate representation such as bytecode. Bytecode, or pcode (portable code), is a compact encoding that uses a one-byte instruction, and parameters for data or addresses, to represent the results of syntax and semantic analysis. The bytecode or pcode is independent of a particular machine. It is executed by a virtual machine residing on the target machine.
1.2.2: Structure of a Compiler
Read the CS143 Course Overview handout. You have studied the lecture "Introduction to Compilers" already. The handout gives an overview of the structure of a compiler with a good explanation.
This overview describes the front-end process, or analysis stage, of the compilation process. The intermediate code used as an example is TAC, three-address code. The front end, or analysis, consists of lexical analysis, syntax analysis or parsing, semantic analysis, and intermediate code generation. Lexical analysis may be preceded by preprocessing, depending on the language and the development environment.
It also explains the back-end, or synthesis state, of the compilation process, which includes intermediate code optimization, object code generation, and object code optimization. The symbol table and error-handling topics apply to both the front end and to the back end. The section on one-pass versus multi-pass complements the previous explanation of the number of "passes" that a compiler uses.
The handout also has a very good section on history, which helps us understand why the structure of a compiler is as described in previous sections. Read the Historical Perspective section, which shows the important influence of programming languages on the development of compilers.