Unit 5: Parsing and Syntax Analysis
The next step of the compilation process is parsing. This step also has a foundation in formal languages and automata. Parsing takes input from the Lexical Analysis step and builds a parse tree, which will be used in future steps to develop the machine code. In this unit, we will define parsing and identify its uses. We will also discuss two parsing strategies, Top-Down Parsing and Bottom-Up Parsing, examining what it means to approach parsing from each standpoint and taking a look at an example of each. By the end of the unit, you will understand parsing techniques with regards to compilers, and be able to discuss each of the two main approaches.
Completing this unit should take you approximately 28 hours.
Upon successful completion of this unit, you will be able to:
- Explain parsing in the context of the compilation process.
- Describe several approaches or strategies used for parsing.
- Specify the functions of a parser in performing syntax analysis.
- Describe the handling of ambiguities.
- Summarize the design and construction of a scanner.
5.1: Parser Introduction and Overview
Read Chapter 3 through section 3.1.
Read these notes. This material overlaps some of the readings later in the class. However, they add additional information and have a practical perspective.
5.2: Requirements of a Parser
Read the following three paragraphs: "Programming languages," "Overview of the process," and "Types of Parsers." Requirements of a parser include: build internal representation of the input tokens (which come from the scanner); check that the input string of tokens is legal in the language of the compiler (i.e., that it can be derived from the start symbol); and determine the derivation steps for the input string of tokens. In addition, the parser should be reliable, efficient (how efficient depends on the intended use), user friendly (i.e., provide clear, accurate messages), and supportable (assuming that the parser will be used for a long time).
5.3: Functions of a Parser
Read subsections 3.4 - 3.6. The functions of a parser include: building an internal representation of the derivation tree and related parser information, and resolving ambiguities of the language pertaining to the input string of tokens.
Read slide 4. The first bullet is a requirement statement and the third bullet is a function statement. An additional function is the output of meaningful and accurate messages, including error messages.
5.4: Formal Language Considerations
This subsection is just a review of what has been covered in units 2 and 3. Read subsections 3.2 and 3.3. The main idea is that context free languages are used to build efficient parsers, but are supplemented with special techniques to resolve ambiguities.
5.5: Design of a Parser
5.5.1: Top-Down Parsers
Read sections 3.7 to 3.13.
This material overlaps some of the previous readings. However, it presents the material using examples. Scan over the slides, studying those that you feel will benefit you.
This material overlaps previous readings, but provides a practical view. Look over the material and study the parts you feel will benefit you.
Watch both lectures.
5.5.2: Bottom-Up Parsers
Read sections 3.14 to 3.18.
This material overlaps some of the previous readings. However, it presents the material using examples. Scan over the slides, studying those that you feel will benefit you.
This material overlaps previous readings, but provides a practical view. Look over the material and study the parts you feel will benefit you.
Read these notes, as well as the notes for Lecture 12.
Watch these lectures.
5.6: Construction of a Parser
Read sections 3.15 to the end of Chapter 3.
Read Notes 12, 14, and 16. Read the slides for lectures 5 (264 to the end), 6 (48 to the end), and 7 (91 to the end).
This material overlaps, but also complements, some of the previous readings. Scan over the slides, studying those that you feel will benefit you. In particular, in the introduction to shift-reduce parsing, study slides 60 to the end.
5.7: Verification and Validation of a Parser
In this exercise, you will be introduced to LEX and YACC, which stand for Lexer (short for lexical analyzer) and Yet Another Compiler-Compiler. They are generators, i.e. programs that generate other programs, in this case, a scanner and a parser. Our focus in this Assessment will be on YACC. Read the part of the tutorial on YACC. This tutorial explains YACC and how YACC and LEX interface. LEX and YACC are the original programs, and just as Flex is an open software version for LEX, Bison is an open software version for YACC.