## Torben Ægidius Mogensen's "Basics of Compiler Design, Chapter 2: Lexical Analysis"

Look over Chapter 2 on Lexical Analysis. If you are comfortable with this material, just review it. If you feel you are somewhat uncomfortable with this material, read or study it to become comfortable with it; it is the foundation for much of our work on compilers.

A regular expression has associated non-deterministic finite automata (NFA) that accept it.

The recognition of a regular expression in a regular language can be done in several ways: by operating on the expression directly, using an NFA, or using a DFA. Also, an automaton can be transformed to an equivalent having fewer states. The size (that is, the number of states) and the speed of the automaton determine the most efficient way to recognize a regular expression. Note in section 2.7 of the reading the time estimates for processing an expression by a DFA and by an NFA.

Regular languages include many languages and can be used for many applications, including scanners. Further, given regular languages, their union, concatenation, repetition, set difference, and intersection are also regular languages--we say that regular languages are closed under these operations. Regular languages can be expressed, equivalently, using regular expressions, NFAs, or DFAs. A key limiting characteristic of regular languages is seen from DFAs. A DFA is a finite automaton, and, thus, can remember only a finite number of symbols. Hence, a DFA cannot recognize a string of the type an b cn, for any n (because for any n it will require an infinite number of states to remember the number of a's). Most computer languages are not regular and we will need to use a larger formal language class to parse them.