Compiler Design Notes

Compiler Design Notes

Compiler Design

A language translator is a program which translates programs from source language into an equivalent program in an object language.

Keywords and phrases: source-language, object-language, syntax-directed, compiler, assembler, linker, loader, parser, scanner, top-down, bottom-up, context-free grammar, regular expressions

A computer constructed from actual physical devices is termed an actual computer or hardware computer. From the programming point of view, it is the instruction set of the hardware that defines a machine. An operating system is built on top of a machine to manage access to the machine and to provide additional services. The services provided by the operating system constitute another machine, a virtual machine.

A programming language provides a set of operations. Thus, for example, it is possible to speak of a Java computer or a Haskell computer. For the programmer, the programming language is the computer; the programming language defines a virtual computer. The virtual machine for Simple consists of a data area which contains the association between variables and values and the program which manipulates the data area.

In contrast with compilers an interpreter is a program which simulates the execution of programs written in a source language. Interpreters may be used either at the source program level or an interpreter may be used it interpret an object code for an idealized machine. This is the case when a compiler generates code for an idealized machine whose architecture more closely resembles the source code.

There are several other types of translators that are often used in conjunction with a compiler to facilitate the execution of programs. An assembler is a translator whose source language (an assembly language) represents a one-to-one transliteration of the object machine code. Some compilers generate assembly code which is then assembled into machine code by an assembler. A loader is a translator whose source and object languages are machine language. The source language programs contain tables of data specifying points in the program which must be modified if the program is to be executed. A link editor takes collections of executable programs and links them together for actual execution. A preprocessor is a translator whose source language is an extended form of some high-level language and whose object language is the standard form of the high-level language.

The typical compiler consists of several phases each of which passes its output to the next phase

· The lexical phase (scanner) groups characters into lexical units or tokens. The input to the lexical phase is a character stream. The output is a stream of tokens. Regular expressions are used to define the tokens recognized by a scanner (or lexical analyzer). The scanner is implemented as a finite state machine.

· The parser groups tokens into syntactical units. The output of the parser is a parse tree representation of the program. Context-free grammars are used to define the program structure recognized by a parser. The parser is implemented as a push-down automata.

· The contextual analysis phase analyzes the parse tree for context-sensitive information often called the static semantics. The output of the contextual analysis phase is an annotated parse tree. Attribute grammars are used to describe the static semantics of a program.

· The optimizer applies semantics preserving transformation to the annotated parse tree to simplify the structure of the tree and to facilitate the generation of more efficient code.

· The code generator transforms the simplified annotated parse tree into object code using rules which denote the semantics of the source language.

· The peep-hole optimizer examines the object code, a few instructions at a time, and attempts to do machine dependent code improvements.

1

The scanner

The scanner groups the input stream (of characters) into a stream of tokens (lexeme) and constructs a symbol table which is used later for contextual analysis. The lexemes include

· Key words,

· identifiers,

· operators,

· constants: numeric, character, special, and

· comments.

The lexical phase (scanner) groups characters into lexical units or tokens. The input to the lexical phase is a character stream. The output is a stream of tokens. Regular expressions are used to define the tokens recognized by a scanner (or lexical analyzer). The scanner is implemented as a finite state machine.

Lex and Flex are tools for generating scanners is C. Flex is a faster version of Lex.

The Parser

The parser groups tokens into syntactical units. The output of the parser is a parse tree representation of the program. Context-free grammars are used to define the program structure recognized by a parser. The parser is implemented as a push-down automata.

Yacc and Bison are tools for generating bottom-up parsers in C. Bison is a faster version of Yacc. Jack is a tool for generating scanners and top-down parsers in Java.

Leave a Comment