CS3012 (Formal Languages and Compilers) Tuesday, 7th August, 2001
1. (a) (i) Define a contextfree grammar (and not just a CFG rule).
(ii) Write contextfree grammars for the following three languages:
I palindromes over the alphabet {x, y, z}, where palindromes read the same forward and backward
II {a^{i}b^{j}c^{k}: k = i + j, i,j,k 0}
III {a^{i}b^{j}: i > j, j 0}
(12)

Using the grammar and parse table below, show using the LR(1) algorithm that (a + b)*a is in the language of the grammar. id represents both a and b.

E > E + T

E > T

T > T * F

T > F

F > ( E )

F > id


E

T

F

id

+

*

(

)

#

0

1

2

3

S5



S4



1





S6




A

2





R2

S7


R2

R2

3





R4

R4


R4

R4

4

8

2

3

S5



S4



5





R6

R6


R6

R6

6


9

3

S5



S4



7



10

S5



S4



8





S6



S11


9





R1

S7


R1

R1

10





R3

R3


R3

R3

11





R5

R5


R5

R5

(8)

What are the main criteria for judging error recovery in compiling? How would you introduce error recovery into the parse table of part (b)?
(5)
(25)
PLEASE TURN OVER
2. (a) Let T be an alphabet, and be the empty string.
(i) What are the three basic regular expressions, and what languages (over T) do they represent?
(ii) What are the rules for constructing complex regular expressions out of simpler ones, and what languages do they create from the simpler languages?
(6)
(b) Which of the following strings are in the language defined by the regular expression (b+a^{*}b)(a^{*}b + ba)^{* } ?
(i) aabaabba
(ii) aba
(iii) babab
(iv) bbababb
(2)
(c) Write regular expressions for the languages of all strings over {0,1}:
(i) that do not contain two 1s in a row;
(ii) that begin or end with 00 or 11;
(iii) that contain an odd number of 1s and any number of 0s.
(7)
(d) (i) Explain the relationship between Finite State Automata and Regular Expressions.
(ii) Explain how the Unix utility Lex works in terms of FSAs and Regular Expressions.
(iii) What role does Lex play in compiling?
(10)
(25)
3. Consider the following simple language.
There are two data types: int and real. A variable consists of a sequence of upper or lower case letters. Variables are declared by stating the data type, then a nonempty commaseparated sequence of variable names, and then a semicolon. All declarations come before any other statements. All variables must be declared. Statements assign expressions to variables. Expressions are the sum or product of other expressions, or they are other expressions inside parentheses, or they are variables, or they are numerical constants. An integer constant consists of a nonempty string of digits. A real constant consists of two nonempty strings of digits, separated by a single ".". Each assignment statement ends in a semicolon. Adding or multiplying an integer and a real produces a real. Assigning an integervalued expression to a real variable is legal; assigning a realvalued expression to an integer variable is illegal, and causes an error.
An example program is given below.
int x, y;
real a, b;
int count;
count := 2;
x := count * y;
a := x + (b*y);
(a) Write the regular expression/action section of a Lex script to create a lexical analyser which will recognise the basic units of the language, and return appropriate tokens. Do not worry about error checking.
(5)
(b) Write a grammar for the language. You may write an ambiguous grammar. Write each grammar rule on a separate line. Do not attempt to do type checking in the grammar.
(10)
(c) What is the difference between an inherited and a synthesised attribute in an attribute grammar? What is the significance of the difference for compiling?
(3)
(d) Augment your grammar to construct a type checker by associating attributes with some of the symbols and associating attribute rules with some of the grammar rules. Your type checker should implement the last sentence of the language description above.
(7)
(25)
PLEASE TURN OVER
4. (a) (i) What is a symbol table?
(ii) Outline how a symbol table might be constructed during the compilation process. Explain the consequences for symbol table construction of compiling a language with a single global scope and one with nested scopes.
(6)

What is an abstract syntax tree? Why might abstract syntax trees be used in compiling?
(4)

Show a suitable abstract syntax tree for the following program fragment. Use as few nodes as possible, without losing any relevant information contained in the program. Use sibling links for elements on the same level. No node should have more than four children.
real half(int x) {
real k;
k = (x+1)* 2;
return k;
}
(5)

A grammar is shown below for the language of part (c). Show how you would augment the grammar rules with functions to create syntax trees in the style of your answer to (c). Explain any functions you use. You may use either the $ notation, or attributes. If a grammar rule has only one symbol on the righthand side, and you are simply passing all the attributes from that symbol to the left hand side, you do not need to write an action.
1) Func > Type name ( Params ) { Stats }
2) Type > int
3) Type > real
4) Params > Decl , Params
5) Params > Decl
6) Decl > Type name
7) Stats > Stat Stats
8) Stats > Stat
9) Stat > Decl ;
10) Stat > Assig ;
11) Stat > Return ;
12) Assig > name = Expr
13) Expr > Expr + Term
14) Expr > Term
15) Term > Term * Factor
16) Term > Factor
17) Factor > name
18) Factor > int
19) Factor > real
20) Factor > ( Expr )
21) Return > return Expr
(10)
(25)
END OF EXAMINATION PAPER
