CS 6013 - Modern Compilers, theory and practise

Java Subsets

BuritoJava

BURITOJAVA SPECIFICATION
BuritoJava is a subset of Java. The meaning of a BuritoJava program is given by its meaning as a Java program. Overloading is not allowed in BuritoJava. The BuritoJava statement System.out.println( ... ); can only print integers. The BuritoJava expression e1 & e2 is of type boolean, and both e1 and e2 must be of type boolean. BuritoJava does not allow clashing of method names, variable names and class names.

TACoJava

grammar

tacojava.jj

TACOJAVA SPECIFICATION
TACoJava is a subset of Java. The meaning of a TACoJava program is given by its meaning as a Java program. Overloading is not allowed in TACoJava. The TACoJava statement System.out.println( ... ); can only print integers. The TACoJava expression e1 & e2 is of type boolean, and both e1 and e2 must be of type boolean. TACoJava does not allow clashing of method names, variable names and class names.

QTACoJava

grammar

qtacojava.jj

QTACOJAVA SPECIFICATION
QTACoJava is a subset of Java. The meaning of a TACoJava program is given by its meaning as a Java program. Overloading is not allowed in QTACoJava. The QTACoJava statement System.out.println( ... ); can only print integers. The QTACoJava expression e1 & e2 is of type boolean, and both e1 and e2 must be of type boolean. QTACoJava does not allow clashing of method names, variable names and class names.

TACoJava2

grammar

tacojava2.jj

TACOJAVA2 SPECIFICATION
TACoJava2 is a subset of Java. The meaning of a TACoJava program is given by its meaning as a Java program. The TACoJava2 statement System.out.println( ... ); can only print integers. The TACoJava2 expression e1 & e2 is of type boolean, and both e1 and e2 must be of type boolean.

LoopyTacoJava2

grammar

loopytacojava2.jj

LOOPYTACOJAVA2 SPECIFICATION
LoopyTacoJava is a subset of Java. The meaning of a LoopyTacoJava program is given by its meaning as a Java program. Overloading is not allowed in LoopyTacoJava. The LoopyTacoJava statement System.out.println( ... ); can only print integers. The LoopyTacoJava expression e1 & e2 is of type boolean, and both e1 and e2 must be of type boolean. LoopyTacoJava does not allow clashing of method names, variable names and class names. In addition, LoopyTacoJava does not allow arbitrary multiline comments.

The annotations supported by LoopyTacoJava2 are: LIC (to identify loop invariant code) and INDEPENDENTITERS (to identify if iterations of the loop are independent of each other). You can assume that each annotated loop is in cannonical form.

LoopyTacoJava

grammar

loopytacojava.jj

LOOPYTACOJAVA SPECIFICATION
LoopyTacoJava is a subset of Java. The meaning of a LoopyTacoJava program is given by its meaning as a Java program. Overloading is not allowed in LoopyTacoJava. The LoopyTacoJava statement System.out.println( ... ); can only print integers. The LoopyTacoJava expression e1 & e2 is of type boolean, and both e1 and e2 must be of type boolean. LoopyTacoJava does not allow clashing of method names, variable names and class names. In addition, LoopyTacoJava does not allow arbitrary multiline comments.

The loop annotations supported by LoopyTacoJava are: LOOPDISTRIBUTE, LICM, LOOPTILE, and LOOPUNROLL. The last two annotation take additional arguments giving the details about unroll factor, and the dimensions of the tile. For example, LOOPUNROLL m, where m >= 2 indicates that there will be 'm' copies of the loop body after unrolling. Similarly, LOOPTILE m n, where m, n >= 1, indicates that the size of each tile will be m x n. Give a loop whose body has statements S1; S2; .. Sn, the annotation LOOPDISTRIBUTE is used to distribute the loop to generate two loops, where the body of the first loop has only S1 and body of the second loop has all the rest of the statements. You can assume that each annotated loop is in cannonical form.

FunkyTacoJava

grammar

funkytacojava.jj

FUNKYTACOJAVA SPECIFICATION
FunkyTacoJava is a subset of Java. The meaning of a FunkyTacoJava program is given by its meaning as a Java program. Overloading is not allowed in FunkyTacoJava. The FunkyTacoJava statement System.out.println( ... ); can only print integers. The FunkyTacoJava expression e1 & e2 is of type boolean, and both e1 and e2 must be of type boolean. In addition, (i) FunkyTacoJava does not allow arbitrary multiline comments. (ii) each field and method access is explicity mentioned (e.g., a.f, this.f, a.foo(), this.bar(), etc) -- that is, `this' variable is explicitly used.

The inline annotations supported by FunkyTacoJava is: INLINE. A function call can have an optional annotation telling if the function is desired to be inlined.

FunkyTacoJava2

grammar

funkytacojava.jj

FUNKYTACOJAVA2 SPECIFICATION
FunkyTacoJava2 is a subset of Java. The meaning of a FunkyTacoJava2 program is given by its meaning as a Java program. Overloading is not allowed in FunkyTacoJava2. The FunkyTacoJava2 statement System.out.println( ... ); can only print integers. The FunkyTacoJava2 expression e1 & e2 is of type boolean, and both e1 and e2 must be of type boolean. In addition, (i) FunkyTacoJava does not allow arbitrary multiline comments. (ii) each field and method access is explicity mentioned (e.g., a.f, this.f, a.foo(), this.bar(), etc) -- that is, `this' variable is explicitly used.

MiniJava

grammar

minijava.jj

programs

MINIJAVA SPECIFICATION
MiniJava is a subset of Java. The meaning of a MiniJava program is given by its meaning as a Java program. Overloading is not allowed in MiniJava. The MiniJava statement System.out.println( ... ); can only print integers. The MiniJava expression e1 & e2 is of type boolean, and both e1 and e2 must be of type boolean. MiniJava does not allow clashing of method names, variable names and class names.

miniIR

grammar

miniIR.jj

programs

miniIR SPECIFICATION

MiniIR (Minijava Intermediate Representation) is an intermediate representation for Minijava. It has simplified expressions, exposes the runtime model (integer is of size 4, field and method references require accessing of the address of the object, "new" operator is replaced with system call HALLOCATE).

procedures:

If we have a procedure
ProcLabel [5] Stmt
then it takes five arguments, called (temp 0), (temp 1), ..., (temp 4). Other temporaries can be used without declaration and treated as locals. It executes the statement Stmt.

arrays:

The index of an array is a multiple of 4, starting at 0.

NOOP:

Does nothing.

ERROR:

Terminates the program execution with an error message.

CJUMP SimpleExp Label:

If SimpleExp evaluates to 1, then continue with the next statement, otherwise jump to Label.

HSTORE Temp_1 IntegerLiteral Temp_2:

Temp_1 evaluates to an address
IntegerLiteral is an offset
Temp_2 evaluates to the value that should be stored.

HLOAD Temp_1 Temp_2 IntegerLiteral:

Temp_1 is the temporary into which a value should be loaded
Temp_2 evaluates to an address
IntegerLiteral is an offset.

LT: is "<"

HALLOCATE SimpleExp:

SimpleExp evaluates to an integer, and that corresponding amount of heapspace is allocated, and the return address of that is returned as the result.

sminiIR

grammar

sminiIR.jj

programs

sminiIR SPECIFICATION

sMiniIR (Simplified Minijava Intermediate Representation) is an intermediate representation for Minijava. It is similar to miniIR, but allows constants at more number of places than miniIR. Every miniIR program is also an sminiIR program.

SSAminiIR

grammar

sminiIR.jj

SSAminiIR SPECIFICATION

SSAMiniIR (SSA commented Minijava Intermediate Representation) is an intermediate representation for Minijava. It is similar to miniIR, but it has an additional comment on the number of phi nodes.

simpleCG

grammar

simplecg.jj

sample-call-graphs

simpleCG SPECIFICATION
simpleCG contains the call graph information (the callee name) for each procedure.

aminiIR

grammar

aminiIR.jj

programs

aminiIR SPECIFICATION

aMiniIR (Annotated Minijava Intermediate Representation) is MiniIR code with annotations. The annotations can be of the form

L1 TEMP 20 alias? TEMP 0

Translation: at Label L1, is TEMP 20 a may alias to TEMP 0?

miniRA

grammar

miniRA.jj

programs

miniRA SPECIFICATION

MiniRA is a MIPS-oriented language. It is a lot like microIR, but with the following changes:

1) MiniRA supports C-style comments.

2) Labels now have global rather than local scope.

3) Instead of an unbounded number of temporaries with local scope, MiniRA has 24 global machine registers with global scope. The registers s0-s7 and t0-t9 can be allocated for general use. Registers a0-a3 are reserved to pass arguments to a procedure call. Register v0 is reserved to return a result from a procedure call, and v0 and v1 are also used as temporary registers when values need to be loaded from the stack. s0-s7 are callee saved. t0-t9 are caller saved.

4) Values can be loaded from and stored to the stack with the ALOAD and ASTORE instructions, where (SPILLEDARG i) denotes the ith value on the stack, with the first value at (SPILLEDARG 0). For example:

ALOAD s3 (SPILLEDARG 1)

loads the second value from the stack into register s3.

5) A procedure body is no longer a StmtExp but a StmtList. The return value should be put in register v0.

6) Call is now a statement. As mentioned above, the "a" registers are used to send arguments. If there are more than 4 arguments to the call, you need to use the PASSARG stmt, which saves the extra args to the stack. For whatever reason, PASSARG starts at 1, but SPILLEDARG starts at 0, so in general an argument passed as (PASSARG i) is accessed in the body of the procedure as (SPILLEDARG i-1).

Here's an example. Consider a call to some procedure with label P and arguments stored in registers t1, t2, t3, t4, and t5, and where the return value should go in t6.

MOVE a0 t1 // first move 4 args to the "a" registers
MOVE a1 t2
MOVE a2 t3
MOVE a3 t4
PASSARG 1 t5 // if there are more args save them to the stack.
// note that PASSARG is 1-based, not 0-based!
CALL P
MOVE t6 v0 // the return value is in v0

7) A procedure has three integers in its header, for example:

procA [5] [3] [4]

The first integer has the same meaning as the integer in microIR, i.e. the number of args taken by the procedure.

The second integer is for the number of stack slots that the procedure requires. This is the total number, including space for arguments (if necessary), space for any spilled temps, and space for any registers that have to be saved.

The third integer is the maximum arguments of a call in the body of procA. i.e. if procA makes a call to procB that takes 3 args, to procC that takes 2 args, and to procD that takes 4 args, then since 4 is the maximum args a call in the body of procA uses, this integer is set to 4.

MIPS Assembly

grammar

mips.jj

programs

MIPS SPECIFICATION

[ Taken from Yannis Smaragdakis] The MIPS registers of interest together with the conventions of their uses are:

$0 : Is always set to 0
$t0 - $t9 : Are not persisted after call
$s0 - $s7 : Are persisted after call
$v0 : Used for the return value
$v1 : Use it conventionally for copying to and from the stack
$a0 - $a3 : Used to pass arguments
$sp : Pointer to the top of the stack
$ra : Return address

Some examples from MIPS code:

To insert the value from $t0 to the top of the stack:

sw $t0, 0($sp)

add $sp, $sp, -4

To add $t0, $t1 and produce the result in $t2:

add $t2, $t0, $t1

To multiply $t0, $t1 and produce the result in $t2:

mult $t0, $t1

mflo $t2

To move from one point of the stack to another (x = y where variable x has been assigned during register allocation to the 3rd position from the top of the stack, and y has been assigned the 6th position, counting backwards) under our convention for v1:

lw $v1, -24($sp)

sw $v1, -12($sp)

To do a function call:

jal foo

To return from a function call:

jr $ra

To save a long integer (>32768) to $t0, you do:

li $t0, 1000000

The "MOVE t2 Fac_ComputeFac" of miniRA would then become:

la $t2, Fac_ComputeFac

To translate the miniRA "CALL t0 " you will use the pseudo-instruction jalr:

jalr $t0

You can figure out the rest for yourselves. A very good reference is here. A difference from the standard MIPS conventions of register usage is that you can use $v1 for transfers from the stack. Before a function call you should save all (live) $t0-$t9 variables as well as $ra to the stack. On the above site you will find some MIPS pseudo-instructions that will be translated into simpler binary code by the assembler. You can use these pseudo-instructions freely.

You can execute your MIPS code in SPIM. SPIM is a MIPS emulator that offers some very simple system calls. You will need calls for memory allocation, integer printing and character printing.

To do a memory allocation of 40 bytes and get the address of the allocated memory in $v0:

add $a0, $0, 40

add $v0, $0, 9

syscall

To call the function that prints an integer from $t4:

move $a0, $t4

add $v0, $0, 1

syscall

To call a function that prints a line feed:

add $a0, $0, 10

add $v0, $0, 11

syscall

The following is an example of MIPS code that runs on SPIM. Given this C program, a possible translation to MIPS code is here. The same code with "dynamic" calls is here. The print_int and print_char functions are identical to those offered by SPIM through syscall. (The C program is given in this form to make its translation more understandable.)

Details about the MIPS assembly can be found at spim homepage.
SPIM Quick reference can be found here.
Quick details about MIPS integer instruction set can be here.
A Java based MIPS simulator can be found here.