Implement TINY+ compiler based on C language100010897

Implement the TINY+ compiler

Experiment overview

【Purpose of experiment】

Construct a compiler that converts TINY+ high-level programming language to intermediate code on TINY+ virtual machine.

The whole experiment consists of two parts: Experiment 1 completes the lexical analyzer part of TINY + compiler; Experiment 2 completes the syntax analyzer part, semantic analyzer part and intermediate code generator part of TINY + compiler.

【Experimental Environment】

Microsoft VisualStudio 2015

C++ language implementation

Experiment content

Experiment 1, implement lexical analyzer of TINY + compiler

【Experimental purpose and requirements】

In this lab, students construct a lexical analysis program for the TINY+ language. Specific requirements are as follows:

1 The input of the lexical analyzer is the source program file, and the output is a stream of words (tokens).

2 The constructed lexical analyzer must follow the principle of the longest substring, for example, the string ‘:=’ should be recognized as a word representing the assignment symbol, instead of two words ‘:’ and ‘=’.

3 The words recognized by the lexical analyzer should be expressed in the form of a two-tuple (Kind, Value), where Kind indicates the type of the word, and Value indicates the actual value of the word. The following symbols are required to represent different kinds of words:

KEY represents a keyword;

SYM means a special symbol;

ID stands for identifier;

NUM represents a numeric constant

STR represents a string constant

4 The task of the lexical analyzer is not only to complete the recognition of words, but also to check the lexical errors in the program, give the error information and the location of the error in the source program (the line number where it is located). Types of lexical errors include:

Illegal symbols, the lexical analyzer may identify symbols that are not allowed in the alphabet of a TINY+ program, for example, the lexical analyzer recognizes $ in a source program, it should report a lexical error, an illegal symbol was found;

Word STRING of type string, single quotes do not match. For example, if ‘scanner appears in the source program, the lexical analysis analyzer should report an error “the string is missing a closing quotation mark”;

Mismatched parentheses for comments. For example, if {this is an example appears in the source program, the lexical analyzer should report an error “The comment is missing a closing parenthesis”.

Algorithm analysis experiment process:

1. Set NO_ANALYZE, NO_CODE, and PARSE to TRUE in the main function, so that the program only performs lexical analysis.

2. Due to the addition of keywords, the TINY language keywords are as follows:

Tiny keywords	if, then, else, end, repeat, until, read, write
Tiny + new keywords	Or, and, int, bool, char, while, do

All the keywords are reserved for the programming language and expressed in lowercase letters, and the identifiers defined by the user cannot be repeated with the keywords.

3. Definition of special symbols:

Tiny symbols	{ } ; strong> := + – / ( )< /strong> < =**
Tiny + new symbols	> <= >= , ‘

4. The words recognized by the lexical analyzer are expressed in the form of a two-tuple (Kind, Value), where Kind indicates the type of the word, and Value indicates the actual value of the word. The following symbols are required to represent different kinds of words:

KEY	keyword
SYM	Special symbols
ID	Identifier
NUM	number Constant
STR	String constant

5. The task of the lexical analyzer is not only to complete the recognition of words, but also to check the lexical errors in the program, give the error information and the location of the error in the source program (the line number where it is located). Types of lexical errors include:

Mismatched parentheses for comments. For example, if {this is an example appears in the source program, the lexical analyzer should report an error “The comment is missing a closing parenthesis”.

Experimental process:

Add the corresponding type to the TokenType enumeration in GLOBALS.h:

keywords	special symbols
Add an enumeration definition	Corresponding keywords	Add an enumeration definition	Corresponding to special symbols
OR	or	RT
AND	and	LTEQ	<=
INT	Int	RTEQ	>=
BOOL	bool	COMMA	,
CHAR	char	CO	‘
WHILE	while
DO	do

Make corresponding modifications in the getToken() function of SCAN.C, so that the program can correctly identify the newly added keywords and special symbols, and give corresponding prompts for program errors)

Lexical analysis test results

Test One:

Input:

output:

Test Two:

enter:

output:

Test 3: (test error prompt)

Input:

output:

Experiment 2: Realize TINY + syntax analyzer, semantic analyzer and intermediate code generator

*【Experimental purpose and requirements】

In this experiment, students construct TINY + language syntax analyzer, semantic analyzer and intermediate code generator. The specific requirements are as follows:

1 Construct a recursive descent parser for TINY+. The syntax analyzer performs syntax analysis on the word sequence generated by the lexical analyzer, and generates an abstract syntax tree as the output of the syntax analyzer.

2 Construct TINY+’s semantic analyzer, which builds symbol tables and then checks programs for semantic errors.

3 Construct the TINY+ intermediate code generation department, which translates the TINY+ program into three-address intermediate code.

4 It is required to be able to check for grammatical and semantic errors in the program, including:

a) Syntax error:

The start symbol and following symbol of the grammatical structure are wrong;

Identifier errors, for example, in a variable declaration of a program, the keyword int is not followed by an identifier;

Errors for mismatched parentheses, such as mismatched opening parenthesis (and closing parenthesis ).

Symbol error, for example, the correct symbol required in an assignment statement is ‘:=’, but the correct symbol required in a relational comparison expression is ‘=’.

b) Semantic errors:

An identifier is used without being declared, and an identifier is declared more than once.

The type of a conditional expression is not boolean type bool.

The types of the two operands of a binary operator are not equal.

The types of the left and right parts of an assignment statement are not equal.

Grammar analysis experiment process:

1. Set NO_ANALYZE and NO_CODE to TRUE in the main function, so that the program only performs PARSE (syntactic analysis) process.

2. The syntax EBNF of Tiny+ is defined as follows:

1rogram -> declarations stmt-sequence
2declarations -> dec; declarations |ε
3decl -> type-specifier varlist
4type-specifier -> int | boo | char
5varlist -> identifier { , identifier }
6stmt-sequence -> statement { ; statement }
7statement -> if-stmt | repeat-stmt | assign-stmt | read-stmt | write-stmt |
8while-stmt -> while bool-exp do stmt-sequence end
9if-stmt -> if bool-exp then stmt-sequence [else stmt-sequence] end
10repeat-stmt -> repeat stmt-sequence untibool-exp
11assign-stmt -> identifier:=exp
12 read-stmt -> read identifier
13write-stmt -> write exp
14exp -> arithmetic-exp | bool-exp | string-exp
15arithmetic-exp -> term { addop term }
16addop -> + | -
17term -> factor { mulop factor }
18mulop -> * | /
19factor -> (arithmetic-exp) | number | identifier
20 bool-exp -> bterm { or bterm }
21 bterm -> bfactor { and bfactor }
22 bfactor -> comparison-exp
23 comparison-exp -> arithmetic-exp comparison-op arithmetic-exp
24 comparison-op -> < | = | > | >= | <=
tring-exp -> string

3. According to the definition of EBNF, modify the PARSE.C file, construct a recursive descent parser, and generate a syntax tree.

The structure of a basic syntax tree is as follows:

sentence series		The statements are connected through a linked list, except for the first statement, other statements can be found through ->slibing of the previous statement.
If statement		test is located at child[0] of If statement, then-part is child[1], else-part is child[2]
Repeat		Body is located in child[0] in the repeat statement, test is located in chlid[1]
Assign		Expression is located in assign statement child[0]
write		Expression in child[0] of write
Op logic operation		Left-operand is located in the child[0] of the op statement, and Right-opreand is located in the child[1] of the statement;
while		Test is located in child[0]Do-part of while statement is located in child[1]
Define such as: int A, B		B is in the slibing of A, Among them, the child nodes of A and B are not assigned (this is reserved for future expansion)

Creates a new node type of define that identifies definition statements.

Rewrite the unsuitable grammatical analysis tree branch in tiny, and write the appropriate analysis function according to the new ENBF definition, so as to construct the correct grammatical tree.

Syntactic analysis test results:

Test one:

enter:

output:

Semantic analysis experiment process:

1 Set NO_CODE to TRUE in the main function to make the program perform the semantic analysis process.

2 A program written in TINY + language includes two parts: variable declaration and statement sequence. The variable declaration section can be empty, but a TINY+ program must contain at least one statement.

a) All variables must be declared before use, and each variable can only be declared once.

b) The types of variables and expressions can be integer type int, Boolean type bool or string type char, and type checking must be performed on the use of variables and expressions.

3 Building a semantic analyzer mainly includes

The design of the symbol table and the corresponding operation of the symbol table

The symbol table saves information: the address information of the variable, the line number where the variable is accessed
The variable uses its variable name ID to find the corresponding data through chain hash storage.

The operation of the semantic analysis program itself, including the construction of the symbol table and type checking.

Creation of the symbol table:

Preorder traversal of the syntax tree:

When encountering a defined grammar node, add its variable to the hash table and store its line number position;

When encountering a node with a non-definition statement containing a variable, look up the position of the variable in the hash table, and store the line number in the next position of the corresponding entry of the hash.

Type checking:

Subsequent traversal of the syntax tree, the semantic judgment of each node:

1. In the if statement node, the type of if-test(child[0]) must be boolean,

In the while statement node, the type of while-test(child[0]) must be boolean,
In the op arithmetic operation node, child[0] and child[1] must be of integer type
In the logical operation node of op, child[0] and child[1] must be boolean
In the assignment operation node, the types of child[0] and child[1] must be consistent
…

In the subsequent traversal of the syntax tree, check whether the variable is illegally used:

When the node is an ID node, check whether it has been defined in the hash table.
If the node is a definition statement node, it is judged whether the variable is defined repeatedly.
If the variable has already been defined, assign the type of the variable to the type at the time of definition
Enter the previous layer traversal, and when the type does not meet the type required by the previous layer, an error will be prompted. (That is, if in the assignment statement, the type of the variable is inconsistent with the type of another child node of the assignment statement, it will prompt a semantic error).

Lexical analysis test results

Test one:

Input:

output:

Test Two:

enter:

output:

Test Three: (Test Error Prompt)

Input: There are three errors in the code:

a) A repeated definition

b) In A>3, A is char type, the comparison is wrong

c) Assignment error in A:=1. Type inconsistency

output:

Intermediate code generation experiment process:

1 Set NO_PARSE, NO_ANALYZE, and NO_CODE to FALSE in the main function to make the program perform the intermediate code generation process.

2 Construct the intermediate code generation department of TINY+, which translates the TINY+ program into three-address intermediate code.

3 The implementation of the original tiny intermediate code generator is located in cgen.c and cgen.h. Its interface is codeGen(TreeNode* syntaxTree, char* codefile).

The main functions are:

The start of the function, generating some comments and instructions for setting up the runtime environment.

Call the cGen function on the syntax tree to generate corresponding instructions for each syntax tree node

Finally, a HALT instruction is generated, ending object code generation.

Comments and instructions are written in a specified .tm file for the TM program to call and run.

4 Tiny + Modified on the basis of tiny. The codeGen function will remain unchanged, just modify its cGen function.

Modify the getExp() function called by the cGen function.

Add the type of OpK, that is, on the basis of the original function code, add the intermediate code generation step of the new symbol and its operation. That is, the new symbols RT, EQ, LTEQ, RTEQ.

Such as adding “>” to the intermediate code generation process

Increase the processing of LogicOpK (that is, increase the generation of intermediate codes for AND and OR)

The intermediate code generation here is much more complex than the intermediate code of arithmetic operations. According to the instructions of the TM program and its operating principle, write the AND and OR intermediate code generation functions:

5 Modify the getStmt() function called by the cGen function. Added intermediate code generation for the while statement.

Here, Tiny + grammar rules are used to generate the corresponding three-address intermediate code attribute, and the function of while to generate intermediate code is completed.

Intermediate code test results

Test One:

enter:

output:

Resources

Size: 3.34MB
Resource download: https://download.csdn.net/download/s1t16/87484800
Note: If the current article or code violates your rights, please private message the author to delete it!