Compilation Principles Special Topic 4: Design Principle and Implementation of Operator-Precedence Parsing

Article directory

    • 1. Experimental purpose
    • 2. Experimental requirements
    • 3. Program implementation
      • 3.1. Introduction to relevant environments
      • 3.2. Main data structures
      • 3.3. Program structure description
        • 3.3.1. Design method
        • 3.3.2. Function definition
    • 4. Program testing
    • 5. Experiment summary
      • 5.1. Technical difficulties and solutions
      • 5.2. Experimental thoughts and experience summary

1. Experimental purpose

Through this experiment, master and implement the operator-precedence parsing algorithm, and carry out operator-precedence analysis of arithmetic expressions defined by an operator-precedence grammar.

2. Experimental requirements

(1) Construct the precedence relation matrix or precedence functions of the operator-precedence grammar.
(2) The input string is the pair sequence output by lexical analysis, i.e., the output of "Topic 1" for an arithmetic expression. The output is the judgment of whether the input string is an arithmetic expression defined by the grammar.
(3) The operator-precedence analysis process should be able to detect errors in the input string.
(4) Design two kinds of test cases (as complete as possible: correct and incorrect), and give the test results.
(5) Consider constructing the operator-precedence relation matrix from the operator-precedence grammar, including the FIRSTVT and LASTVT sets, and add this to your operator-precedence analysis program.

3. Program implementation

3.1. Introduction to relevant environments

Operating system: Windows 10 21H2
Development environment: CLion 2022.2.1 (Windows)
Compiler: MinGW 10.0

3.2. Main data structures

Mainly to store token information, a struct is created:

struct Keyword {
    string notation;    // the token's value (lexeme)
    int class_num;      // the token's category number
    int line;           // line number in the source program
    Keyword(string str, int num, int line_) {
        notation = str;
        class_num = num;
        line = line_;
    }
    Keyword(char* str, int num, int line_) {
        notation = string(str);
        class_num = num;
        line = line_;
    }
    Keyword(char str, int num, int line_) {
        notation = str;
        class_num = num;
        line = line_;
    }
};

Here, notation is the token's value, class_num is the category it belongs to, and line is its line number in the source program.
Using the map container, a two-dimensional table indexed by strings is created.

The following are the relevant STL-based data structures:

map<string, map<string, int>> analyzer_table;   // precedence relation table
map<string, set<string>> firstVt;               // FIRSTVT set of each Vn
map<string, set<string>> lastVt;                // LASTVT set of each Vn
map<string, string> Vn;
map<string, string> Vt;

// each Vn maps to up to ORMAXFORGRAMMAR alternative right-hand sides;
// std::array is used because a raw array cannot be a map's mapped type
map<string, array<vector<string>, ORMAXFORGRAMMAR>> grammar;
map<pair<string, vector<string>>, set<string>> first_alpha;

vector<string> leftest_increase;                // record of popped stack elements
stack<string> analysis_stack;                   // analysis stack

analyzer_table is the analysis table, holding the precedence relation between every pair of Vt symbols. By extending the grammar with the # endmarker, I obtained the precedence relations of the arithmetic grammar with respect to # directly from the standard algorithm.
firstVt and lastVt hold the corresponding sets for all Vn symbols. analysis_stack is the analysis stack, used throughout the table-driven analysis functions. leftest_increase records the elements popped from the analysis stack; it is mainly used to decide whether the phrase obtained through the analysis table and analysis stack is the leftmost phrase, in conjunction with the grammar structure and the Vt and Vn sets.
Replacing integer-indexed tables with string-indexed map structures carries over from Experiment 3. The map is used here to mechanize the use of Vt or Vn symbols as two-dimensional array indices, as taught in the lecture slides.
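As a minimal sketch of how such a string-indexed table can be used (the encoding -1/0/1 for the three relations <., =., .> and the helper names make_table and relation are illustrative assumptions, not the report's actual code):

```cpp
#include <cassert>
#include <map>
#include <string>

using namespace std;

// Assumed encoding: -1 for <. (lower), 0 for =. (equal), 1 for .> (higher);
// a missing entry means the two terminals have no relation, i.e., an error.
map<string, map<string, int>> make_table() {
    map<string, map<string, int>> t;
    t["+"]["*"] = -1;  // + <. *   (multiplication binds tighter)
    t["*"]["+"] = 1;   // * .> +
    t["("][")"] = 0;   // ( =. )
    t["#"]["#"] = 0;   // # =. #   (acceptance condition)
    return t;
}

// Look up a relation without inserting new entries; 2 signals "no relation".
int relation(const map<string, map<string, int>>& t, const string& a, const string& b) {
    auto row = t.find(a);
    if (row == t.end()) return 2;
    auto cell = row->second.find(b);
    return cell == row->second.end() ? 2 : cell->second;
}
```

Note the use of find() rather than operator[] for lookups: operator[] would silently insert a default entry, turning an error cell into a seemingly valid relation.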

3.3. Program structure description

3.3.1. Design method

For generality, all program input comes from txt files. The program first reads the grammar file, which defines the grammar and the Vn and Vt sets. It then builds the Vn and Vt sets in memory, computes the firstVt and lastVt sets, and on that basis constructs the precedence-relation analysis table over all Vt symbols.
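The FIRSTVT computation mentioned above can be sketched as a textbook fixed-point iteration. The names (Grammar, compute_firstVt, demo_grammar) and the convention that a single uppercase letter denotes a Vn symbol are illustrative assumptions, not the report's actual code; the augmented E' -> # E # production is omitted for brevity:

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <vector>

using namespace std;

// Productions: nonterminal -> alternatives, each a sequence of symbols.
typedef map<string, vector<vector<string>>> Grammar;

static bool isVn(const string& s) { return s.size() == 1 && isupper((unsigned char)s[0]); }

// FIRSTVT by fixed-point iteration:
//   A -> a...    =>  a in FIRSTVT(A)
//   A -> B a...  =>  a in FIRSTVT(A)
//   A -> B...    =>  FIRSTVT(B) subset of FIRSTVT(A)
map<string, set<string>> compute_firstVt(const Grammar& g) {
    map<string, set<string>> fv;
    bool changed = true;
    while (changed) {                       // repeat until no set grows
        changed = false;
        for (const auto& prod : g) {
            set<string>& fa = fv[prod.first];
            size_t before = fa.size();
            for (const auto& rhs : prod.second) {
                if (rhs.empty()) continue;
                if (!isVn(rhs[0])) {
                    fa.insert(rhs[0]);
                } else {
                    if (rhs.size() > 1 && !isVn(rhs[1])) fa.insert(rhs[1]);
                    const set<string>& fb = fv[rhs[0]];
                    fa.insert(fb.begin(), fb.end());
                }
            }
            if (fa.size() != before) changed = true;
        }
    }
    return fv;
}

// The extended arithmetic grammar from this report (E' -> # E # omitted):
Grammar demo_grammar() {
    return {
        {"E", {{"E","+","T"}, {"E","-","T"}, {"T"}}},
        {"T", {{"T","*","F"}, {"T","/","F"}, {"F"}}},
        {"F", {{"(","E",")"}, {"i"}, {"n"}}}
    };
}
```

For this grammar the iteration yields FIRSTVT(F) = {(, i, n}, FIRSTVT(T) = FIRSTVT(F) plus {*, /}, and FIRSTVT(E) = FIRSTVT(T) plus {+, -}.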
Next is the linkage with the lexical analyzer. This program can call the lexical analyzer directly, so a test sample can simply be the source code to analyze. The input to the syntax-analysis phase is still the lexical analyzer's output, a sequence of (token, category) pairs. For convenient error reporting, this program adds a line number to each pair, turning it into a triple. The config dependency files follow Experiment 1, covering the various file dependencies.

3.3.2. Function definition

create_Vn_Vt_grammar(grammar_path): creates the Vn and Vt sets and the grammar table.
create_firstVt(): creates the firstVt sets.
create_lastVt(): creates the lastVt sets.
create_analyzer_table(): creates the analysis table.
analyzer_stack(need_to_analysis): the analysis-stack entry point, i.e., the analysis driver.

int correct_prom(): prompt reporting that the input conforms to the grammar.
int error_prom(): prompt reporting that the input violates the grammar.

There are also a number of auxiliary functions, not shown.

The main functions are written as described in the course.

The main difference from the classroom version is the analysis-stack function. Combining the program flowchart taught in class with the STL-based data structures, we removed the goto statements and added a flag variable to record the branch direction, so the code differs from the flowchart on the lecture slides.

The search through production right-hand sides during reduction, together with its error reporting, is fairly homegrown; the principle is brute-force traversal.
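A goto-free driver loop of this kind can be sketched as follows. This is an illustrative skeleton, not the report's code: it hardcodes the relation table for the classic grammar E -> E+T | T, T -> T*F | F, F -> (E) | i over the terminals {+, *, (, ), i, #} instead of building it from FIRSTVT/LASTVT, uses a single marker 'N' for any reduced nonterminal, and does the production-right-side check by brute-force matching:

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using namespace std;

// Hardcoded precedence relations: '<' is <., '=' is =., '>' is .>;
// a missing entry is a syntax error.
static map<char, map<char, char>> relations() {
    return {
        {'+', {{'+','>'}, {'*','<'}, {'(','<'}, {')','>'}, {'i','<'}, {'#','>'}}},
        {'*', {{'+','>'}, {'*','>'}, {'(','<'}, {')','>'}, {'i','<'}, {'#','>'}}},
        {'(', {{'+','<'}, {'*','<'}, {'(','<'}, {')','='}, {'i','<'}}},
        {')', {{'+','>'}, {'*','>'}, {')','>'}, {'#','>'}}},
        {'i', {{'+','>'}, {'*','>'}, {')','>'}, {'#','>'}}},
        {'#', {{'+','<'}, {'*','<'}, {'(','<'}, {'i','<'}, {'#','='}}}
    };
}

// Returns true iff `input` (which must end in '#') is a valid expression.
bool parse(const string& input) {
    map<char, map<char, char>> R = relations();
    const set<string> rhs = {"i", "N+N", "N*N", "(N)"};  // production right sides
    vector<char> stk = {'#'};
    size_t i = 0;
    while (i < input.size()) {
        char b = input[i];
        int k = (int)stk.size() - 1;
        while (k >= 0 && stk[k] == 'N') --k;   // topmost terminal on the stack
        if (k < 0) return false;
        char a = stk[k];
        char r = R[a][b];
        if (r == '<' || r == '=') {            // shift
            if (a == '#' && b == '#')          // acceptance: exactly "#N" remains
                return stk.size() == 2 && stk[1] == 'N';
            stk.push_back(b);
            ++i;
        } else if (r == '>') {                 // reduce the handle on the stack top
            string handle;
            char q;
            do {                               // pop until stack terminal <. popped one
                while (!stk.empty() && stk.back() == 'N') { handle += 'N'; stk.pop_back(); }
                if (stk.empty()) return false;
                q = stk.back(); stk.pop_back();
                handle += q;
                int j = (int)stk.size() - 1;
                while (j >= 0 && stk[j] == 'N') --j;
                if (j < 0) return false;
                a = stk[j];
            } while (R[a][q] != '<');
            // a nonterminal just below the popped span belongs to the handle too
            if (!stk.empty() && stk.back() == 'N') { handle += 'N'; stk.pop_back(); }
            reverse(handle.begin(), handle.end());
            if (!rhs.count(handle)) return false;  // handle matches no production
            stk.push_back('N');
        } else {
            return false;                      // blank table entry: syntax error
        }
    }
    return false;                              // input did not reach the accept state
}
```

Without the rhs.count(handle) check the skeleton would wrongly accept inputs such as "i+#", since the precedence relations alone do not guarantee that the popped span is a real production right-hand side.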

4. Program testing

Alu_test1.txt
This file contains 6 test samples.

(a + c * b #
a + b #
(a + c b ) #
23434 + 34324 + (a* b / c) + (a*b/ (2-2)) #
( a + b c ) * 32 / a #
a- b * #

Since we start from lexical analysis, the intermediate results are shown below:

Chart 4-1 Lexical analysis results for test1

Chart 4-2 Symbol class-number file (part of the lexical analyzer's output)

Chart 4-3 config file
It includes the paths of several dependency files and the default path of the file to be analyzed; the writing format is given at the bottom.

Chart 4-4 grammar file
The grammar file mainly contains the Vn and Vt sets and the grammar itself. Special instructions are given below.

The output of the lexical analysis program is still the pair sequence, extended with a line number into triples to ease error prompts. Moreover, we directly use the Keyword structure, which has no impact on the results.

Chart 4-5 Results of this program: information prompts
The prompts show that the program handled the test samples correctly.
In an expression, numeric constants may appear wherever the original grammar of the lab instructions allows an identifier; this is an extension of the grammar.

5. Experiment summary

5.1. Technical difficulties and solutions

The key flowcharts of the experiment were taught in class. Most problems arose during coding, but fortunately they were all solved in the end.

After writing the experimental code, I noticed that the grammar in the lab handout differs somewhat from the grammar of real programming languages: i is defined as an identifier, i.e., a variable, yet the position where i appears in an arithmetic expression may hold an identifier, a numeric constant, or even a whole expression. We therefore made a small extension to the original grammar, adding a new Vt symbol n as a stand-in for numeric constants. The grammar becomes:

E' -> # E #
E -> E + T | E - T | T
T -> T * F | T / F | F
F -> ( E ) | i | n

Fortunately, after this grammar change my Experiment 3 program ran without any problems, which was gratifying.

Of course, i could also be expanded into an expression, which is in effect a recursive call.
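The LASTVT sets required for building the precedence table over this extended grammar are the mirror image of FIRSTVT. The following sketch uses the same illustrative conventions as before (single uppercase letters as Vn symbols, names compute_lastVt and report_grammar assumed, E' -> # E # omitted):

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <vector>

using namespace std;

typedef map<string, vector<vector<string>>> Grammar;

static bool isVn(const string& s) { return s.size() == 1 && isupper((unsigned char)s[0]); }

// LASTVT by fixed-point iteration (mirror image of FIRSTVT):
//   A -> ...a    =>  a in LASTVT(A)
//   A -> ...a B  =>  a in LASTVT(A)
//   A -> ...B    =>  LASTVT(B) subset of LASTVT(A)
map<string, set<string>> compute_lastVt(const Grammar& g) {
    map<string, set<string>> lv;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& prod : g) {
            set<string>& la = lv[prod.first];
            size_t before = la.size();
            for (const auto& rhs : prod.second) {
                if (rhs.empty()) continue;
                const string& last = rhs.back();
                if (!isVn(last)) {
                    la.insert(last);
                } else {
                    if (rhs.size() > 1 && !isVn(rhs[rhs.size() - 2]))
                        la.insert(rhs[rhs.size() - 2]);
                    const set<string>& lb = lv[last];
                    la.insert(lb.begin(), lb.end());
                }
            }
            if (la.size() != before) changed = true;
        }
    }
    return lv;
}

// The extended grammar above (E' -> # E # omitted):
Grammar report_grammar() {
    return {
        {"E", {{"E","+","T"}, {"E","-","T"}, {"T"}}},
        {"T", {{"T","*","F"}, {"T","/","F"}, {"F"}}},
        {"F", {{"(","E",")"}, {"i"}, {"n"}}}
    };
}
```

For this grammar the iteration yields LASTVT(F) = {), i, n}, LASTVT(T) = LASTVT(F) plus {*, /}, and LASTVT(E) = LASTVT(T) plus {+, -}.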

5.2. Experimental thoughts and experience summary

The main coding difficulty in this experiment lay in setting up the data structures. Two-dimensional array indices in C++ are integers, while the analysis tables taught in class are indexed by symbols, so the conversion is awkward. I eventually settled on the STL map container: the two-dimensional table is a map of maps, and vector stores the grammar productions. This yields the seemingly complicated nesting of containers shown in the data-structures section. Once the data structures were fixed and the basic operations defined, the algorithms from the course applied directly, and the experiment was not difficult to complete.

The algorithm flow must be coordinated with the data structures: the flowchart explained in class cannot simply be transcribed as-is, goto statements and all. The original flowchart must first be analyzed to extract the algorithm's principle; then a new flowchart can be drawn for the new data structures, and only then is the code written.

A CLion note: when debugging, starting a variable watch before the variable is created causes problems.