[Compilation Principle] LR(1) Analysis Method: C/C++ Implementation

Personal homepage:Sarapines Programmer
Series of columns: “Adventures in Compilation Principles”
The fragrance of ink sent a clear message: The wind is blowing in the empty valley, and the crane and the moon are bright in the dream. The passion remains after thousands of years, and the determination to soar in the wind remains unwavering.

Directory structure

1. The concept of LR(1) analysis method of compilation principle

1.1 Compilation principle

1.2 LR(1) analysis method

2. Generation and calculation of reverse Polish expression

2.1 Experimental purpose

2.2 Experimental requirements

2.3.1 Algorithm flow chart

2.3.2 Reference program code

2.3 Experimental content

2.3.1 Experimental solution code

2.3.2 Program explanation

2. Experimental experience

3. To all of you


1. Concept of LR(1) Analysis Method of Compilation Principles

1.1 Compilation Principle

Compilation principle is an important branch in the field of computer science. It studies the process of converting source code of high-level programming languages into machine code or intermediate code that can be executed by the computer. Compilation principles cover the design and implementation of a compiler, a software tool that translates source code into object code. The main tasks of the compiler include syntax analysis, lexical analysis, semantic analysis, optimization and code generation.

1.2 LR(1) Analysis

LR(1) (Left-to-Right, Rightmost derivation with 1 symbol lookahead) analysis method is a syntax analysis method used to build a parser. It is usually used to analyze the grammatical structure of context-free grammar and is a type of LR analysis method. varieties. It is a powerful bottom-up grammar analysis method suitable for context-free grammars with a certain complexity. By using lookahead symbols to handle ambiguities in the grammar, the input can be analyzed and understood more accurately.

Resource acquisition: Follow the public account at the end of the article to reply to the LR(1) analysis method source code

2. Generation and calculation of reverse Polish expression

2.1 Experimental Purpose

(1) Construct an LR(1) analysis program and perform grammatical analysis to determine whether the given symbol string is a sentence recognized by the grammar;

(2) Understand that the LR(K) analysis method is a strict left-to-right scanning and bottom-up syntax analysis method.

2.2 Experimental Requirements

1. For the following grammar, use LR(1) analysis method to analyze any input symbol string:

(0)E->S

(1)S->BB

(2)B->aB

(3)B->b

2. The LR(1) analysis table is:

(1)If input

baba#

The output is:

(2)If you enter

bb#

The output is:

2.3.1 Algorithm flow chart

2.3.2 Reference Program Code

Reference code (incomplete):

/* ACTION table*/
char *action[10][3]={"S3#","S4#",NULL,

NULL,NULL,"acc",
"S6#","S7#",NULL,
"S3#","S4#",NULL,
"r3#","r3#",NULL,
NULL,NULL,"r1#",
"S6#","S7#",NULL,
NULL,NULL,"r3#",
"r2#","r2#",NULL,
NULL,NULL,"r2#"};
/*GOTO table*/
int goto1[10][2]={1,2,
0,0,
0,5,
0,8,
0,0,
0,0,
0,9,
0,0,
0,0,
0,0};
char vt[3]={'a','b','#'}; /*Storage terminator*/
char vn[2]={'S','B'}; /*Storage non-terminal symbols*/
char *LR[4]={"E->S#","S->BB#","B->aB#","B->b#"}; /* Store production */
/*Output status stack, output symbol stack, output input string*/
do{ y=z;m=0;n=0; /*y,z points to the top of the status stack*/
               g=top;j=0;k=0;
              x=c[top];
              count + + ; printf("%d\t",count);
             while(m<=top1) /*Output status stack*/
                 {printf("%d",a[m]); m=m + 1; } printf("\t\t");
            while(n<=top2) /*Output symbol stack*/
                 {printf("%c",b[n]); n=n + 1; } printf("\t\t");
            while(g<=top3) /*Output input string*/
                 {printf("%c",c[g]); g=g + 1; } printf("\t\t");
     …
   }while(action[y][j]!="acc");

/*Check action table*/
if (x= ='a')j=0;
if (x= ='b')j=1;
if (x= ='#')j=2;
if(action[y][j]==NULL)
      { printf("error\\
"); return; }
else
             strcpy(copy,action[y][j]);

/*Process the move*/
if(copy[0]=='S')
       { z=copy[1]-'0';
          top1=top1 + 1; top2=top2 + 1; a[top1]=z; b[top2]=x; top=top + 1; i=0; while(copy[i]!='#')
              {printf("%c",copy[i]);i + + ;} printf("\\
");
        }

/*Process reduction*/
if(copy[0]=='r')
     { i=0;
          while(copy[i]!='#')
                 { printf("%c",copy[i]); i + + ; }
                    h=copy[1]-'0';
          strcpy(copy1,LR[h]);
          if (copy1[0]= ='S')k=0;
          if (copy1[0]= ='B')k=1;
          l=strlen(LR[h])-4;
          top1=top1-l + 1; top2=top2-l + 1; y=a[top1-1];
          p=goto1[y][k]; a[top1]=p; b[top2]=copy1[0]; z=p;
          printf("\t\t");
          printf("%d\\
",p);
     }

2.3 Experimental Content

2.3.1 Experimental solution code

Modify the reference code as follows:

#include<stdio.h>
#include<string.h>
#include<stdlib.h>

/* ACTION table */
char *action_table[10][3]={"S3","S4",NULL,
NULL,NULL,"acc",
"S6","S7",NULL,
"S3","S4",NULL,
"r3","r3",NULL,
NULL,NULL,"r1",
"S6","S7",NULL,
NULL,NULL,"r3",
"r2","r2",NULL,
NULL,NULL,"r2"};

/*GOTO table*/
int goto_table[10][2]={1,2,
0,0,
0,5,
0,8,
0,0,
0,0,
0,9,
0,0,
0,0,
0,0};
char vt[3]={'a','b','#'}; /*Storage terminator*/
char vn[2]={'S','B'}; /*Storage non-terminal symbols*/

struct Production{ //Production type definition
  char alp; //uppercase characters
char array[10]; //Character on the right side of the production
int length; //Number of characters
};
Production production[4]; //Storage production

int statueStack[10]; //status stack
char symbolStack[10]; //symbol stack
char input_string[10]; //Input string

int statueStackPointer = -1; //pointer of status stack
int symbolStackPointer = -1; //pointer of symbol stack
int inputStringPointer = 0; //Input string pointer
int input_string_length = 0; //The length of the input string
int process = 1;

bool input_judge();
void printAll();
void analyze();
void init();

int main(){
bool input_right = false;
while(!input_right){
input_right = input_judge();
}
init();
printf("Step\t\tStatus stack\t\tSymbol stack\t\tInput string\t\t");
printf("Action\t\t");
printf("Goto\\
");
printAll();
analyze();
}

void printAll(){
printf("%d\t\t",proce + + );
\t\t
for(int i=0; i<=statueStackPointer;i + + ){ //Output status stack
printf("%d",statueStack[i]);
}
printf("\t\t");
\t
for(int i=0; i<=symbolStackPointer;i + + ){ /*Output symbol stack*/
printf("%c",symbolStack[i]);
}
printf("\t\t");
\t
for(int i=inputStringPointer; i<input_string_length;i + + ){ /*Output input stack*/

printf("%c",input_string[i]);
}
printf("\t\t");
}

void analyze(){
int stop = 0;
while(!stop){
char store[10];
char input = input_string[inputStringPointer]; //Get the first character of the input string
int col = -1;
int row = statueStack[statueStackPointer];
\t\t
/*Check action table*/
if(input=='a'){
col=0;
}
if(input=='b'){
col=1;
}
if(input=='#'){
col=2;
}
if(action_table[row][col]==NULL){
printf("err\\
");
exit(0);
}

else{
strcpy(store,action_table[row][col]);
}

if(strcmp(store,"acc")==0){
printf("acc\\
");
stop = 1;
break;//end
}
\t\t
/*Move in*/
else if(store[0]=='S'){
statueStack[ + + statueStackPointer]=store[1]-'0';
symbolStack[ + + symbolStackPointer]=input_string[inputStringPointer];
inputStringPointer + =1;
printf("%s\\
",store);
printAll();
}
\t\t
\t\t
/*reduce*/
else if(store[0]=='r'){
int col = -1;
int index=store[1]-'0';
if(production[index].alp=='S'){
col=0;
}
else if(production[index].alp=='B'){
col=1;
}
else{
printf("err");
exit(0);
}

int length = production[index].length; //Character length of expression
statueStackPointer-=length; //Status stack pop
symbolStackPointer-=length; //pop symbol stack
int row = statueStack[statueStackPointer]; //Get the top status of the status stack after the pop operation
statueStack[ + + statueStackPointer]=goto_table[row][col]; //The state in the goto table enters the state stack
symbolStack[ + + symbolStackPointer]=production[index].alp; //The uppercase letters on the left side of the expression enter the symbol stack
printf("%s\t\t",store);
printf("%d\\
",goto_table[row][col]);
printAll();
}
else{
printf("err!\\
");
exit(0);
}
}
}

void init(){
struct Production a1,a2,a3,a4;
a1.alp = 'E';
strcpy(a1.array,"S");
a1.length = 1;
\t
a2.alp = 'S';
a2.length = 2;
strcpy(a3.array,"BB");
\t
a3.alp = 'B';
a3.length = 2;
strcpy(a3.array,"aB");
\t
a4.alp = 'B';
a4.length = 1;
strcpy(a4.array,"b");
\t
production[0] = a1;
production[1] = a2;
production[2] = a3;
production[3] = a4;
\t
statueStack[ + + statueStackPointer] = 0; //0
symbolStack[ + + symbolStackPointer] = '#';//#
}

bool input_judge(){
int j=0;
char ch;
printf("Input:");
while(true){
ch = getchar();
if(ch=='\\
'){
break;
}
input_string[j + + ] = ch;
    }
\t
for(int i=0; i<j; i + + ){
ch = input_string[i];
if(!((ch=='a')||(ch=='b'))){
printf("Incorrect input\\
");
return false;
}
}
\t
input_string[j + + ] = '#';
input_string_length = j;
return true;
}

Enter the correct input string:

abab

Run results:

Incorrect input string:

abbaab

Run results:

Program Analysis:

1. The ACTION table and GOTO table are defined for the shift and reduction operations of the LR analyzer. The ACTION table and GOTO table are represented by two-dimensional arrays. Each element corresponds to a state and terminal (ACTION table) or non-terminal. symbol (GOTO table), which stores the corresponding operation information.

2. Define some variables and data structures, including production structure struct Production, status stack statusStack, symbol stack symbolStack, input string input_string, etc. These data structures and variables are used to hold intermediate states and results during analysis.

3. The printAll function is used to print the status stack, symbol stack, input string and other information of the current analysis step. After each step of analysis is completed, call this function to print out the current status.

4. The analyze function is the core part of LR analysis. The shift and reduction operations are continuously performed through a while loop until the termination condition is reached. In each step, the corresponding operation is searched in the ACTION table according to the input characters, and the corresponding shift or reduction operation is performed. If an error condition is encountered, an error message is output and the program exits.

5. The init function is used to initialize the state stack and symbol stack of the LR analyzer, and defines the grammar production. In this function, four productions are first defined and stored in the production array. Then the initial state 0 is pushed onto the status stack, and the terminal symbol # is pushed onto the symbol stack.

6. The input_judge function is used to obtain the input string and check the input validity. In the function, the characters are first read one by one through the getchar function and stored in the input_string array. The reading process will check the legality of the characters, and only the terminal symbols a and b are allowed. Otherwise, an error message will be output and false will be returned. Finally, the terminator # is added to the end of the input string and the length of the input string is updated.

7. In the main function, first judge the legality of the input string, then initialize it, and call the printAll function to print the initial status. Then call the analyze function to perform LR analysis, and use a while loop to ensure that the input string is legal, that is, call the input_judge function to determine whether the input string meets the requirements. Only when the input string is valid will the subsequent initialization and analysis process be carried out.

The key functions of LR analysisanalysefunction:

void analyze(){
int stop = 0;
while(!stop){
char store[10];
char input = input_string[inputStringPointer]; //Get the first character of the input string
int col = -1;
int row = statueStack[statueStackPointer];
/*Check action table*/
if(input=='a'){
col=0;
}
if(input=='b'){
col=1;
}
if(input=='#'){
col=2;
}
if(action_table[row][col]==NULL){
printf("err\\
");
exit(0);
}
else{
strcpy(store,action_table[row][col]);
}

if(strcmp(store,"acc")==0){
printf("acc\\
");
stop = 1;
break;//end
}
\t\t
/*Move in*/
else if(store[0]=='S'){
statueStack[ + + statueStackPointer]=store[1]-'0';
symbolStack[ + + symbolStackPointer]=input_string[inputStringPointer];
inputStringPointer + =1;
printf("%s\\
",store);
printAll();
}
\t\t
\t\t
/*reduce*/
else if(store[0]=='r'){
int col = -1;
int index=store[1]-'0';
if(production[index].alp=='S'){
col=0;
}
else if(production[index].alp=='B'){
col=1;
}
else{
printf("err");
exit(0);
}
int length = production[index].length; //Character length of expression
statueStackPointer-=length; //Status stack pop
symbolStackPointer-=length; //pop symbol stack
int row = statueStack[statueStackPointer]; //Get the top state of the status stack after the pop operation
statueStack[ + + statueStackPointer]=goto_table[row][col]; //The state in the goto table enters the state stack
symbolStack[ + + symbolStackPointer]=production[index].alp; //The uppercase letters on the left side of the expression enter the symbol stack
printf("%s\t\t",store);
printf("%d\\
",goto_table[row][col]);
printAll();
}
else{
printf("err!\\
");
exit(0);
}
}
}

2.3.2 Program explanation

1.int stop = 0; declare and initialize a variable stop, which is used to control the termination condition of the analysis process.

2.while(!stop) is a loop. Only when stop is false, the code in the loop body is executed, that is, the analysis operation is performed.

3.char store[10]; Declare a character array store to store operations found from the ACTION table.

4.char input = input_string[inputStringPointer]; Get the first character of the input string and assign it to the variable input. input_string[inputStringPointer] is used to get the character at the current position from the input string.

5.int col = -1; Initialize the variable col to -1, which is used to record the column number of the current character in the ACTION table.

6.int row = statueStack[statueStackPointer]; Get the top element of the status stack, which is the current status.

7. Determine the value of col according to the value of input, and determine whether the current character is the terminal symbol a, b or the terminal symbol #. Depending on the situation, set the value of col to the corresponding column number.

8.if(action_table[row][col]==NULL) determines whether the corresponding position in the ACTION table is empty. If it is empty, it means there is no corresponding operation, an error message will be output and the program will exit.

9.The else branch is executed when the corresponding operation is found.

10.strcmp(store, “acc”) == 0 Determine whether it is in the acceptance state, that is, complete the syntax analysis and successfully match the input string. If it is in the acceptance state, output “acc”, set stop to 1, and end the loop.

11.else if(store[0] == ‘S’) Determine whether it is a move operation. If it is a move operation, perform the following operations:

  1. statueStack[ + + statueStackPointer] = store[1] – ‘0’; Push the moved state into the state stack.
  2. symbolStack[ + + symbolStackPointer] = input_string[inputStringPointer]; Push the current input character into the symbol stack.
  3. inputStringPointer + = 1; Move the input string pointer backward one bit.
  4. Output the contents of the shift operation.
  5. Call the printAll function to print the status of the current analysis step.

12.else if(store[0] == ‘r’) determines whether it is a reduction operation. If it is a reduction operation, perform the following operations:

  1. int index = store[1] – ‘0’; Get the index of the reduction production.
  2. Determine the column number col based on the left characters of the production.
  3. Gets the length of the reduction production.
  4. The status stack and symbol stack are popped, and the number of items popped is the length of the production.
  5. Get the top status row of the status stack after the pop operation.
  6. Assign `goto_table[row][col] to the state stack to represent the new state after reduction.
  7. Push the left characters of the production onto the symbol stack.
  8. Output the contents of the reduction operation.
  9. Output goto_table[row][col], which is the reduced state.
  10. Call the printAll function to print the status of the current analysis step.

13.else branch indicates an unrecognized operation, outputs an error message and exits the program.

14. In the next iteration of the loop, the analysis process continues until an acceptance state is reached or an error occurs causing the program to exit.

The function analyze implements the shift-reduce algorithm in the LR analysis table. By looking up the operation in the ACTION table, perform the corresponding operation based on the input characters and the current status. The shift operation pushes the state and input characters onto the stack, and the reduction operation pops the stack according to the production and pushes the new state and the left characters of the production onto the stack. This process continues until the status is accepted or an error occurs. The print statements in the function and the call to the printAll function are used to display the status changes and operations at each step.

2. Experimental experience

During the code implementation process of the experiment, the ACTION table and the GOTO table were defined. These two tables are the core parts of the LR(1) analysis table. The ACTION table is used to record move and reduction operations, and the GOTO table is used to record status. transfer between. These tables provide instructions for working with input strings and status stacks. Then the production structure is defined, and variables such as the production array, status stack, symbol stack, and input string are initialized. These variables play a key role in the analysis process.

The main analysis process is implemented in the function analyze(). This function uses a loop to step through the input string until an acceptance state is reached or an error occurs. At each step, the corresponding operation is found in the ACTION table based on the input characters and the current status. If it is a shift operation, push the status and input characters onto the stack and print the status of the current step. If it is a reduction operation, pop the stack according to the production, push the new state and the left character of the production onto the stack, and print the status of the current step. If the operation cannot be recognized, an error message is output and the program exits.

Through this experiment, I implemented the code based on LR(1) analysis method and deeply understood the process and principle of LR(1) analysis method: LR(1) analysis method can handle context-free grammar with a certain complexity. The operations of constructing parsing tables and state stacks are used to gradually analyze and reduce the input string, and ultimately determine whether the input string can be generated according to the given grammar.

3.To everyone

Promise me three hundred years in this world, and eight thousand years on the snowy road