Principle of Compilation – Minimize the Visual Implementation of DFA

Regular expression to minimize DFA (python implementation)

Achieve your goals

  1. Analyze the regular expressions (including basic operations) input by the user to understand the operation
  2. Convert to NFA
  3. Convert to DFA
  4. Minimize DFA
  5. Use minimal DFA to analyze whether the input string matches
  6. Interaction and Visualization
    See the end for the effect.

Building a regular expression process

  1. building the alphabet

? Adaptive with input

  1. Add and operator symbols
# Add AND operator
def inputR():
    var. set(ei. get())
    s = list(var. get(). replace(' ', ''))
    i = 0
    while i < len(s) - 1:
        if s[i] not in ['|', '·', '*', '(', ')'] and s[i + 1] not in ['|', '·', '*', ')']:
            s.insert(i + 1, '·')
            i = 0
        elif s[i] in ['*', ')'] and s[i + 1] not in ['|', '·', '*', ')']:
            s.insert(i + 1, '·')
            i = 0
        i + = 1
    var. set(''. join(s))

  1. Build reverse Polish expressions (postfix expressions)

? Shunting yard algorithm:

  1. Create an operator stack for the storage of operators. This operator follows the principle that the higher the priority toward the top of the stack, the higher the priority.

  2. Scan the expression sequentially, if the current character is a letter (a symbol with a priority of 0), output it directly; if the current character is an operator or bracket (a symbol with a priority other than 0), then judge:

  • ? If the current operator is ‘(‘, push it directly into the stack;
  • ? If it is ‘)’, the stack is popped and the operators are output sequentially until the first ‘(‘ is encountered, and the first ‘(‘ encountered is popped but not output;
  • ? If it is other, compare the priority of the top element of the operator stack with the current element:

? a. If the top element of the stack is ‘(‘, the current element is directly pushed onto the stack;

? b. If the priority of the top element on the stack >= the priority of the current element, pop the stack and output the operators sequentially until the priority of the top element

? c. If the priority of the top element of the stack

  1. Repeat the above operation until the expression is scanned

  2. Sequentially pop and output operators until the stack element is empty

# From infix expressions to reverse Polish expressions
def infixToSuffix(infix): # infix is infix form

    pr = {<!-- -->'|': 1, '·': 2, '*': 3, '(': 0, ')': 4} # Set priority table
    sp = [] # stack to store symbols
    result = '' # store the result
    # dispatch field algorithm
    for i in fix:
        if i not in ['|', '·', '*', '(', ')']: # non-operator output
            result += i
        elif i == '(': # '(' is pushed onto the stack
            sp. append(i)
        elif i == ')': # ')' is popped and output until the top of the stack is '(', '(' is popped but not output
            p = ''
            while p != '(':
                result += p
                p = sp. pop()
        elif len(sp) == 0 or pr[sp[-1]] < pr[i]: # If the current op priority is greater than the top priority of the stack or the stack is empty, push it to the stack
            sp. append(i)
        else:
            while len(sp) != 0 and pr[sp[-1]] >= pr[i]: # Pop out until the current op priority is lower than the top priority of the stack or the stack is empty
                result += sp. pop()
            sp. append(i)
    while len(sp) != 0: # Output the remaining stack elements in sequence
        result += sp. pop()

Reverse Polish expressions to NFA

Representation of NFA

? State (ID, start, accepted)

? Transition (source state, target state, char/ε)

Reverse Polish Expression to NFA

? bottom up

  1. NFA’s stack operation n encounters a letter, builds a basic NFA and pushes it into the stack
  2. When an operator is encountered, a corresponding number of NFAs are popped out of the stack,
  3. Build a new NFA and put it on the stack
def suffixToNfa():
    global result
    global Nfa # ?
    suffix = result. get()
    # If it does not enter the mainloop, it will be marked and not output
    if suffix == "":
        return NFA(0, 0, [0, 0, ''])

    sp = []
    for i in suffix:
        if i not in ['|', '·', '*']: # Encounter a letter, build a basic NFA and push it into the stack
            sp.append(NFA(1, 2, [[1, 2, i]]))
        # When an operator is encountered, the corresponding number of NFAs are popped out of the stack, and a new NFA is constructed according to the courseware method 1 and put into the stack
        elif i == '|':
            nfa1 = sp. pop()
            nfa2 = sp. pop()
            nfa2.__or__(nfa1)
            sp.append(nfa2)
        elif i == '·':
            nfa1 = sp. pop()
            nfa2 = sp. pop()
            nfa2.__and__(nfa1)
            sp.append(nfa2)
        else:
            nfa = sp. pop()
            nfa. repeat()
            sp.append(nfa)
    Nfa = sp. pop()
    return Nfa

NFA to DFA

Representation of DFA

? State (ID, start, accepted, transition)

? Transition (target state, char)

? Corresponding NFA state set, processing completion mark, used to store the intermediate state of the conversion algorithm

def nfaToDfa():
    global Dfa
    global Nfa
    nfa = suffixToNfa()
    state = [] # state queue
    dfastate = [] # state contained in dfa
    character = [] # build character set
    sp = [] # store transformation matrix
    flag = [] # whether to include the final state
    for i in range(len(nfa.move)):
        if nfa.move[i][2] != 'ε' and nfa.move[i][2] not in character:
            character.append(nfa.move[i][2])
    length = len(character) # length of character
    s0 = getclosure(nfa, [1], 'ε') # initial state
    state.append(s0)
    dfastate.append(s0)
    if nfa.F in s0:
        flag.append(1)
    else:
        flag.append(0)
    # Get the transformation matrix and end when the state queue is empty
    while len(state) != 0:
        I = state. pop(0)
        tem = [I] # store each row of the transformation matrix
        for i in range(length):
            char = character[i]
            temp = getclosure(nfa, I, char)
            temp = getclosure(nfa, temp, 'ε')
            tem.append(temp)
            if len(temp) != 0 and temp not in dfastate:
                state.append(temp) # Add to the state queue
                if temp not in dfastate:
                    dfastate.append(temp) # Add to dfastate
                    if nfa.F in temp:
                        flag.append(1)
                    else:
                        flag.append(0)
        sp.append(tem) # state queue head dequeue
    # Rename
    for i in range(len(sp)):
        for j in range(length + 1):
            for k in range(len(dfastate)):
                if sp[i][j] == dfastate[k]:
                    sp[i][j] = k
    Dfa = [sp, character, flag]

DFA minimization

Representation of DFA

? State (ID, start, accepted, transition)

? Transition (target state, char)

? GroupID

# split method mindfa
minDfa = []
def mindfa():
    global minDfa
    nfaToDfa()
    global Dfa
    dfa = Dfa[0]
    character = Dfa[1]
    flag = Dfa[2]
    gdfa = [] # group status
    cflag = []
    for i in range(len(flag)):
        cflag.append(flag[i])
    for i in range(len(dfa)):
        tem = []
        for j in range(len(character)):
            tem.append(dfa[i][j+1])
        tem.append(cflag[i])
        gdfa.append(tem)
    group = [] # group
    # Get the initial group
    for i in range(len(dfa)):
        for j in range(len(character)):
            if dfa[i][j + 1] != []:
                gdfa[i][j] = flag[dfa[i][j + 1]]
    k = -1
    while k != len(group):
        k = len(group)
        # grouping
        for i in range(len(gdfa)):
            if gdfa[i] not in group:
                group.append(gdfa[i])
        # Find your corresponding group number
        for i in range(len(gdfa)):
            for j in range(len(group)):
                if gdfa[i] == group[j]:
                    flag[i] = j
        # Reorganize
        for i in range(len(gdfa)):
            for j in range(len(character)):
                if dfa[i][j + 1] != []:
                    gdfa[i][j] = flag[dfa[i][j + 1]]
    # Modify the flags for duplicates
    reflag = []
    for i in range(len(gdfa)):
        reflag.append(-1)
    for i in range(len(gdfa)):
        for j in range(len(gdfa)):
            if gdfa[i] == gdfa[j] and i != j:
                if reflag[max(i, j)] != -1:
                    reflag[max(i, j)] = min(min(i, j), reflag[max(i, j)])
                else:
                    reflag[max(i, j)] = min(i, j)
    for i in range(len(gdfa)):
        for j in range(len(character) + 1):
            if dfa[i][j] != [] and reflag[dfa[i][j]] != -1:
                dfa[i][j] = reflag[dfa[i][j]]
    # merge duplicates
    group = []
    for i in range(len(gdfa)):
        if dfa[i] not in group:
            group.append(dfa[i])
    # Whether the flag mark is final
    for i in range(len(gdfa)):
        for j in range(len(group)):
            if dfa[i] == group[j]:
                flag[j] = cflag[i]
    minDfa = [group, character, flag]

Visual running instance

The complete NFA graph: