Regular expression to minimize DFA (python implementation)
Achieve your goals
- Analyze the regular expressions (including basic operations) input by the user to understand the operation
- Convert to NFA
- Convert to DFA
- Minimize DFA
- Use minimal DFA to analyze whether the input string matches
- Interaction and Visualization
See the end for the effect.
Building a regular expression process
- building the alphabet
? Adaptive with input
- Add and operator symbols
# Add AND operator def inputR(): var. set(ei. get()) s = list(var. get(). replace(' ', '')) i = 0 while i < len(s) - 1: if s[i] not in ['|', '·', '*', '(', ')'] and s[i + 1] not in ['|', '·', '*', ')']: s.insert(i + 1, '·') i = 0 elif s[i] in ['*', ')'] and s[i + 1] not in ['|', '·', '*', ')']: s.insert(i + 1, '·') i = 0 i + = 1 var. set(''. join(s))
- Build reverse Polish expressions (postfix expressions)
? Shunting yard algorithm:
-
Create an operator stack for the storage of operators. This operator follows the principle that the higher the priority toward the top of the stack, the higher the priority.
-
Scan the expression sequentially, if the current character is a letter (a symbol with a priority of 0), output it directly; if the current character is an operator or bracket (a symbol with a priority other than 0), then judge:
- ? If the current operator is ‘(‘, push it directly into the stack;
- ? If it is ‘)’, the stack is popped and the operators are output sequentially until the first ‘(‘ is encountered, and the first ‘(‘ encountered is popped but not output;
- ? If it is other, compare the priority of the top element of the operator stack with the current element:
? a. If the top element of the stack is ‘(‘, the current element is directly pushed onto the stack;
? b. If the priority of the top element on the stack >= the priority of the current element, pop the stack and output the operators sequentially until the priority of the top element ? c. If the priority of the top element of the stack Repeat the above operation until the expression is scanned Sequentially pop and output operators until the stack element is empty Representation of NFA ? State (ID, start, accepted) ? Transition (source state, target state, char/ε)
# From infix expressions to reverse Polish expressions
def infixToSuffix(infix): # infix is infix form
pr = {<!-- -->'|': 1, '·': 2, '*': 3, '(': 0, ')': 4} # Set priority table
sp = [] # stack to store symbols
result = '' # store the result
# dispatch field algorithm
for i in fix:
if i not in ['|', '·', '*', '(', ')']: # non-operator output
result += i
elif i == '(': # '(' is pushed onto the stack
sp. append(i)
elif i == ')': # ')' is popped and output until the top of the stack is '(', '(' is popped but not output
p = ''
while p != '(':
result += p
p = sp. pop()
elif len(sp) == 0 or pr[sp[-1]] < pr[i]: # If the current op priority is greater than the top priority of the stack or the stack is empty, push it to the stack
sp. append(i)
else:
while len(sp) != 0 and pr[sp[-1]] >= pr[i]: # Pop out until the current op priority is lower than the top priority of the stack or the stack is empty
result += sp. pop()
sp. append(i)
while len(sp) != 0: # Output the remaining stack elements in sequence
result += sp. pop()
Reverse Polish expressions to NFA
Reverse Polish Expression to NFA
? bottom up
- NFA’s stack operation n encounters a letter, builds a basic NFA and pushes it into the stack
- When an operator is encountered, a corresponding number of NFAs are popped out of the stack,
- Build a new NFA and put it on the stack
def suffixToNfa(): global result global Nfa # ? suffix = result. get() # If it does not enter the mainloop, it will be marked and not output if suffix == "": return NFA(0, 0, [0, 0, '']) sp = [] for i in suffix: if i not in ['|', '·', '*']: # Encounter a letter, build a basic NFA and push it into the stack sp.append(NFA(1, 2, [[1, 2, i]])) # When an operator is encountered, the corresponding number of NFAs are popped out of the stack, and a new NFA is constructed according to the courseware method 1 and put into the stack elif i == '|': nfa1 = sp. pop() nfa2 = sp. pop() nfa2.__or__(nfa1) sp.append(nfa2) elif i == '·': nfa1 = sp. pop() nfa2 = sp. pop() nfa2.__and__(nfa1) sp.append(nfa2) else: nfa = sp. pop() nfa. repeat() sp.append(nfa) Nfa = sp. pop() return Nfa
NFA to DFA
Representation of DFA
? State (ID, start, accepted, transition)
? Transition (target state, char)
? Corresponding NFA state set, processing completion mark, used to store the intermediate state of the conversion algorithm
def nfaToDfa(): global Dfa global Nfa nfa = suffixToNfa() state = [] # state queue dfastate = [] # state contained in dfa character = [] # build character set sp = [] # store transformation matrix flag = [] # whether to include the final state for i in range(len(nfa.move)): if nfa.move[i][2] != 'ε' and nfa.move[i][2] not in character: character.append(nfa.move[i][2]) length = len(character) # length of character s0 = getclosure(nfa, [1], 'ε') # initial state state.append(s0) dfastate.append(s0) if nfa.F in s0: flag.append(1) else: flag.append(0) # Get the transformation matrix and end when the state queue is empty while len(state) != 0: I = state. pop(0) tem = [I] # store each row of the transformation matrix for i in range(length): char = character[i] temp = getclosure(nfa, I, char) temp = getclosure(nfa, temp, 'ε') tem.append(temp) if len(temp) != 0 and temp not in dfastate: state.append(temp) # Add to the state queue if temp not in dfastate: dfastate.append(temp) # Add to dfastate if nfa.F in temp: flag.append(1) else: flag.append(0) sp.append(tem) # state queue head dequeue # Rename for i in range(len(sp)): for j in range(length + 1): for k in range(len(dfastate)): if sp[i][j] == dfastate[k]: sp[i][j] = k Dfa = [sp, character, flag]
DFA minimization
Representation of DFA
? State (ID, start, accepted, transition)
? Transition (target state, char)
? GroupID
# split method mindfa minDfa = [] def mindfa(): global minDfa nfaToDfa() global Dfa dfa = Dfa[0] character = Dfa[1] flag = Dfa[2] gdfa = [] # group status cflag = [] for i in range(len(flag)): cflag.append(flag[i]) for i in range(len(dfa)): tem = [] for j in range(len(character)): tem.append(dfa[i][j+1]) tem.append(cflag[i]) gdfa.append(tem) group = [] # group # Get the initial group for i in range(len(dfa)): for j in range(len(character)): if dfa[i][j + 1] != []: gdfa[i][j] = flag[dfa[i][j + 1]] k = -1 while k != len(group): k = len(group) # grouping for i in range(len(gdfa)): if gdfa[i] not in group: group.append(gdfa[i]) # Find your corresponding group number for i in range(len(gdfa)): for j in range(len(group)): if gdfa[i] == group[j]: flag[i] = j # Reorganize for i in range(len(gdfa)): for j in range(len(character)): if dfa[i][j + 1] != []: gdfa[i][j] = flag[dfa[i][j + 1]] # Modify the flags for duplicates reflag = [] for i in range(len(gdfa)): reflag.append(-1) for i in range(len(gdfa)): for j in range(len(gdfa)): if gdfa[i] == gdfa[j] and i != j: if reflag[max(i, j)] != -1: reflag[max(i, j)] = min(min(i, j), reflag[max(i, j)]) else: reflag[max(i, j)] = min(i, j) for i in range(len(gdfa)): for j in range(len(character) + 1): if dfa[i][j] != [] and reflag[dfa[i][j]] != -1: dfa[i][j] = reflag[dfa[i][j]] # merge duplicates group = [] for i in range(len(gdfa)): if dfa[i] not in group: group.append(dfa[i]) # Whether the flag mark is final for i in range(len(gdfa)): for j in range(len(group)): if dfa[i] == group[j]: flag[j] = cflag[i] minDfa = [group, character, flag]
Visual running instance
The complete NFA graph: