Optimizing an SVR hyperparameter model with NSGA-II, the second-generation genetic algorithm
- 1. Introduction to NSGA-II
- 2. Modeling purpose
- 3. NSGA-II optimizes SVR hyperparameter model
  - 3.1 Hyperparameter settings
  - 3.2 Import data set
  - 3.3 Model construction
    - 3.3.1 Define the individual (independent variable) class
    - 3.3.2 Initializing the population
    - 3.3.3 Evolution
    - 3.3.4 Output the optimal solution set
- 4. Model testing
1. Introduction to NSGA-II
NSGA-II (Non-dominated Sorting Genetic Algorithm II) is a multi-objective optimization algorithm used to solve optimization problems with multiple conflicting objectives. By simulating natural selection and genetic operations over successive generations, it gradually improves the solutions in the population to find a set of solutions that are non-dominated under the multiple objectives.
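The notion of non-domination can be sketched with a small helper; `dominates` is a hypothetical name introduced here for illustration, assuming minimization of every objective:

```python
# A minimal sketch of Pareto dominance for minimization problems.
def dominates(a, b):
    """True if objective vector a dominates b: a is no worse in every
    objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# (1, 2) dominates (2, 3); (1, 3) and (3, 1) are mutually non-dominated.
print(dominates((1, 2), (2, 3)))  # True
print(dominates((1, 3), (3, 1)))  # False
```

NSGA-II's first front is exactly the set of individuals that no other individual dominates in this sense.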
2. Modeling purpose
Use NSGA-II to optimize the SVR hyperparameters: find the optimal hyperparameter C of SVR and output the corresponding evaluation metric. The settings are as follows:
- Range of hyperparameter C: (0.01, 10)
- Number of iterations: 5
- Population size: 5

The hyperparameter range, number of iterations, and population size can all be customized.
3. NSGA-II optimizes SVR hyperparameter model
3.1 Hyperparameter settings
First, set the hyperparameters in the form of global variables. The code is as follows:
```python
# Imports used throughout the model
import random

import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Set parameters
pop_size = 5          # Population size
gen_size = 5          # Number of generations
pc = 1                # Crossover probability
pm = 0.3              # Mutation probability
num_obj = 1           # Number of objective functions
x_range = (0.01, 10)  # Value range of the independent variable (hyperparameter C)
```
3.2 Import data set
Next, use read_excel to import the data set from Excel files, and split it into a training set and a test set. The code is as follows:
```python
data = pd.read_excel('C:/Users/SunHaitao/Desktop/x.xlsx', sheet_name='Sheet1')     # Read features
target = pd.read_excel('C:/Users/Sun Haitao/Desktop/y.xlsx', sheet_name='Sheet1')  # Read target
x_train, x_test, y_train, y_test = train_test_split(
    data, target, random_state=22, test_size=0.25)
```
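If the Excel files above are not available, a synthetic regression dataset can stand in so the rest of the code runs; make_regression and its parameters below are an assumption, not the article's actual data:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Excel data (any regression data works here)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=22)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, random_state=22, test_size=0.25)
print(x_train.shape, x_test.shape)  # (150, 5) (50, 5)
```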
3.3 Model Construction
This section implements and packages the NSGA-II-based SVR hyperparameter optimization model.
3.3.1 Define the individual (independent variable) class
```python
# Define the class that wraps the independent variable (hyperparameter C)
class Individual:
    def __init__(self, x):
        self.x = x                    # Candidate value of C
        self.objs = [None] * num_obj  # Objective function values
        self.rank = None              # Non-domination rank
        self.distance = 0.0           # Crowding distance

    # Calculate the value of the objective function
    def evaluate(self):
        c = self.x
        model_svr = SVR(C=c)
        model_svr.fit(x_train, y_train)
        predict_results = model_svr.predict(x_test)
        # RMSE as the (single) objective
        self.objs[0] = np.sqrt(mean_squared_error(y_test, predict_results))
```
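To see in isolation what evaluate computes, here is a self-contained sketch that fits an SVR for a single candidate C and reports the RMSE; the synthetic data and the C value are assumptions for illustration:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in data (assumption for illustration)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=22)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, random_state=22, test_size=0.25)

c = 1.0  # one candidate value of the hyperparameter C
model = SVR(C=c)
model.fit(x_train, y_train)
rmse = np.sqrt(mean_squared_error(y_test, model.predict(x_test)))
print(f"C={c}, RMSE={rmse:.4f}")
```

This is exactly the quantity NSGA-II minimizes: each individual carries one candidate C, and its objective is the test-set RMSE of the SVR trained with that C.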
3.3.2 Initializing the population
```python
# Initialize the population with values of C drawn uniformly from x_range
pop = [Individual(random.uniform(*x_range)) for _ in range(pop_size)]
```
3.3.3 Evolution
Evolution comprises evaluating the objective function, non-dominated sorting, crowding-distance calculation, crossover, and mutation. The integrated code is as follows:
```python
# Evolution
for gen in range(gen_size):
    print(f"Iteration {gen}")

    # Calculate the value of the objective function
    for ind in pop:
        ind.evaluate()

    # Non-dominated sorting
    fronts = [set()]
    for ind in pop:
        ind.domination_count = 0   # number of individuals that dominate ind
        ind.dominated_set = set()  # individuals that ind dominates
        for other in pop:
            if ind.objs[0] < other.objs[0]:
                ind.dominated_set.add(other)
            elif ind.objs[0] > other.objs[0]:
                ind.domination_count += 1
        if ind.domination_count == 0:
            ind.rank = 1
            fronts[0].add(ind)
    rank = 1
    while fronts[-1]:
        next_front = set()
        for ind in fronts[-1]:
            ind.rank = rank
            for dominated_ind in ind.dominated_set:
                dominated_ind.domination_count -= 1
                if dominated_ind.domination_count == 0:
                    next_front.add(dominated_ind)
        fronts.append(next_front)
        rank += 1

    # Calculate crowding distance and select individuals for crossover
    pop_for_cross = set()
    for front in fronts:
        if len(front) == 0:
            continue
        for ind in front:
            ind.distance = 0.0
        for i in range(num_obj):
            # Sort the front by the i-th objective
            sorted_front = sorted(front, key=lambda ind: ind.objs[i])
            # Boundary individuals get infinite crowding distance
            sorted_front[0].distance = float('inf')
            sorted_front[-1].distance = float('inf')
            obj_span = sorted_front[-1].objs[i] - sorted_front[0].objs[i]
            if obj_span == 0:
                continue
            for j in range(1, len(sorted_front) - 1):
                delta = sorted_front[j + 1].objs[i] - sorted_front[j - 1].objs[i]
                sorted_front[j].distance += delta / obj_span
        # Prefer lower rank, then larger crowding distance
        front_list = sorted(front, key=lambda ind: (ind.rank, -ind.distance))
        if len(pop_for_cross) + len(front_list) <= pop_size:
            pop_for_cross.update(front_list)
        elif len(pop_for_cross) < pop_size:
            pop_for_cross.update(front_list[:pop_size - len(pop_for_cross)])
            break

    # Crossover: blend two parents' C values, perturbed within their gap
    new_pop = set()
    while len(new_pop) < len(pop_for_cross):
        x1, x2 = random.sample(list(pop_for_cross), 2)
        if random.random() < pc:
            new_x = (x1.x + x2.x) / 2
            delta_x = abs(x1.x - x2.x)
            new_x += delta_x * random.uniform(-1, 1)
            new_x = max(x_range[0], min(x_range[1], new_x))
            new_pop.add(Individual(new_x))

    # Mutation: random perturbation, clipped back into x_range
    for ind in new_pop:
        if random.random() < pm:
            delta_x = random.uniform(-1, 1) * (x_range[1] - x_range[0])
            ind.x += delta_x
            ind.x = max(x_range[0], min(x_range[1], ind.x))

    # Update the population, retaining the selected elites (pop_for_cross)
    pop = list(new_pop) + list(pop_for_cross)
```
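The crowding-distance step above can also be sketched in isolation; `crowding_distances` is a hypothetical helper introduced here, and the objective vectors are made-up values for illustration:

```python
def crowding_distances(objs):
    """Crowding distance for one front.

    objs: list of objective-value tuples, one per individual.
    Returns distances aligned with objs; boundary individuals are infinite.
    """
    n = len(objs)
    m = len(objs[0])
    dist = [0.0] * n
    for i in range(m):
        # Indices sorted by the i-th objective
        order = sorted(range(n), key=lambda k: objs[k][i])
        dist[order[0]] = dist[order[-1]] = float('inf')
        span = objs[order[-1]][i] - objs[order[0]][i]
        if span == 0:
            continue
        for j in range(1, n - 1):
            dist[order[j]] += (objs[order[j + 1]][i] - objs[order[j - 1]][i]) / span
    return dist

# Example: five individuals with a single objective each
print(crowding_distances([(1,), (2,), (4,), (8,), (16,)]))
# [inf, 0.2, 0.4, 0.8, inf]
```

Individuals in sparse regions of objective space get larger distances, so NSGA-II prefers them when breaking ties within a front, which preserves diversity.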
3.3.4 Output the optimal solution set
```python
# Output the optimal solution set (non-dominated individuals in the final population)
for ind in pop:
    ind.evaluate()
pareto_front = set()
for ind in pop:
    dominated = False
    for other in pop:
        if other.objs[0] < ind.objs[0]:
            dominated = True
            break
    if not dominated:
        pareto_front.add(ind)
print("Pareto front:")
for ind in pareto_front:
    print(f"x={ind.x:.4f}, y1={ind.objs[0]:.4f}")
```
4. Model testing
The optimal hyperparameter C output by the final model is 7.6418, and the corresponding evaluation index MSE is 87.2814.
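As a rough sanity check on a C value found this way, a plain grid search over the same range can be run for comparison; the synthetic data and the grid below are assumptions for illustration, not the article's dataset, so the numbers it prints will differ from those above:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVR

# Synthetic stand-in data (assumption for illustration)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=22)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, random_state=22, test_size=0.25)

# Exhaustive search over a coarse grid of C in (0.01, 10)
grid = GridSearchCV(SVR(), {'C': np.linspace(0.01, 10, 20)},
                    scoring='neg_root_mean_squared_error', cv=5)
grid.fit(x_train, y_train)
print("best C:", grid.best_params_['C'])
print("CV RMSE:", -grid.best_score_)
```

If NSGA-II's best C and its error are far from what the grid search finds on the same data, that suggests the population size or generation count should be increased.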