[Algorithm] The second generation genetic algorithm NSGA-II optimizes the SVR hyperparameter model

  • 1. Introduction to NSGA-II
  • 2. Modeling purpose
  • 3. NSGA-II optimizes SVR hyperparameter model
    • 3.1 Hyperparameter settings
    • 3.2 Import data set
    • 3.3 Model construction
      • 3.3.1 Classes that define independent variables
      • 3.3.2 Initializing the population
      • 3.3.3 Evolution
      • 3.3.4 Output the optimal solution set
  • 4. Model testing

1. Introduction to NSGA-II

NSGA-II (Non-dominated Sorting Genetic Algorithm II) is a multi-objective optimization algorithm used to solve problems with multiple conflicting objectives. It improves the solutions in a population step by step by simulating natural selection and genetic operations (crossover and mutation), and returns a set of solutions that are non-dominated with respect to all objectives.
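
As a minimal sketch of what "non-dominated" means, assuming every objective is minimized, the following example filters a list of objective vectors. The names `dominates` and `non_dominated` are illustrative only and are not used in the model code later in this article.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if objective vector a Pareto-dominates b (minimization)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# (1, 5) and (3, 2) trade off against each other,
# while (4, 6) is worse than (3, 2) in both objectives.
candidates = [(1.0, 5.0), (3.0, 2.0), (4.0, 6.0)]
front = non_dominated(candidates)
print(front)
```

With a single objective, as in the model below, dominance reduces to a plain comparison of one value.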

2. Modeling purpose

Use NSGA-II to optimize the SVR hyperparameters: find the optimal value of the SVR hyperparameter C and output the corresponding evaluation index. The code reports this index as the RMSE (the square root of the MSE) on the test set. The settings are as follows:

  • Hyperparameter C range (0.01, 10)
  • Number of iterations 5
  • Population size 5

The hyperparameter range, number of iterations, and population size can all be customized.

3. NSGA-II optimizes SVR hyperparameter model

3.1 Hyperparameter settings

First, set the hyperparameters in the form of global variables. The code is as follows:

# Set parameters
pop_size = 5  # Population size
gen_size = 5  # Number of generations
pc = 1  # Crossover probability
pm = 0.3  # Mutation probability
num_obj = 1  # Number of objective functions
x_range = (0.01, 10)  # Value range of the independent variable (C)

3.2 Import data set

Next, use read_excel to import the data set from Excel files and split it into a training set and a test set. The code is as follows:

import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_excel('C:/Users/SunHaitao/Desktop/x.xlsx', sheet_name='Sheet1') # Read the features
target = pd.read_excel('C:/Users/Sun Haitao/Desktop/y.xlsx', sheet_name='Sheet1') # Read the targets
x_train, x_test, y_train, y_test = train_test_split(data, target, random_state=22, test_size=0.25)

3.3 Model construction

This section implements and packages the NSGA-II-optimized SVR hyperparameter model.

3.3.1 Classes that define independent variables

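import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.svm import SVR


# Define the class of independent variables
class Individual:
    def __init__(self, x):
        self.x = x  # Candidate value of the SVR hyperparameter C
        self.objs = [None] * num_obj  # Objective function values
        self.rank = None  # Non-domination rank
        self.distance = 0.0  # Crowding distance

    # Calculate the value of the objective function
    def evaluate(self):
        c = self.x
        model_svr = SVR(C=c)
        # SVR expects a 1-D target, so flatten the single target column
        model_svr.fit(x_train, np.ravel(y_train))
        predict_results = model_svr.predict(x_test)
        # RMSE on the test set
        self.objs[0] = np.sqrt(mean_squared_error(y_test, predict_results))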

3.3.2 Initializing the population

import random

# Initialize population
pop = [Individual(random.uniform(*x_range)) for _ in range(pop_size)]
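
Because the initial values of C are drawn at random, each run starts from a different population. If reproducible runs are wanted, one option is to draw from a seeded generator. The sketch below is an assumption rather than part of the original model: the seed 22 and the names `rng` and `initial_c_values` are illustrative.

```python
import random

x_range = (0.01, 10)  # Same search range for C as in section 3.1
pop_size = 5  # Same population size as in section 3.1

# Assumed seed: any fixed integer gives a repeatable sequence
rng = random.Random(22)

# Draw one candidate C per individual, uniformly inside the search range
initial_c_values = [rng.uniform(*x_range) for _ in range(pop_size)]
print(initial_c_values)
```

Each value could then be wrapped in `Individual(...)` exactly as in the line above.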

3.3.3 Evolution

Evolution includes calculation of objective function value, non-dominated sorting, calculation of crowding distance, crossover, mutation and other operations. The integrated code is as follows:

# Evolution
for gen in range(gen_size):
    print(f"Iteration {gen}")
    # Calculate the value of the objective function
    for ind in pop:
        ind.evaluate()
 
    # Non-dominated sorting
    fronts = [set()]
    for ind in pop:
        ind.domination_count = 0
        ind.dominated_set = set()
 
        for other in pop:
            if ind.objs[0] < other.objs[0] :
                ind.dominated_set.add(other)
            elif ind.objs[0] > other.objs[0] :
                ind.domination_count += 1
 
        if ind.domination_count == 0:
            ind.rank = 1
            fronts[0].add(ind)
 
    rank=1
    while fronts[-1]:
        next_front = set()
 
        for ind in fronts[-1]:
            ind.rank = rank
 
            for dominated_ind in ind.dominated_set:
                dominated_ind.domination_count -= 1
 
                if dominated_ind.domination_count == 0:
                    next_front.add(dominated_ind)
 
        fronts.append(next_front)
        rank += 1
 
    # Calculate crowding distance
    pop_for_cross=set()
    for front in fronts:
        if len(front) == 0:
            continue
 
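        sorted_front = list(front)
        for ind in sorted_front:
            ind.distance = 0.0  # Reset distances carried over by elites
        for i in range(num_obj):
            # Sort by the i-th objective; boundary individuals get infinite distance
            sorted_front.sort(key=lambda ind: ind.objs[i])
            sorted_front[0].distance = float('inf')
            sorted_front[-1].distance = float('inf')
            for j in range(1, len(sorted_front) - 1):
                delta = sorted_front[j + 1].objs[i] - sorted_front[j - 1].objs[i]
                if delta == 0:
                    continue

                # Normalized by the width of x_range, as in the original code
                sorted_front[j].distance += delta / (x_range[1] - x_range[0])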
 
        front_list = list(sorted_front)
        front_list.sort(key=lambda ind: (-ind.rank, -ind.distance))
        selected_inds =front_list
        if len(pop_for_cross) + len(selected_inds)<=pop_size:
            pop_for_cross.update(selected_inds)
        elif len(pop_for_cross) + len(selected_inds)>=pop_size and len(pop_for_cross)<pop_size:
            part_selected_inds=selected_inds[:(pop_size-len(pop_for_cross))]
            pop_for_cross.update(part_selected_inds)
            break
    # Crossover
    new_pop = set()
    while len(new_pop) < len(pop_for_cross):
        # random.sample needs a sequence (sets are rejected since Python 3.11)
        x1, x2 = random.sample(list(pop_for_cross), 2)
        if random.random() < pc:
            new_x = (x1.x + x2.x) / 2
            delta_x = abs(x1.x - x2.x)
            new_x += delta_x * random.uniform(-1, 1)
            new_x = max(x_range[0], min(x_range[1], new_x))
            new_pop.add(Individual(new_x))
 
    # Mutation
    for ind in new_pop:
        if random.random() < pm:
            delta_x = random.uniform(-1, 1) * (x_range[1] - x_range[0])
            ind.x += delta_x
            ind.x = max(x_range[0], min(x_range[1], ind.x))
 
    # Update the population and retain the original elite (pop_for_cross)
    pop = list(new_pop) + list(pop_for_cross)
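
The crowding-distance step in the loop above divides by the width of x_range. A more common formulation normalizes each gap by the range of the objective values within the front. The following is a minimal standalone sketch of that variant, assuming minimization and plain tuples of objective values. The function name `crowding_distances` is illustrative and is not part of the model code.

```python
from typing import List, Sequence


def crowding_distances(objs: List[Sequence[float]]) -> List[float]:
    """Crowding distance for each objective vector in a single front."""
    n = len(objs)
    distances = [0.0] * n
    if n == 0:
        return distances
    num_objectives = len(objs[0])
    for m in range(num_objectives):
        # Indices of the front sorted by the m-th objective
        order = sorted(range(n), key=lambda idx: objs[idx][m])
        lo, hi = objs[order[0]][m], objs[order[-1]][m]
        # Boundary solutions are always kept
        distances[order[0]] = float("inf")
        distances[order[-1]] = float("inf")
        if hi == lo:
            continue  # All values equal: this objective adds no spread
        for pos in range(1, n - 1):
            gap = objs[order[pos + 1]][m] - objs[order[pos - 1]][m]
            distances[order[pos]] += gap / (hi - lo)
    return distances


front = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
print(crowding_distances(front))
```

Solutions with a larger distance sit in sparser regions of the front and are preferred when a front has to be truncated.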

3.3.4 Output the optimal solution set

# Output the optimal solution set
for ind in pop:
    ind.evaluate()
 
pareto_front = set()
for ind in pop:
    dominated=False
    for other in pop:
        if other.objs[0] < ind.objs[0] :
            dominated=True
            break
    if not dominated:
        pareto_front.add(ind)
 
print("Pareto front:")
for ind in pareto_front:
    print(f"x={ind.x:.4f}, y1={ind.objs[0]:.4f}")

4. Model testing

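The optimal hyperparameter C output by the final model is 7.6418, and the corresponding evaluation index is 87.2814. Note that evaluate takes the square root of the mean squared error, so the value printed by this code is an RMSE rather than an MSE.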
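
As a small sketch of how the two metrics relate, the example below computes both by hand. The numbers are made up for illustration and are not the article's data.

```python
import math

# Hypothetical targets and predictions, for illustration only
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

# MSE: mean of the squared errors
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# RMSE: square root of the MSE, in the same units as the target
rmse = math.sqrt(mse)
print(mse, rmse)
```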