Python project practice: optimizing a support vector machine classification model (SVC algorithm) with the HBA hybrid bat intelligent algorithm

1. Project background

The bat algorithm is a swarm-intelligence heuristic search algorithm proposed by Professor Yang in 2010, and it is an effective method for searching for a global optimal solution. The algorithm is iterative: it starts from a set of random solutions, searches for the optimal solution iteration by iteration, and generates new local solutions by flying randomly around the current optimal solution, which strengthens the local search. The algorithm is simple to implement and has few parameters.

To address the shortcomings of the basic bat algorithm, such as slow convergence, a tendency to fall into local optima, and low solution accuracy, the hybrid bat algorithm (HBA) incorporates local search to solve unconstrained optimization problems. The algorithm uses chaotic sequences to initialize the bats' positions and velocities, which lays the foundation for a diverse global search; it integrates Powell search to strengthen local search and speed up convergence; and it uses a mutation strategy to keep the algorithm, to a certain extent, from falling into a local optimum.

This project optimizes the support vector machine classification model through the HBA hybrid bat intelligent algorithm.

2. Data acquisition

The modeling data comes from the Internet (compiled by the author of this project). The data items are as follows:

| Number | Variable name | Description |
| --- | --- | --- |
| 1 | x1 | |
| 2 | x2 | |
| 3 | x3 | |
| 4 | x4 | |
| 5 | x5 | |
| 6 | x6 | |
| 7 | x7 | |
| 8 | x8 | |
| 9 | y | dependent variable |

The data details are as follows (partially displayed):

3. Data preprocessing

3.1 Viewing the data with Pandas

Use the Pandas head() method to view the first five rows of data:

Key code:
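The original listing is not reproduced here. As a minimal sketch, and assuming the data sits in a DataFrame with columns x1 to x8 and y, the step might look like the following. Because the project data itself is not available, a synthetic stand-in is generated with scikit-learn's make_classification; the helper name make_demo_frame is hypothetical, and with the real file one would load it with pd.read_csv or pd.read_excel instead.

```python
import pandas as pd
from sklearn.datasets import make_classification


def make_demo_frame(n_samples=1000, seed=42):
    """Build a synthetic stand-in for the project data (8 features, binary y)."""
    X, y = make_classification(n_samples=n_samples, n_features=8,
                               n_informative=5, random_state=seed)
    df = pd.DataFrame(X, columns=[f"x{i}" for i in range(1, 9)])
    df["y"] = y
    return df


df = make_demo_frame()
print(df.head())  # first five rows
```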

3.2 Missing data check

Use the Pandas info() method to view the data information:

As the picture above shows, the data has 9 variables and 1,000 records in total, with no missing values.

Key code:
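The original listing is not reproduced here. A minimal sketch of the check, run on random stand-in data with the same column names (an assumption, since the real data is not available), could be:

```python
import numpy as np
import pandas as pd

# Random stand-in data: 1,000 rows, columns x1..x8 plus a binary y
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 8)),
                  columns=[f"x{i}" for i in range(1, 9)])
df["y"] = rng.integers(0, 2, size=1000)

df.info()                    # column dtypes and non-null counts
missing = df.isnull().sum()  # number of missing values per column
print(missing)
```

The isnull().sum() call is an addition to info(): it gives the missing-value count per column as numbers rather than as printed text.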

3.3 Data descriptive statistics

Use the Pandas describe() method to view the mean, standard deviation, minimum, quartiles, and maximum of the data.

The key code is as follows:
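The original listing is not reproduced here. A minimal sketch on random stand-in data (an assumption; the real data is not available) might be:

```python
import numpy as np
import pandas as pd

# Random stand-in data with the same column names as the project data
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 8)),
                  columns=[f"x{i}" for i in range(1, 9)])
df["y"] = rng.integers(0, 2, size=1000)

summary = df.describe()  # count, mean, std, min, quartiles, max per column
print(summary)
```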

4. Exploratory data analysis

4.1 y variable histogram

Use the plot() method of the Matplotlib tool to draw a histogram:
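A minimal sketch of such a plot, assuming the chart shows the number of samples per class of y and using a synthetic stand-in for the data, could look like this. The Agg backend is selected only so the sketch runs without a display; in an interactive session that line can be dropped and plt.show() called at the end.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import make_classification

# Synthetic stand-in for the project data (assumption)
X_arr, y_arr = make_classification(n_samples=1000, n_features=8,
                                   n_informative=5, random_state=42)
df = pd.DataFrame(X_arr, columns=[f"x{i}" for i in range(1, 9)])
df["y"] = y_arr

counts = df["y"].value_counts().sort_index()  # samples per class
ax = counts.plot(kind="bar", rot=0)           # pandas plot() draws via Matplotlib
ax.set_xlabel("y")
ax.set_ylabel("Number of samples")
ax.set_title("Distribution of the y variable")
plt.tight_layout()
```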

4.2 y=1 sample x1 variable distribution histogram

Use the hist() method of the Matplotlib tool to draw a histogram:
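A minimal sketch, using a synthetic stand-in for the data and an assumed bin count of 20, might be:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import make_classification

# Synthetic stand-in for the project data (assumption)
X_arr, y_arr = make_classification(n_samples=1000, n_features=8,
                                   n_informative=5, random_state=42)
df = pd.DataFrame(X_arr, columns=[f"x{i}" for i in range(1, 9)])
df["y"] = y_arr

x1_pos = df.loc[df["y"] == 1, "x1"]  # x1 values of the y=1 samples only
fig, ax = plt.subplots()
n, bins, patches = ax.hist(x1_pos, bins=20)
ax.set_xlabel("x1")
ax.set_ylabel("Number of samples")
ax.set_title("x1 distribution for y=1 samples")
```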

4.3 Correlation Analysis

In the figure above, the larger the absolute value, the stronger the correlation; positive values indicate positive correlation and negative values indicate negative correlation.
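The article does not show how the correlation figure was produced. One possible sketch, assuming Pearson correlation via the Pandas corr() method and a plain Matplotlib heatmap on synthetic stand-in data, is:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification

# Synthetic stand-in for the project data (assumption)
X_arr, y_arr = make_classification(n_samples=1000, n_features=8,
                                   n_informative=5, random_state=42)
df = pd.DataFrame(X_arr, columns=[f"x{i}" for i in range(1, 9)])
df["y"] = y_arr

corr = df.corr()  # pairwise Pearson correlation coefficients
fig, ax = plt.subplots(figsize=(6, 5))
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im, ax=ax)
```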

5. Feature Engineering

5.1 Create feature data and label data

The key code is as follows:
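The original listing is not reproduced here. A minimal sketch, assuming y is the label column and all other columns are features (shown on random stand-in data), could be:

```python
import numpy as np
import pandas as pd

# Random stand-in data with the same column names as the project data
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 8)),
                  columns=[f"x{i}" for i in range(1, 9)])
df["y"] = rng.integers(0, 2, size=1000)

X = df.drop(columns=["y"])  # feature data: x1..x8
y = df["y"]                 # label data
print(X.shape, y.shape)
```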

5.2 Data Set Splitting

The train_test_split() method is used to split the data into a training set (80%) and a test set (20%). The key code is as follows:
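The original listing is not reproduced here. A minimal sketch of the 80/20 split on synthetic stand-in data might be the following; the random_state value is an assumption added only to make the split reproducible.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature and label data (assumption)
X, y = make_classification(n_samples=1000, n_features=8,
                           n_informative=5, random_state=42)

# test_size=0.2 keeps 80% for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)
```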

6. Construct HBA hybrid bat intelligent algorithm to optimize support vector machine classification model

The HBA hybrid bat intelligent algorithm is used here to tune the parameters of the SVC algorithm for target classification.

6.1 Algorithm Introduction

Note: _This introduction to the BA algorithm comes from the Internet and is for reference only. For more on the algorithm's principles, please consult other sources_.

The Bat Algorithm (BA) is a random search algorithm that simulates how bats in nature use sonar to detect prey and avoid obstacles. It models the bats' most basic ability to detect and locate obstacles or prey with ultrasonic waves and links this ability to the objective function being optimized. Following this bionic principle, the BA algorithm maps a population of NP bats to NP feasible solutions in a D-dimensional problem space, and models the optimization and search process as the movement of individual bats searching for prey. The quality of a bat's position is measured by the fitness function value of the problem being solved, and the survival of better-positioned bats corresponds to the iterative replacement of poor feasible solutions by good ones during the search. To simulate bats detecting prey and avoiding obstacles, the bat algorithm assumes the following three approximate or idealized rules:

1) All bats use echolocation to sense distance, and they can somehow tell the difference between prey and background obstacles.

2) A bat flies randomly with velocity vi at position xi, and searches for prey with a fixed frequency fmin, a varying wavelength λ, and volume A0. The bat automatically adjusts the wavelength (or frequency) of its emitted pulses and adjusts the pulse emission rate r in [0,1] according to its proximity to the target.

3) Although the volume can vary in many ways, the bat algorithm assumes that the volume A decreases from a large positive value A0 to a fixed minimum value Amin.

For an optimization problem with objective function min f(x) and decision variable X = (x1, x2, …, xd)^T, the BA algorithm proceeds as follows:

Step1: Initialize the population and the parameters. The bats are scattered at random in the D-dimensional space to form a set of initial solutions. Set the maximum pulse volume A0, the maximum pulse rate R0, the search pulse frequency range [fmin, fmax], the volume attenuation coefficient α, the search frequency enhancement coefficient γ, and the search accuracy ε or the maximum number of iterations iter_max.

Step2: Randomly initialize each bat's position xi, and find the current optimal solution x* based on the fitness values.

Step3: Update each bat's search pulse frequency, velocity, and position. During the evolution process the population changes according to the following formulas:

$$f_i = f_{\min} + (f_{\max} - f_{\min})\,\beta \tag{1}$$

$$v_i^{t} = v_i^{t-1} + (x_i^{t} - x^{*})\,f_i \tag{2}$$

$$x_i^{t} = x_i^{t-1} + v_i^{t} \tag{3}$$

In these formulas, β is a uniformly distributed random number in [0,1]; fi is the search pulse frequency of bat i, with fi in [fmin, fmax]; vi^t and vi^(t-1) are the velocities of bat i at times t and t-1; xi^t and xi^(t-1) are the positions of bat i at times t and t-1; and x* is the current optimal solution among all bats.

Step4: Generate a uniformly distributed random number rand. If rand > r, randomly perturb the current optimal solution to generate a new solution, and apply out-of-bounds handling to the new solution.

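Step5: Generate a uniformly distributed random number rand. If rand < Ai and the new solution has a better fitness value, accept the new solution and update the volume and the pulse rate as follows: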

$$A_i^{t+1} = \alpha A_i^{t} \tag{4}$$

$$r_i^{t+1} = R_0\,[1 - \exp(-\gamma t)] \tag{5}$$

Step6: Sort the fitness values of all bats and find the current optimal solution and optimal value.

Step7: Repeat Step3 to Step6 until the stopping condition for the optimal solution is met or the maximum number of iterations is reached.

Step8: Output the global optimal value and optimal solution.

Formulas (4) and (5) show that two parameters of the bat algorithm, the volume attenuation coefficient α and the search frequency enhancement coefficient γ, have a great impact on its performance. Balancing optimization accuracy against convergence speed comes down to setting α and γ reasonably. In simulation, suitable values of α and γ can be found by adjusting them repeatedly.
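The steps and update formulas above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the reference implementation: the function name bat_algorithm, the default parameter values, the size of the random perturbation in Step4 (1% of the search range scaled by the mean volume), and the acceptance test in Step5 (new solution better than the bat's current one) are all assumptions. The sphere function is used as a toy objective.

```python
import numpy as np


def bat_algorithm(objective, lower, upper, n_bats=20, n_iter=200,
                  f_min=0.0, f_max=2.0, A0=1.0, R0=0.5,
                  alpha=0.9, gamma=0.9, seed=0):
    """Minimize `objective` over the box [lower, upper] with a basic bat algorithm."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    # Step1/Step2: random positions, zero velocities, initial best solution
    x = rng.uniform(lower, upper, size=(n_bats, dim))
    v = np.zeros((n_bats, dim))
    fit = np.array([objective(p) for p in x])
    A = np.full(n_bats, A0)  # volume of each bat
    r = np.zeros(n_bats)     # pulse rate of each bat, rises toward R0
    best_x = x[fit.argmin()].copy()
    best_f = float(fit.min())

    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            # Step3: frequency, velocity and position, formulas (1)-(3)
            f_i = f_min + (f_max - f_min) * rng.random()
            v[i] = v[i] + (x[i] - best_x) * f_i
            cand = x[i] + v[i]

            # Step4: random perturbation of the current best solution
            if rng.random() > r[i]:
                step = 0.01 * A.mean() * (upper - lower)
                cand = best_x + step * rng.normal(size=dim)
            cand = np.clip(cand, lower, upper)  # out-of-bounds handling
            f_new = objective(cand)

            # Step5: conditional acceptance, formulas (4)-(5)
            if rng.random() < A[i] and f_new < fit[i]:
                x[i], fit[i] = cand, f_new
                A[i] *= alpha
                r[i] = R0 * (1.0 - np.exp(-gamma * t))

            # Step6: keep track of the best solution found so far
            if f_new < best_f:
                best_x, best_f = cand.copy(), float(f_new)
    return best_x, best_f  # Step8


def sphere(p):
    """Toy objective with its minimum of 0 at the origin."""
    return float(np.sum(p ** 2))


best_x, best_f = bat_algorithm(sphere, lower=[-5, -5], upper=[5, 5])
print(best_x, best_f)
```

Changing alpha and gamma in the call is a quick way to see how the two coefficients trade convergence speed against accuracy on this toy problem.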

6.2 Optimal parameters found by the HBA hybrid bat algorithm

Key code:
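The original listing is not reproduced here. The sketch below shows one way the pieces described in this article could fit together: chaotic initialization, the bat update rules, a mutation step, and Powell local search, applied to the C and gamma parameters of an SVC. Everything specific in it is an assumption made for illustration: the logistic map as the chaotic sequence, the search bounds, the population size and iteration count, the mutation probability, the 3-fold cross-validated error as the fitness function, and the synthetic stand-in data. It is not the author's implementation.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the project data (assumption)
X, y = make_classification(n_samples=300, n_features=8,
                           n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

LOWER = np.array([0.1, 0.001])  # assumed lower bounds for (C, gamma)
UPPER = np.array([10.0, 2.0])   # assumed upper bounds for (C, gamma)


def fitness(params):
    """Cross-validated error rate of an RBF-kernel SVC; lower is better."""
    C, gamma = np.clip(params, LOWER, UPPER)
    model = SVC(C=float(C), gamma=float(gamma))
    return 1.0 - cross_val_score(model, X_train, y_train, cv=3).mean()


def chaotic_init(n_bats, rng):
    """Chaotic initialization: logistic map z <- 4z(1-z), scaled to the bounds."""
    z = rng.uniform(0.1, 0.9, size=LOWER.size)
    points = []
    for _ in range(n_bats):
        z = 4.0 * z * (1.0 - z)
        points.append(LOWER + z * (UPPER - LOWER))
    return np.array(points)


def hba_optimize(n_bats=6, n_iter=3, f_min=0.0, f_max=2.0, alpha=0.9,
                 gamma_r=0.9, R0=0.5, p_mut=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = chaotic_init(n_bats, rng)
    v = np.zeros_like(x)
    fit = np.array([fitness(p) for p in x])
    A = np.ones(n_bats)   # volume of each bat
    r = np.zeros(n_bats)  # pulse rate of each bat
    best, best_f = x[fit.argmin()].copy(), float(fit.min())
    history = [best_f]    # best error after initialization and each iteration
    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            f_i = f_min + (f_max - f_min) * rng.random()
            v[i] = v[i] + (x[i] - best) * f_i
            cand = x[i] + v[i]
            if rng.random() > r[i]:   # local walk around the best bat
                step = 0.05 * A.mean() * (UPPER - LOWER)
                cand = best + step * rng.normal(size=best.size)
            if rng.random() < p_mut:  # mutation: jump to a random point
                cand = rng.uniform(LOWER, UPPER)
            cand = np.clip(cand, LOWER, UPPER)
            f_new = fitness(cand)
            if rng.random() < A[i] and f_new <= fit[i]:
                x[i], fit[i] = cand, f_new
                A[i] *= alpha
                r[i] = R0 * (1.0 - np.exp(-gamma_r * t))
            if f_new < best_f:
                best, best_f = cand.copy(), float(f_new)
        # Powell local search started from the current best solution
        res = minimize(fitness, best, method="Powell",
                       bounds=list(zip(LOWER, UPPER)),
                       options={"maxfev": 15})
        if res.fun < best_f:
            best, best_f = np.clip(res.x, LOWER, UPPER), float(res.fun)
        history.append(best_f)
    return best, best_f, history


best_params, best_err, history = hba_optimize()
print("best C, gamma:", best_params, "CV error:", best_err)
```

Using the cross-validated error on the training set as the fitness keeps the test set out of the parameter search, so the later evaluation remains an honest estimate.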

Process data for each iteration:

The figure above shows the position data of the bats in each iteration.

Optimal parameters:

6.3 Model construction with optimal parameter values

| Number | Model name | Parameters |
| --- | --- | --- |
| 1 | Support vector machine classification model | C=2.0300406757603744 |
| 2 | | gamma=1.4542106511419177 |
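A minimal sketch of building the model with these parameter values is shown below. The kernel is not stated in the article, so scikit-learn's default RBF kernel is assumed, and the model is fitted on synthetic stand-in data rather than the project data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the project data (assumption)
X, y = make_classification(n_samples=1000, n_features=8,
                           n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Optimal parameter values reported in the table above
model = SVC(C=2.0300406757603744, gamma=1.4542106511419177)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```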

7. Model evaluation

7.1 Evaluation indicators and results

Evaluation indicators mainly include accuracy, precision, recall, and F1 score.

As can be seen from the table above, the F1 score is 0.9154, indicating that the model performs well.

The key code is as follows:
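The original listing is not reproduced here. A minimal sketch of computing the four indicators with scikit-learn follows. It runs on synthetic stand-in data with an SVC left at its default hyperparameters (both assumptions), so the printed numbers will not match the values reported in this article; in the project, the tuned C and gamma would be passed to SVC.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the project data (assumption)
X, y = make_classification(n_samples=1000, n_features=8,
                           n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = SVC().fit(X_train, y_train)  # default hyperparameters for the sketch
y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred)
rec = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
```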

7.2 Checking for overfitting

As can be seen from the figure above, the scores on the training set and the test set are comparable, so there is no sign of overfitting.
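A minimal sketch of this comparison, on synthetic stand-in data and with a default SVC (both assumptions), could be:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the project data (assumption)
X, y = make_classification(n_samples=1000, n_features=8,
                           n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = SVC().fit(X_train, y_train)  # default hyperparameters for the sketch

train_score = model.score(X_train, y_train)  # accuracy on the training set
test_score = model.score(X_test, y_test)     # accuracy on the test set
print(f"train={train_score:.4f} test={test_score:.4f}")
```

A training score far above the test score would point to overfitting; scores that are close, as reported above, do not.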

7.3 Classification Report

As can be seen from the above figure, the F1 score for class 0 is 0.91 and the F1 score for class 1 is 0.92.
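A report of this kind can be produced with scikit-learn's classification_report. The sketch below uses synthetic stand-in data and a default SVC (both assumptions), so its numbers differ from those quoted above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the project data (assumption)
X, y = make_classification(n_samples=1000, n_features=8,
                           n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = SVC().fit(X_train, y_train)  # default hyperparameters for the sketch
y_pred = model.predict(X_test)

# Per-class precision, recall, F1 score and support
report = classification_report(y_test, y_pred)
print(report)
```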

7.4 Confusion Matrix

As can be seen from the above figure, 13 samples that are actually 0 were not predicted as 0, and 4 samples that are actually 1 were not predicted as 1. The overall prediction accuracy is good.
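A minimal sketch of computing a confusion matrix with scikit-learn is shown below, again on synthetic stand-in data with a default SVC (both assumptions), so the counts differ from those quoted above. Rows are actual classes and columns are predicted classes.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the project data (assumption)
X, y = make_classification(n_samples=1000, n_features=8,
                           n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = SVC().fit(X_train, y_train)  # default hyperparameters for the sketch
y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)
actual0_wrong = cm[0, 1]  # actually 0 but predicted as 1
actual1_wrong = cm[1, 0]  # actually 1 but predicted as 0
print(cm)
```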

8. Conclusion and outlook

To sum up, this project uses the HBA hybrid bat intelligent optimization algorithm to find the optimal parameter values of the support vector machine SVC algorithm and builds a classification model with them. The evaluation results show that the model works well. This model can be used for prediction tasks in everyday products.
