Hello everyone, I am Take Me Skiing!
A BP neural network, or backpropagation neural network, is a type of artificial neural network (ANN) commonly used for classification and regression tasks. It is a feedforward neural network that usually consists of an input layer, one or more hidden layers, and an output layer. In a classification task, the BP neural network assigns input data to one of several categories, and each category is represented by one output node of the network.
Table of Contents
(1) Training steps of BP neural network
(2) Speech feature recognition classification
(3) Model establishment
(4) Data selection and normalization
(5) BP neural network structure initialization
(6) Model training
(7) Model classification
(8) Result analysis
(1) Training steps of BP neural network
The structure and training process of a BP neural network involve the following components and steps:
- Input layer: The input layer receives raw data and passes it to the neural network. Each input node corresponds to a feature or attribute of the data.
- Hidden layer: A BP neural network can contain one or more hidden layers, whose purpose is to learn complex patterns and features in the data. Each hidden layer contains multiple neurons, which are connected to the adjacent layers through weights and apply an activation function.
- Output layer: The output layer produces the final output of the network, which usually corresponds to the different categories of the classification. Each output node represents a category, and the output value is usually interpreted as the probability that a sample belongs to that category.
- Weight: In a BP neural network, each connection has an associated weight. The weights are parameters of the network that are learned through training, and they control how signals are transmitted and transformed in the network.
- Activation function: Each neuron applies an activation function that converts its input into an output. Common activation functions include Sigmoid, ReLU (Rectified Linear Unit), and Softmax.
- Forward propagation: Forward propagation is the transfer of information from the input layer to the output layer. Each neuron computes the weighted sum of its inputs, adds a threshold (bias), and passes the result to the activation function. This process proceeds layer by layer until the output is obtained.
- Back propagation: Back propagation is the key part of a BP neural network. A loss function measures the error between the network output and the actual target. Using the chain rule, the error is then propagated backward through the network to compute the gradient of the loss with respect to each weight, and the gradient descent algorithm adjusts the weights to reduce the loss.
- Training: Training adjusts the weights of the network by providing it with a large amount of known input and target output data, so that the network can classify new data. Training typically involves many iterations of forward propagation and back propagation.
- Prediction: Once the network is trained, it can be used to classify unknown data. Input data is passed into the network, which outputs either a probability for each class or a class label.
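The forward and back propagation steps listed above can be sketched on a toy network. This is a minimal illustration, not the model used later in this article: the 2-3-2 layer sizes, the sample values, and the variable names (`x`, `t`, `h`, `deltaHidden`, `eta`) are all assumptions chosen for readability. It uses a sigmoid hidden layer, a linear output layer, and the loss 0.5*sum(e.^2).

```matlab
% Toy 2-3-2 network (assumed sizes): sigmoid hidden layer, linear output layer.
x  = [0.5; -0.2];                        % one input sample (2 features)
t  = [1; 0];                             % one-hot target (2 classes)
w1 = [0.1 -0.3; 0.2 0.4; -0.5 0.1];      % hidden weights, 3x2
b1 = [0; 0.1; -0.1];                     % hidden thresholds, 3x1
w2 = [0.3 -0.2 0.1; -0.1 0.2 0.4];       % output weights, 2x3
b2 = [0; 0];                             % output thresholds, 2x1
eta = 0.1;                               % learning rate (assumed value)

% Forward propagation
h = 1 ./ (1 + exp(-(w1*x + b1)));        % hidden layer output
y = w2*h + b2;                           % network output
e = t - y;                               % output error
lossBefore = 0.5 * sum(e.^2);

% Back propagation: chain rule applied to loss = 0.5*sum(e.^2)
deltaHidden = (w2' * e) .* h .* (1 - h); % error signal at the hidden layer
w2 = w2 + eta * (e * h');                % gradient descent step, output layer
b2 = b2 + eta * e;
w1 = w1 + eta * (deltaHidden * x');      % gradient descent step, hidden layer
b1 = b1 + eta * deltaHidden;

% Forward propagation again to see the effect of the update
h = 1 ./ (1 + exp(-(w1*x + b1)));
e = t - (w2*h + b2);
lossAfter = 0.5 * sum(e.^2);
fprintf('loss before: %.4f, loss after: %.4f\n', lossBefore, lossAfter);
```

Repeating this step over many samples and many passes is what the training bullet describes.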
(2) Speech feature recognition classification
Speech feature signal recognition is a technology that involves analyzing and identifying speech features extracted from sound signals. These features are quantifiable attributes in a sound signal that help understand and identify a speaker’s identity, language, emotion, speech rate, pitch, and other relevant information. Speech feature signal recognition has wide applications in speech processing, speech recognition, emotion analysis, speaker recognition and other fields.
The speech recognition process works as follows. First, the speech to be recognized is converted into an electrical signal and fed into the recognition system. After preprocessing, the speech feature signal is extracted using mathematical methods; the extracted feature signal can be regarded as the pattern of the speech segment. This pattern is then compared with known reference patterns, and the best-matching reference pattern is taken as the recognition result for the speech segment.
Four different types of music are selected: folk songs, guzheng, rock, and pop. A BP neural network is used to classify them. For each type of music, the cepstral coefficient method is used to extract 500 groups of 24-dimensional speech feature signals. The core idea of the cepstral coefficient method is to convert the spectral information of the signal into the cepstral domain, where the characteristics of the signal are easier to analyze and process.
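To make the idea of the cepstral domain concrete, here is a rough sketch that computes the real cepstrum of one frame and keeps 24 coefficients. The article does not say which cepstral variant or preprocessing was used, so everything here is an assumption: the synthetic test signal, the sampling rate `fs`, the frame length, the Hamming window, and the choice to skip the 0th coefficient.

```matlab
% Synthetic frame: two sinusoids sampled at 8 kHz (assumed test signal)
fs = 8000;
n  = 0:255;
frame = sin(2*pi*200*n/fs) + 0.5*sin(2*pi*400*n/fs);

% Hamming window, written out explicitly to avoid toolbox functions
win = 0.54 - 0.46*cos(2*pi*n/(numel(n)-1));

% Real cepstrum: inverse FFT of the log magnitude spectrum
spectrum    = abs(fft(frame .* win));
logSpectrum = log(spectrum + eps);      % eps guards against log(0)
cepstrum    = real(ifft(logSpectrum));

% Keep 24 coefficients as the feature vector (skipping the 0th is an assumption)
features = cepstrum(2:25);
```

Each such 24-element vector would play the role of one group of feature data in the sections below.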
(3) Model establishment
Since the speech feature input signal has 24 dimensions and there are 4 categories of speech signals to be classified, the structure of the BP neural network is set to 24-25-4: the input layer has 24 nodes, the hidden layer has 25 nodes, and the output layer has 4 nodes. There are 2000 groups of speech feature signals in total. 1500 groups are randomly selected as training data to train the network, and the remaining 500 groups are used as test data to test its classification ability. The trained network is then used to classify the speech category of each group of test data.
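As a quick check of the sizes involved, the following sketch counts the trainable parameters of a 24-25-4 network and the training and test split. The names `numWeights`, `numBiases`, `numParams`, `numTotal`, `numTrain`, and `numTest` are introduced here only for illustration.

```matlab
innum = 24; midnum = 25; outnum = 4;          % 24-25-4 structure

numWeights = innum*midnum + midnum*outnum;    % 600 + 100 connection weights
numBiases  = midnum + outnum;                 % 25 + 4 thresholds
numParams  = numWeights + numBiases;          % total trainable parameters

numTotal = 2000;                              % 4 categories x 500 groups
numTrain = 1500;
numTest  = numTotal - numTrain;

fprintf('%d parameters, %d training groups, %d test groups\n', ...
    numParams, numTrain, numTest);
```

With 729 parameters and 1500 training groups, there are roughly two training groups per parameter.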
(4) Data selection and normalization
First, the feature signals of the four types of music are extracted using the cepstral coefficient method. The four categories are labeled 1, 2, 3, and 4, and the extracted signals are stored in the files data1.mat, data2.mat, data3.mat, and data4.mat respectively. Each group of data has 25 dimensions: the first dimension is the category identifier, and the remaining 24 dimensions are the speech feature signal. The four data sets are combined, and the input data is normalized. The expected output of each group is set according to its category identifier. For example, if the identifier is 1, the expected output vector is [1,0,0,0].
%% Clear environment variables
clc
clear

%% Extract and normalize the training and test data
% Load the four types of speech feature signals
load data1 c1
load data2 c2
load data3 c3
load data4 c4

% Combine the four feature signal matrices into one matrix
data(1:500,:)=c1(1:500,:);
data(501:1000,:)=c2(1:500,:);
data(1001:1500,:)=c3(1:500,:);
data(1501:2000,:)=c4(1:500,:);

% Generate a random ordering of the indices 1 to 2000
k=rand(1,2000);
[m,n]=sort(k);

% Input and output data
input=data(:,2:25);
output1=data(:,1);

% Convert the output from 1 dimension to 4 dimensions
output=zeros(2000,4);
for i=1:2000
    switch output1(i)
        case 1
            output(i,:)=[1 0 0 0];
        case 2
            output(i,:)=[0 1 0 0];
        case 3
            output(i,:)=[0 0 1 0];
        case 4
            output(i,:)=[0 0 0 1];
    end
end

% Randomly select 1500 samples for training and 500 samples for testing
input_train=input(n(1:1500),:)';
output_train=output(n(1:1500),:)';
input_test=input(n(1501:2000),:)';
output_test=output(n(1501:2000),:)';

% Normalize the input data
[inputn,inputps]=mapminmax(input_train);
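For readers without the toolbox that provides mapminmax, the two preprocessing steps can be sketched with base functions only. The values of `labels` and `X` are made up for illustration, and the formula assumes the default mapminmax target range of [-1, 1] and rows that are not constant.

```matlab
% One-hot encoding by indexing rows of the identity matrix
labels = [1 3 2 4 1];                 % hypothetical category identifiers
I4 = eye(4);
onehot = I4(labels, :);               % one row per sample, like the switch block

% Row-wise min-max normalization to [-1, 1]
X = [1 2 3 4; 10 20 40 30];           % hypothetical data: rows = features, columns = samples
xmin = min(X, [], 2);
xmax = max(X, [], 2);
% bsxfun keeps this compatible with older releases
Xn = 2 * bsxfun(@rdivide, bsxfun(@minus, X, xmin), xmax - xmin) - 1;

disp(onehot);
disp(Xn);
```

The important design point is that `xmin` and `xmax` come from the training data only and are reused for the test data, which is the role of `inputps` together with `mapminmax('apply', ...)`.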
(5) BP neural network structure initialization
According to the characteristics of the speech feature signal, the structure of the BP neural network is determined to be 24-25-4, and the weights and thresholds of the BP neural network are randomly initialized.
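innum=24;
midnum=25;
outnum=4;

% Weight and threshold initialization
w1=rands(midnum,innum);
b1=rands(midnum,1);
w2=rands(midnum,outnum);
b2=rands(outnum,1);

% Copies of the two previous parameter values, used by the momentum term
w2_1=w2;w2_2=w2_1;
w1_1=w1;w1_2=w1_1;
b1_1=b1;b1_2=b1_1;
b2_1=b2;b2_2=b2_1;

% Learning rate (xite), momentum coefficient (alfa), and number of training passes
xite=0.1;
alfa=0.01;
loopNumber=10;

% Work arrays for the hidden layer and the gradients
I=zeros(1,midnum);
Iout=zeros(1,midnum);
FI=zeros(1,midnum);
dw1=zeros(innum,midnum);
db1=zeros(1,midnum);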
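The function rands comes from the neural network toolbox and returns values uniformly distributed between -1 and 1. If that toolbox is not available, a similar initialization can be sketched with the base function rand, as below. Treat this as an assumed substitute, not as the code used to produce the results in this article.

```matlab
innum = 24; midnum = 25; outnum = 4;

% Uniform random values in [-1, 1], built from rand (uniform in (0, 1))
w1 = 2*rand(midnum, innum)  - 1;   % input-to-hidden weights,  25x24
b1 = 2*rand(midnum, 1)      - 1;   % hidden thresholds,        25x1
w2 = 2*rand(midnum, outnum) - 1;   % hidden-to-output weights, 25x4
b2 = 2*rand(outnum, 1)      - 1;   % output thresholds,        4x1
```

The shapes match the training code: `w2` is stored as 25x4 and transposed when the output layer is evaluated.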
(6) Model training
The model is trained using the training data, and the network’s weights and thresholds are adjusted during the training process based on the network’s prediction error.
E=zeros(1,loopNumber);
for ii=1:loopNumber
    E(ii)=0;
    for i=1:1:1500
        %% Network prediction output
        x=inputn(:,i);
        % Hidden layer output
        for j=1:1:midnum
            I(j)=inputn(:,i)'*w1(j,:)'+b1(j);
            Iout(j)=1/(1+exp(-I(j)));
        end
        % Output layer output
        yn=w2'*Iout'+b2;

        %% Weight and threshold correction
        % Calculate the error
        e=output_train(:,i)-yn;
        E(ii)=E(ii)+sum(abs(e));

        % Calculate the weight and threshold changes of the output layer
        dw2=e*Iout;
        db2=e';

        % Derivative of the sigmoid function at each hidden node
        for j=1:1:midnum
            S=1/(1+exp(-I(j)));
            FI(j)=S*(1-S);
        end
        % Calculate the weight and threshold changes of the hidden layer
        for k=1:1:innum
            for j=1:1:midnum
                dw1(k,j)=FI(j)*x(k)*(e(1)*w2(j,1)+e(2)*w2(j,2)+e(3)*w2(j,3)+e(4)*w2(j,4));
                db1(j)=FI(j)*(e(1)*w2(j,1)+e(2)*w2(j,2)+e(3)*w2(j,3)+e(4)*w2(j,4));
            end
        end

        % Update with learning rate xite and momentum coefficient alfa
        w1=w1_1+xite*dw1'+alfa*(w1_1-w1_2);
        b1=b1_1+xite*db1'+alfa*(b1_1-b1_2);
        w2=w2_1+xite*dw2'+alfa*(w2_1-w2_2);
        b2=b2_1+xite*db2'+alfa*(b2_1-b2_2);

        % Keep the two most recent parameter values for the next update
        w1_2=w1_1;w1_1=w1;
        w2_2=w2_1;w2_1=w2;
        b1_2=b1_1;b1_1=b1;
        b2_2=b2_1;b2_1=b2;
    end
end
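Written as equations, the training loop above appears to compute the following for each sample. The symbols are introduced here for illustration: x_k is the normalized input, H_j the hidden layer output (Iout), t_m the expected output, η the learning rate xite, and α the momentum coefficient alfa. The changes Δw are the negative gradients of the loss ½ Σ e_m², which is why they are added to the weights.

```latex
\[
\begin{aligned}
I_j &= \sum_{k=1}^{24} w^{(1)}_{jk}\, x_k + b^{(1)}_j, &
H_j &= \frac{1}{1 + e^{-I_j}} \\
y_m &= \sum_{j=1}^{25} w^{(2)}_{jm}\, H_j + b^{(2)}_m, &
e_m &= t_m - y_m \\
\Delta w^{(2)}_{jm} &= e_m H_j, &
\Delta b^{(2)}_m &= e_m \\
\Delta w^{(1)}_{jk} &= H_j (1 - H_j)\, x_k \sum_{m=1}^{4} w^{(2)}_{jm}\, e_m, &
\Delta b^{(1)}_j &= H_j (1 - H_j) \sum_{m=1}^{4} w^{(2)}_{jm}\, e_m \\
w(t+1) &= w(t) + \eta\, \Delta w + \alpha \bigl( w(t) - w(t-1) \bigr)
\end{aligned}
\]
```

The last line applies to all four parameter sets (w1, b1, w2, b2); in the code, w(t) and w(t-1) are the variables ending in _1 and _2.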
(7) Model classification
Use the trained BP neural network model to classify speech feature signals, and analyze the classification ability of the BP neural network based on the classification results.
%% Classify the test data with the trained network
% Normalize the test inputs with the settings obtained from the training data
inputn_test=mapminmax('apply',input_test,inputps);
fore=zeros(4,500);
for i=1:500
    % Hidden layer output
    for j=1:1:midnum
        I(j)=inputn_test(:,i)'*w1(j,:)'+b1(j);
        Iout(j)=1/(1+exp(-I(j)));
    end
    % Output layer output
    fore(:,i)=w2'*Iout'+b2;
end

%% Result analysis
% The predicted category is the index of the largest network output
output_fore=zeros(1,500);
for i=1:500
    [~,output_fore(i)]=max(fore(:,i));
end

% BP network prediction error
error=output_fore-output1(n(1501:2000))';

% Plot the predicted and actual speech categories
figure(1)
plot(output_fore,'r')
hold on
plot(output1(n(1501:2000))','b')
legend('predicted speech category','actual speech category')

% Plot the classification error
figure(2)
plot(error)
title('BP network classification error','fontsize',12)
xlabel('voice signal','fontsize',12)
ylabel('Classification error','fontsize',12)
%print -dtiff -r600 1-4

% Count the misclassified samples in each actual category
k=zeros(1,4);
for i=1:500
    if error(i)~=0
        [b,c]=max(output_test(:,i));
        switch c
            case 1
                k(1)=k(1)+1;
            case 2
                k(2)=k(2)+1;
            case 3
                k(3)=k(3)+1;
            case 4
                k(4)=k(4)+1;
        end
    end
end

% Count the total number of samples in each actual category
kk=zeros(1,4);
for i=1:500
    [b,c]=max(output_test(:,i));
    switch c
        case 1
            kk(1)=kk(1)+1;
        case 2
            kk(2)=kk(2)+1;
        case 3
            kk(3)=kk(3)+1;
        case 4
            kk(4)=kk(4)+1;
    end
end

% Correct rate of each category
rightridio=(kk-k)./kk;
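The per-category correct rate computed above can also be read off a confusion matrix, which additionally shows which categories are confused with each other. The sketch below uses made-up label vectors `actual` and `predicted`, not the results of this experiment.

```matlab
actual    = [1 1 2 2 3 3 4 4 1 3];   % hypothetical true categories
predicted = [1 2 2 2 3 1 4 4 1 3];   % hypothetical predicted categories

% confusion(a, p) counts samples of actual category a predicted as category p
confusion = accumarray([actual' predicted'], 1, [4 4]);

% Correct rate per category: diagonal entries divided by the row totals
perClass = diag(confusion)' ./ sum(confusion, 2)';

% Overall correct rate
overall = sum(diag(confusion)) / numel(actual);

disp(confusion);
disp(perClass);
disp(overall);
```

Here `perClass` corresponds to rightridio, with the row totals in the role of kk and the off-diagonal row sums in the role of k.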
(8) Result analysis
The classification error of BP neural network is shown in the figure below.
The classification accuracy rate of BP neural network is:
Speech signal category | First category | Second category | Third category | Fourth category |
Correct rate | 0.8049 | 1.0000 | 0.8702 | 0.8984 |
The correct rate ranges from 0.8049 to 1 across the four categories, which shows that the speech signal classification algorithm based on BP neural network can identify the category of a speech signal with fairly high accuracy.