Data classification with a BP neural network: speech feature signal classification

Hello everyone, this is "Take Me Skiing"!

A BP neural network, also known as a backpropagation neural network, is a type of artificial neural network (ANN) commonly used for classification and regression tasks. It is a feedforward neural network trained with the backpropagation algorithm, and it usually consists of an input layer, one or more hidden layers, and an output layer. In a classification task, the network assigns input data to one of several categories, each of which is represented by one output node of the network.

Table of Contents

(1) Training steps of BP neural network

(2) Speech feature recognition classification

(3) Model establishment

(4) Data selection and normalization

(5) BP neural network structure initialization

(6) Model training

(7) Model classification

(8) Result analysis

(1) Training steps of BP neural network

The structure and training process of a BP neural network involve the following components and steps:

Input layer: The input layer receives raw data and passes it to the neural network. Each input node corresponds to a feature or attribute of the data.
Hidden layer: A BP neural network can contain one or more hidden layers, which learn complex patterns and features in the data. Each hidden layer contains multiple neurons, which are connected to the adjacent layers through weights and produce their outputs through activation functions.
Output layer: The output layer produces the final output of the network, which usually corresponds to the different categories of the classification. Each output node represents a category, and the output value is usually interpreted as a score or probability that the sample belongs to that category.
Weight: Each connection in a BP neural network has an associated weight. The weights, together with the thresholds (biases) of the neurons, are the parameters of the network and are learned through training. They control how signals are transmitted and transformed in the network.
Activation function: Each neuron contains an activation function that converts the neuron’s input into an output. Common activation functions include Sigmoid, ReLU (Rectified Linear Unit), and Softmax.
Forward propagation: Forward propagation is the transfer of information from the input layer to the output layer. Each neuron computes the weighted sum of its inputs, adds its threshold, and passes the result through the activation function. This proceeds layer by layer until the output is obtained.
Back propagation: Back propagation is the key part of a BP neural network. A loss function measures the error between the network output and the actual target. The chain rule is then used to propagate this error backward through the network, which gives the gradient of the loss with respect to each weight, and gradient descent adjusts the weights to reduce the loss.
Training: Training adjusts the weights of the network using a large amount of known input and target output data so that the network can classify new data. It typically involves many iterations of forward and back propagation.
Prediction: Once the network is trained, it can classify unknown data. Input data is passed through the network, which outputs a score or probability for each class; the class with the largest output is taken as the predicted label.
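
The forward and back propagation steps above can be illustrated with a minimal sketch of a single gradient step. It assumes a tiny 2-3-1 network with a sigmoid hidden layer, a linear output, and a squared-error loss; the sizes, weights, sample, and learning rate are made up for illustration and are not taken from the speech data used later.

```matlab
% One gradient-descent step on a tiny 2-3-1 network (all values are illustrative)
x   = [0.5; -0.2];                       % one input sample (column vector)
t   = 1;                                 % its target output
W1  = [0.1 -0.3; 0.2 0.4; -0.5 0.1];     % hidden-layer weights (3x2)
b1  = zeros(3, 1);                       % hidden-layer thresholds
W2  = [0.3 -0.1 0.2];                    % output-layer weights (1x3)
b2  = 0;                                 % output-layer threshold
eta = 0.1;                               % learning rate

% Forward propagation
h = 1 ./ (1 + exp(-(W1*x + b1)));        % sigmoid hidden outputs
y = W2*h + b2;                           % linear network output
e = t - y;                               % output error
lossBefore = 0.5 * e^2;

% Back propagation: chain rule from the output back to the hidden layer
delta2 = e;                              % error term of the linear output node
delta1 = (W2' * delta2) .* h .* (1 - h); % error terms of the hidden nodes

% Gradient-descent update (delta1 was computed with the old W2)
W2 = W2 + eta * delta2 * h';
b2 = b2 + eta * delta2;
W1 = W1 + eta * delta1 * x';
b1 = b1 + eta * delta1;

% Forward propagation again to see the effect of the update
h = 1 ./ (1 + exp(-(W1*x + b1)));
y = W2*h + b2;
lossAfter = 0.5 * (t - y)^2;
fprintf('loss before: %.4f, loss after: %.4f\n', lossBefore, lossAfter);
```

With a small learning rate, the loss on this sample should be lower after the step than before it; repeating such steps over many samples is what training does.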

(2) Speech feature recognition classification

Speech feature signal recognition is a technology that involves analyzing and identifying speech features extracted from sound signals. These features are quantifiable attributes in a sound signal that help understand and identify a speaker’s identity, language, emotion, speech rate, pitch, and other relevant information. Speech feature signal recognition has wide applications in speech processing, speech recognition, emotion analysis, speaker recognition and other fields.

Speech recognition works as follows. First, the speech to be recognized is converted into an electrical signal and fed into the recognition system. After preprocessing, the speech feature signal is extracted using mathematical methods; the extracted feature signal can be regarded as the pattern of the speech segment. This pattern is then compared with the known reference patterns, and the best-matching reference pattern is taken as the recognition result for the speech segment.

This case study uses four types of music (folk songs, guzheng, rock, and pop) and a BP neural network to classify them. For each type of music, the cepstral coefficient method is used to extract 500 groups of 24-dimensional speech feature signals. The core idea of the cepstral coefficient method is to transform the spectral information of the signal into the cepstral domain, where the characteristics of the signal are easier to analyze and process.
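
The post does not say which cepstral variant or frame settings were used to build the data files, so the following is only a rough sketch of the general idea: it computes the real cepstrum of one synthetic frame and keeps the first 24 coefficients. The sampling rate, frame length, window, and test tones are all assumptions made for illustration.

```matlab
% Real cepstrum of one frame (synthetic signal standing in for real audio)
fs    = 8000;                                     % assumed sampling rate in Hz
n     = 0:255;                                    % assumed frame of 256 samples
frame = sin(2*pi*440*n/fs) + 0.5*sin(2*pi*880*n/fs);

w = 0.54 - 0.46*cos(2*pi*n/(numel(n) - 1));       % Hamming window
spectrum    = abs(fft(frame .* w));               % magnitude spectrum
logSpectrum = log(spectrum + eps);                % eps avoids log(0)
cepstrum    = real(ifft(logSpectrum));            % back-transform: cepstral domain

features = cepstrum(1:24);                        % 24-dimensional feature vector
disp(size(features))
```

Each row of the data files used below can be thought of as one such 24-dimensional vector with a category identifier placed in front of it.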

(3) Model establishment

Since the speech feature input signal has 24 dimensions and there are 4 categories of speech signals to classify, the structure of the BP neural network is set to 24-25-4: the input layer has 24 nodes, the hidden layer has 25 nodes, and the output layer has 4 nodes. There are 2000 groups of speech feature signals in total. Of these, 1500 randomly selected groups are used as training data to train the network, and the remaining 500 groups are used as test data to evaluate its classification ability. The trained network is then used to classify the speech category of each test sample.
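
As a quick check on the size of this model, the number of trainable parameters of a 24-25-4 network can be counted directly. The sketch assumes one threshold per hidden node and per output node; the names nWeights, nThresholds, nParams, and trainFraction are only illustrative.

```matlab
innum  = 24;    % input nodes
midnum = 25;    % hidden nodes
outnum = 4;     % output nodes

nWeights    = innum*midnum + midnum*outnum;   % 600 + 100 connection weights
nThresholds = midnum + outnum;                % 25 + 4 thresholds
nParams     = nWeights + nThresholds;

trainFraction = 1500 / 2000;                  % share of the data used for training
fprintf('%d parameters, %.0f%% of the data used for training\n', ...
        nParams, 100*trainFraction);
```

The network has 729 parameters to fit from 1500 training samples.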

(4) Data selection and normalization

First, the feature signals of the four types of music are extracted with the cepstral coefficient method, and the four types are labeled 1, 2, 3, and 4. The extracted signals are stored in the files data1.mat, data2.mat, data3.mat, and data4.mat, respectively. Each group of data has 25 dimensions: the first dimension is the category identifier, and the remaining 24 dimensions are the speech feature signal. The four data sets are merged into one matrix and split randomly into training and test sets, and the training inputs are normalized. The expected output of each group is set according to its category identifier; for example, if the identifier is 1, the expected output vector is [1,0,0,0].
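
The mapping from a category identifier to an expected output vector can also be written without a loop, by picking rows of an identity matrix. This is only a small sketch with made-up labels; the names labels, numClasses, and onehot are illustrative.

```matlab
labels     = [1; 3; 2; 4; 1];    % example category identifiers (made up)
numClasses = 4;

I4     = eye(numClasses);        % row c is the expected output vector of class c
onehot = I4(labels, :);          % row i is the expected output of sample i
disp(onehot)
```

The first row is [1 0 0 0] and the second is [0 0 1 0], which matches the encoding described above.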

%% Clear environment variables
clc
clear

%% Load the data, split it into training and test sets, and normalize

% Load the four types of speech feature signals
load data1 c1
load data2 c2
load data3 c3
load data4 c4

%Four characteristic signal matrices are combined into one matrix
data(1:500,:)=c1(1:500,:);
data(501:1000,:)=c2(1:500,:);
data(1001:1500,:)=c3(1:500,:);
data(1501:2000,:)=c4(1:500,:);

% Random permutation of the indices 1 to 2000 (n holds the shuffled order)
k=rand(1,2000);
[m,n]=sort(k);

%Input and output data
input=data(:,2:25);
output1 =data(:,1);

%Convert the output from 1 dimension to 4 dimensions
output=zeros(2000,4);
for i=1:2000
    switch output1(i)
        case 1
            output(i,:)=[1 0 0 0];
        case 2
            output(i,:)=[0 1 0 0];
        case 3
            output(i,:)=[0 0 1 0];
        case 4
            output(i,:)=[0 0 0 1];
    end
end

% Randomly extract 1500 samples as training samples and 500 samples as prediction samples
input_train=input(n(1:1500),:)';
output_train=output(n(1:1500),:)';
input_test=input(n(1501:2000),:)';
output_test=output(n(1501:2000),:)';

%Input data normalization
[inputn,inputps]=mapminmax(input_train);
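
By default, mapminmax rescales each row (each feature) linearly to the range [-1, 1] using y = (ymax - ymin)*(x - xmin)/(xmax - xmin) + ymin, and the returned settings can later be applied to other data. The same idea can be sketched with base functions only. The small matrices and the helper name scale are illustrative, and the sketch assumes implicit expansion (MATLAB R2016b or later) and no constant rows, which would need special handling.

```matlab
Xtrain = [1 2 3 4; 10 20 30 40];     % 2 features x 4 training samples (made up)
Xtest  = [2.5; 25];                  % one test sample

% Per-feature minimum and maximum, taken from the training data only
xmin = min(Xtrain, [], 2);
xmax = max(Xtrain, [], 2);

% Map [xmin, xmax] linearly to [-1, 1]
scale = @(X) 2*(X - xmin)./(xmax - xmin) - 1;

XtrainN = scale(Xtrain);             % each row now spans [-1, 1]
XtestN  = scale(Xtest);              % test data reuse the training statistics
disp(XtrainN)
disp(XtestN)
```

Reusing the training minimum and maximum for the test data keeps both sets on the same scale, which is the role of the inputps structure returned by mapminmax.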

(5) BP neural network structure initialization

According to the characteristics of the speech feature signal, the structure of the BP neural network is determined to be 24-25-4, and the weights and thresholds of the BP neural network are randomly initialized.

innum=24;
midnum=25;
outnum=4;
 

% weight initialization
w1=rands(midnum,innum);
b1=rands(midnum,1);
w2=rands(midnum,outnum);
b2=rands(outnum,1);

w2_1=w2;w2_2=w2_1;
w1_1=w1;w1_2=w1_1;
b1_1=b1;b1_2=b1_1;
b2_1=b2;b2_2=b2_1;

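% learning rate (xite) and momentum coefficient (alfa)
xite=0.1;
alfa=0.01;
% number of passes over the training data
loopNumber=10;
% preallocate hidden-layer work arrays and hidden-layer gradients
I=zeros(1,midnum);
Iout=zeros(1,midnum);
FI=zeros(1,midnum);
dw1=zeros(innum,midnum);
db1=zeros(1,midnum);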
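
The rands function used above comes from MATLAB's neural network toolbox and, as documented there, returns symmetric random values between -1 and 1. Where the toolbox is not available, a similar initialization can be sketched with the base function rand. This is an assumed substitute, not the code used to produce the results in this post.

```matlab
innum  = 24;
midnum = 25;
outnum = 4;

% Uniform random values in [-1, 1], with the same shapes as in the code above
w1 = 2*rand(midnum, innum)  - 1;   % input-to-hidden weights
b1 = 2*rand(midnum, 1)      - 1;   % hidden thresholds
w2 = 2*rand(midnum, outnum) - 1;   % hidden-to-output weights
b2 = 2*rand(outnum, 1)      - 1;   % output thresholds

disp(size(w1)); disp(size(w2));
```

Note that w2 is stored as midnum-by-outnum, so the output layer is computed with its transpose.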

(6) Model training

The model is trained using the training data, and the network’s weights and thresholds are adjusted during the training process based on the network’s prediction error.
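
For reference, the computation carried out for each training sample can be summarized as follows. The notation is introduced here only for illustration: x is the normalized input, t the expected output vector, H the hidden-layer output, w^{(1)}, b^{(1)}, w^{(2)}, b^{(2)} the weights and thresholds (w1, b1, w2, b2 in the code), \eta the learning rate (xite), \alpha the momentum coefficient (alfa), and \theta any one of the four parameter arrays.

```latex
\begin{aligned}
H_j &= \frac{1}{1+\exp\!\Big(-\big(\sum_{i=1}^{24} w^{(1)}_{ji}\,x_i + b^{(1)}_j\big)\Big)}, \quad j = 1,\dots,25 \\
y_k &= \sum_{j=1}^{25} w^{(2)}_{jk}\,H_j + b^{(2)}_k, \quad k = 1,\dots,4 \\
e_k &= t_k - y_k \\
\Delta w^{(2)}_{jk} &= H_j\,e_k, \qquad \Delta b^{(2)}_k = e_k \\
\Delta b^{(1)}_j &= H_j\,(1-H_j)\sum_{k=1}^{4} w^{(2)}_{jk}\,e_k, \qquad \Delta w^{(1)}_{ji} = x_i\,\Delta b^{(1)}_j \\
\theta^{(n+1)} &= \theta^{(n)} + \eta\,\Delta\theta + \alpha\,\big(\theta^{(n)} - \theta^{(n-1)}\big)
\end{aligned}
```

The \Delta terms are the negative gradients of the squared error \tfrac{1}{2}\sum_k e_k^2, so adding \eta\,\Delta\theta is a gradient-descent step; the last term is the momentum contribution from the previous update.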

E=zeros(1,loopNumber);
for ii=1:loopNumber
    E(ii)=0;
    for i=1:1:1500
       %% network prediction output
        x=inputn(:,i);
        % Hidden layer output
        for j=1:1:midnum
            I(j)=inputn(:,i)'*w1(j,:)' + b1(j);
            Iout(j)=1/(1 + exp(-I(j)));
        end
        % Output layer output
        yn=w2'*Iout' + b2;
        
       %% weight threshold correction
        %Calculation error
        e=output_train(:,i)-yn;
        E(ii)=E(ii) + sum(abs(e));
        
        % Calculate the weight change rate
        dw2=e*Iout;
        db2=e';
        
        for j=1:1:midnum
            S=1/(1 + exp(-I(j)));
            FI(j)=S*(1-S);
        end
        for k=1:1:innum
            for j=1:1:midnum
                dw1(k,j)=FI(j)*x(k)*(e(1)*w2(j,1) + e(2)*w2(j,2) + e(3)*w2(j,3) + e(4)*w2(j,4));
                db1(j)=FI(j)*(e(1)*w2(j,1) + e(2)*w2(j,2) + e(3)*w2(j,3) + e(4)*w2(j,4));
            end
        end
           
        w1=w1_1 + xite*dw1' + alfa*(w1_1-w1_2);
        b1=b1_1 + xite*db1' + alfa*(b1_1-b1_2);
        w2=w2_1 + xite*dw2' + alfa*(w2_1-w2_2);
        b2=b2_1 + xite*db2' + alfa*(b2_1-b2_2);
        
        w1_2=w1_1;w1_1=w1;
        w2_2=w2_1;w2_1=w2;
        b1_2=b1_1;b1_1=b1;
        b2_2=b2_1;b2_1=b2;
    end
end
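
The element-wise loops in the training code can also be expressed with matrix operations. The following sketch shows the forward pass and the gradients for one sample in vectorized form and checks one entry against the loop formula. It uses small made-up sizes and column vectors throughout, so the shapes of Iout, dw1, and dw2 differ from the code above, where Iout is a row vector and the gradients are stored transposed.

```matlab
innum = 3; midnum = 4; outnum = 2;      % small illustrative sizes
x  = [0.2; -0.4; 0.7];                  % one input sample
t  = [1; 0];                            % its expected output
w1 = reshape(linspace(-0.5, 0.5, midnum*innum),  midnum, innum);
b1 = zeros(midnum, 1);
w2 = reshape(linspace(-0.3, 0.3, midnum*outnum), midnum, outnum);
b2 = zeros(outnum, 1);

% Forward pass
Iout = 1 ./ (1 + exp(-(w1*x + b1)));    % hidden-layer output (midnum x 1)
yn   = w2'*Iout + b2;                   % network output (outnum x 1)
e    = t - yn;                          % prediction error

% Gradients
FI    = Iout .* (1 - Iout);             % derivative of the sigmoid
delta = FI .* (w2*e);                   % error propagated to the hidden layer
dw1   = delta * x';                     % midnum x innum, same shape as w1
db1   = delta;
dw2   = Iout * e';                      % midnum x outnum, same shape as w2
db2   = e;

% Check one entry against the loop-based formula
j = 2; k = 3;
ref = FI(j)*x(k)*(e(1)*w2(j,1) + e(2)*w2(j,2));
fprintf('difference: %g\n', abs(dw1(j,k) - ref));
```

Because the gradients here have the same shape as the weights, they could be added to w1 and w2 without the transposes used in the loop version.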

(7) Model classification

Use the trained BP neural network model to classify speech feature signals, and analyze the classification ability of the BP neural network based on the classification results.

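% Normalize the test inputs with the parameters of the training data
inputn_test=mapminmax('apply',input_test,inputps);

% Forward pass of the trained network on the 500 test samples
fore=zeros(outnum,500);
for i=1:500
    % Hidden layer output
    for j=1:1:midnum
        I(j)=inputn_test(:,i)'*w1(j,:)' + b1(j);
        Iout(j)=1/(1 + exp(-I(j)));
    end
    % Output layer output
    fore(:,i)=w2'*Iout' + b2;
end

% The predicted category is the index of the largest network output
output_fore=zeros(1,500);
for i=1:500
    [~,output_fore(i)]=max(fore(:,i));
end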

%BP network prediction error
error=output_fore-output1(n(1501:2000))';

%Draw a classification diagram of predicted speech types and actual speech types
figure(1)
plot(output_fore,'r')
hold on
plot(output1(n(1501:2000))','b')
legend('predicted speech category','actual speech category')

%Draw an error graph
figure(2)
plot(error)
title('BP network classification error','fontsize',12)
xlabel('voice signal','fontsize',12)
ylabel('Classification error','fontsize',12)

%print -dtiff -r600 1-4

k=zeros(1,4);
%Find out which category the wrongly judged classification belongs to
for i=1:500
    if error(i)~=0
        [b,c]=max(output_test(:,i));
        switch c
            case 1
                k(1)=k(1) + 1;
            case 2
                k(2)=k(2) + 1;
            case 3
                k(3)=k(3) + 1;
            case 4
                k(4)=k(4) + 1;
        end
    end
end

%Find the individual sum of each category
kk=zeros(1,4);
for i=1:500
    [b,c]=max(output_test(:,i));
    switch c
        case 1
            kk(1)=kk(1) + 1;
        case 2
            kk(2)=kk(2) + 1;
        case 3
            kk(3)=kk(3) + 1;
        case 4
            kk(4)=kk(4) + 1;
    end
end

%Correct rate
rightridio=(kk-k)./kk;

(8) Result analysis

The classification error of BP neural network is shown in the figure below.

The classification accuracy rate of BP neural network is:

Speech signal category	Category 1	Category 2	Category 3	Category 4
Accuracy	0.8049	1.0000	0.8702	0.8984

The per-category accuracy ranges from 0.8049 to 1, so the speech signal classification algorithm based on the BP neural network identifies the category of most test signals correctly. The first category has the lowest accuracy, and the second category is classified without error.
