MATLAB simulation of a network data clustering algorithm based on improved K-means

Table of Contents

1. Program function description

2. Test software version and display of running results

3. Core program

4. Principle of this algorithm

5. Complete program


1. Program function description

K-means is a basic partitioning method in cluster analysis, and the error sum of squares criterion function is often used as its clustering criterion. Its main advantages are that the algorithm is simple and fast and can handle large data sets efficiently. This article studies and analyzes the classic K-means clustering algorithm and summarizes its advantages and disadvantages. It focuses on the algorithm's dependence on the initial values and uses experiments to verify the influence of randomly selected initial values on the clustering results. To address the shortcomings of the traditional K-means algorithm, an improved K-means algorithm is proposed that mainly solves two problems: the impact of isolated points (outliers) on the cluster centers, and the determination of the value of K.
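The core program below works on Data_NoGD, which, judging by its name, holds the data set after isolated points have been removed. As a rough illustration of that preprocessing idea, the following is a minimal sketch of a distance-based filter; the function name filter_isolated_points, the factor parameter, and the nearest-neighbor rule are illustrative assumptions, not this program's actual implementation.

% Minimal sketch (an assumption, for illustration only): treat a point as
% isolated when even its nearest neighbor is unusually far away.
function Data_NoGD = filter_isolated_points(Data,factor)
    % Data   : 2-by-N matrix of sample coordinates
    % factor : multiplier on the mean nearest-neighbor distance, e.g. 3
    N   = size(Data,2);
    dNN = zeros(1,N);
    for i = 1:N
        d      = sqrt(sum((Data - Data(:,i)).^2,1)); % distances to all points
        d(i)   = inf;                                % ignore the self-distance
        dNN(i) = min(d);                             % nearest-neighbor distance
    end
    keep      = dNN <= factor*mean(dNN);  % drop points far from every neighbor
    Data_NoGD = Data(:,keep);
end

A filter of this kind runs once before clustering, so the initial cluster centers can no longer land on an outlier.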

2. Test software version and display of running results

The program was run under MATLAB R2022a.

3. Core program

.................................................................
for Cluster_Num = 2 : K_start
     Cluster_Num   % display the current candidate K value
     flags = 0;
     Step = 4000;  % cap on the number of iterations
     disp('K value classification');
     %Initialize the cluster centers with the first Cluster_Num samples
     Center = Data_NoGD(:,1:Cluster_Num);
     %Perform the initial iteration
     [KindData,KindNum] = func_Kmeans_Cluster(Center,Data_NoGD);
     NewCenter = func_NewCenter(KindData,KindNum,row);
     %Iterate until the centers stop changing or the step budget runs out
     while (sum(sum(NewCenter ~= Center))) && Step
           Center = NewCenter;
           [KindData,KindNum] = func_Kmeans_Cluster(Center,Data_NoGD);
           NewCenter = func_NewCenter(KindData,KindNum,row);
           Step = Step-1;
     end
     %Calculate the distance cost
     disp('Calculate distance cost');
     %Calculate L: distance from each cluster center to the reference point (Xavg,Yavg) defined earlier
     disp('calculate L');
     xl = NewCenter(1,:);   % x coordinates of the new centers
     yl = NewCenter(2,:);   % y coordinates of the new centers
     for j = 1 : Cluster_Num
         L(j) = sqrt((Xavg - xl(j))^2 + (Yavg - yl(j))^2);
     end
     

     
     Lsum(Cluster_Num - 1) = sum(L)*Cluster_Num;   % between-cluster term, scaled by K
     disp('calculate D');
     %Calculate D
     for j = 1:Cluster_Num
         KindData_tmpx = KindData(1,:,j);
         KindData_tmpy = KindData(2,:,j);
         KindData_tmp = [KindData_tmpx;KindData_tmpy];
         %Remove the zero-padded columns
         KindData_tmp(:,all(KindData_tmp == 0,1)) = [];
         if isempty(KindData_tmp)
             D(1,j) = inf;
         else
             %Distance from every member of cluster j to its center
             for i = 1:size(KindData_tmp,2)
                 D(i,j) = sqrt((KindData_tmp(1,i) - xl(j))^2 + (KindData_tmp(2,i) - yl(j))^2);
             end
         end
         clear KindData_tmp KindData_tmpx KindData_tmpy
     end
     

     Dsum(Cluster_Num - 1) = sum(sum(D))/Cluster_Num;   % within-cluster term, averaged over K
     %Calculate the cost function F(K) = Lsum(K) + Dsum(K)
     disp('calculate F');
     F(Cluster_Num - 1) = Lsum(Cluster_Num - 1) + Dsum(Cluster_Num - 1);
     F   % display the cost values computed so far
     
     
     %Stop the search once F becomes non-finite
     if ~isfinite(F(Cluster_Num - 1))
        break;
     end
     
     
     pause(1)
     clear tmp Center KindData KindNum NewCenter Step xl yl L D
     
end

[V,IND] = min(F);
Kopt = IND + 1;   % index 1 corresponds to K = 2
fprintf('Best clustering value K = %d\n\n',Kopt);

%Use the latest K value for cluster analysis
Cluster_Num = Kopt;
[row,col] = size(Data_NoGD);
Step = 1000;
%Initialize the Kopt cluster center points with the first Kopt samples
Center = Data_NoGD(:,1:Cluster_Num);
%Perform initial iteration
[KindData,KindNum] = func_Kmeans_Cluster(Center,Data_NoGD);
NewCenter = func_NewCenter(KindData,KindNum,row);
%Iterate until the cluster centers stop changing or the step budget runs out
while (sum(sum(NewCenter ~= Center))) && Step
    Center = NewCenter;
    [KindData,KindNum] = func_Kmeans_Cluster(Center,Data_NoGD);
    NewCenter = func_NewCenter(KindData,KindNum,row);
    Step = Step-1;
end

func_fig(Data_NoGD,Cluster_Num,KindData);

4. Principle of this algorithm

The basic idea of the K-means clustering algorithm is as follows: the algorithm first randomly selects k points as the initial cluster centers, then calculates the distance between each data object and each cluster center and assigns each data object to the class of its nearest cluster center; a new cluster center is then computed for each adjusted class. If the cluster centers do not change between two consecutive iterations, the adjustment of the data objects is complete and the clustering criterion Jc has converged. A characteristic of the K-means clustering algorithm is that every iteration checks whether the classification of each sample is correct and adjusts it if it is not; after all data objects have been adjusted, the cluster centers are updated and the next iteration begins. If all data objects are correctly classified in an iteration, no adjustment occurs, the cluster centers do not change, Jc has converged, and the algorithm terminates. This paper studies and analyzes the K-means clustering algorithm and proposes improvements based on it.
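For reference, the following is a minimal, self-contained sketch of this basic iteration, assuming 2-by-N data as in the core program above; the function name kmeans_basic and its arguments are illustrative and are not the helper functions used by this program.

% Minimal sketch of the basic K-means iteration described above.
% Data is 2-by-N; k is the number of clusters; maxIter caps the iterations.
function [labels,Center] = kmeans_basic(Data,k,maxIter)
    N = size(Data,2);
    Center = Data(:,randperm(N,k));   % randomly pick k initial centers
    for iter = 1:maxIter
        %Assignment step: each point goes to its nearest center
        dist = zeros(k,N);
        for j = 1:k
            dist(j,:) = sum((Data - Center(:,j)).^2,1);
        end
        [~,labels] = min(dist,[],1);
        %Update step: recompute each center as the mean of its class
        NewCenter = Center;
        for j = 1:k
            if any(labels == j)
                NewCenter(:,j) = mean(Data(:,labels == j),2);
            end
        end
        if isequal(NewCenter,Center)  % centers unchanged: Jc has converged
            break;
        end
        Center = NewCenter;
    end
end

The stopping test here is the same one the core program uses: iterate until two consecutive sets of cluster centers are identical, or until an iteration budget runs out.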

The K-means clustering algorithm is a hard clustering algorithm and a typical prototype-based objective-function clustering method. It takes a distance from the data points to the prototypes (the cluster centers) as the objective function to be optimized, and derives the adjustment rules for the iterative operations by finding the extreme value of that function. The K-means clustering algorithm uses Euclidean distance as the dissimilarity measure: starting from a given initial vector of cluster centers, it seeks the optimal classification that minimizes the evaluation index E.
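Written out explicitly, the criterion function that this article calls Jc (or the evaluation index E) is the standard error sum of squares:

$$J_c = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - m_j \rVert^2$$

where C_j is the j-th cluster and m_j is its center, i.e. the mean of the samples assigned to C_j. Both the assignment step and the center-update step can only decrease Jc, which is why the iteration described in this section converges.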

The error sum of squares criterion function is suitable for sample distributions in which each class is relatively compact and the classes contain similar numbers of samples. When the class sizes differ greatly, minimizing the total sum of squared errors may split the class with many samples, because cutting a large class in two reduces the total error more than separating out a small, distant class would.

The basic flow of the entire algorithm is shown in the figure below:

5. Complete program

