Source: DeepHub IMBA This article is about 4,300 words, and it is recommended to read for 12 minutes This article is an attempt to use Pytorch to build a Gaussian mixture model classifier.
We will build a Gaussian Mixture Model (GMM) from scratch. This will give you a basic understanding of the Gaussian mixture model. This article will not involve mathematics because we have introduced it in detail in previous articles.
This article will use these libraries:
import torch import numpy as np import matplotlib.pyplot as plt import matplotlib.colors as mcolors
We will create 3 different Gaussian distributions (a, B, mix) in two dimensions, where mix should be the distribution consisting of a and B.
First, the distributions of A and B…
n_samples = 1000 A_means = torch.tensor( [-0.5, -0.5]) A_stdevs = torch.tensor([0.25, 0.25]) B_means = torch.tensor( [0.5, 0.5]) B_stdevs = torch.tensor([0.25, 0.25]) A_dist = torch.distributions.Normal( A_means, A_stdevs) A_samp = A_dist.sample([n_samples]) B_dist = torch.distributions.Normal( B_means, B_stdevs) B_samp = B_dist.sample([n_samples]) plt.figure( figsize=(6,6)) for name, sample in zip( ['A', 'B'], [A_samp, B_samp]): plt.scatter( sample[:,0], sample[:, 1], alpha=0.2, label=name) plt.legend() plt.title( "Distinct Gaussian Samples") plt.show() plt.close()
To create a single mixed Gaussian distribution, we first vertically stack the means and standard deviations of a and B, generating new tensors, each with shape = [2,2].
AB_means = torch.vstack( [ A_means, B_means]) AB_stdevs = torch.vstack([ A_stdevs, B_stdevs])
The way Pytorch mixed distributions work is by using 3 additional distributions Independent, Categorical and MixtureSameFamily on top of the original Normal distribution. Essentially, it creates a mixture, with probability weights based on a given categorical distribution. Because our new mean and standard set has an extra axis, this axis is used as a separate axis from which a decision needs to be made about which mean/standard set value to draw.
AB_means = torch.vstack( [ A_means, B_means]) AB_stdevs = torch.vstack( [ A_stdevs, B_stdevs]) AB_dist = torch.distributions.Independent(torch.distributions.Normal(AB_means, AB_stdevs), 1) mix_weight = torch.distributions.Categorical(torch.tensor([1.0, 1.0])) mix_dist = torch.distributions.MixtureSameFamily( mix_weight, AB_dist)
Here [1.0,1.0] is used to indicate that the Categorical distribution should be sampled uniformly from each independent axis. To verify that it works, we will plot the values of each distribution…
A_samp = A_dist.sample( (500,)) B_samp = B_dist.sample( (500,)) mix_samp = mix_dist.sample( (500,)) plt.figure( figsize=(6,6)) for name, sample in zip( ['A', 'B', 'mix'], [A_samp, B_samp, mix_samp]): plt.scatter( sample[:,0], sample[:, 1], alpha=0.3, label=name) plt.legend() plt.title( "Original Samples with the new Mixed Distribution") plt.show() plt.close()
As you can see, the new mix_samp distribution actually overlaps with our original two separate samples from the A and B distributions.
Model
Now we can start building our classifier
First you need to create an underlying GaussianMixModel, whose means, stdev and classification weights can actually be trained through the torch backprop and autograd systems.
class GaussianMixModel(torch.nn.Module): def __init__(self, n_features, n_components=2): super().__init__() self.init_scale = np.sqrt( 6 / n_features) # What is the best scale to use? self.n_features = n_features self.n_components = n_components weights = torch.ones( n_components) means = torch.randn( n_components, n_features) * self.init_scale stdevs = torch.rand( n_components, n_features) * self.init_scale # # Our trainable Parameters self.blend_weight = torch.nn.Parameter(weights) self.means = torch.nn.Parameter(means) self.stdevs = torch.nn.Parameter(stdevs) def forward(self, x): blend_weight = torch.distributions.Categorical(torch.nn.functional.relu(self.blend_weight)) comp = torch.distributions.Independent(torch.distributions.Normal( self.means, torch.abs( self.stdevs)), 1) gmm = torch.distributions.MixtureSameFamily( blend_weight, comp) return -gmm.log_prob(x) def extra_repr(self) -> str: info = f" n_features={self.n_features}, n_components={self.n_components}, [init_scale={self.init_scale}]" return info @property def device(self): return next(self.parameters()).device
The model will return the negative log-likelihood for each sample that falls within the model’s Gaussian mixture domain.
In order to train it, we need to provide samples from a mixture of Gaussian distributions. To verify that it works, a batch of samples from a common distribution will be provided to see if it works and which samples are likely to be similar to those in our training set.
train_means = torch.randn( (4,2)) train_stdevs = (torch.rand( (4,2)) + 1.0) * 0.25 train_weights = torch.rand(4) ind_dists = torch.distributions.Independent(torch.distributions.Normal(train_means, train_stdevs), 1) mix_weight = torch.distributions.Categorical(train_weights) train_dist = torch.distributions.MixtureSameFamily( mix_weight, ind_dists) train_samp = train_dist.sample( [2000]) valid_samp = torch.rand( (4000, 2)) * 8 - 4.0 plt.figure( figsize=(6,6)) for name, sample in zip( ['train', 'valid'], [train_samp, valid_samp]): plt.scatter( sample[:,0], sample[:, 1], alpha=0.2, label=name) plt.legend() plt.title( "Training and Validation Samples") plt.show() plt.close()
The model only requires one hyperparameter n_components:
gmm = GaussianMixModel( n_features=2, n_components=4) gmm.to( 'cuda')
The training loop is also very simple:
max_iter = 20000 features = train_samp.to( 'cuda') optim = torch.optim.Adam(gmm.parameters(), lr=5e-4) metrics = {'loss':[]} for i in range( max_iter): optim.zero_grad() loss = gmm(features) loss.mean().backward() optim.step() metrics[ 'loss'].append( loss.mean().item()) print( f"{i} ) \t {metrics[ 'loss'][-1]:0.5f}", end=f"{' '*20}\r" ) if metrics[ 'loss'][-1] < 0.1: print("---- Close enough") break if len( metrics[ 'loss']) > 300 and np.std( metrics[ 'loss'][-300:]) < 0.0005: print( "---- Giving up") break print( f"Min Loss: {np.min( metrics[ 'loss']):0.5f}")
In this example, the loop stops at a loss of 1.91043 less than 7000 iterations.
If we now run the valid_samp sample through the model, we can convert the return values into relative probabilities and replot the validation data colored by predictions.
with torch.no_grad(): logits = gmm( valid_samp.to( 'cuda')) probs = torch.exp( -logits) plt.figure( figsize=(6,6)) for name, sample in zip( ['pred'], [valid_samp]): plt.scatter( sample[:,0], sample[:, 1], alpha=1.0, c=probs.cpu().numpy(), label=name) plt.legend() plt.title( "Testing Trained model on Validation") plt.show() plt.close()
Our model has learned to identify samples corresponding to regions of the training distribution. But there are improvements we can make.
Category
Through the above introduction, you should have a general understanding of how to create a Gaussian mixture model and how to train it. The next step will be to use this information to build a composite (GMMClassifier) model that can learn to identify different categories of mixed Gaussian distributions.
Here a training set of overlapping Gaussian distributions is created, with 5 different classes, where each class itself is a mixture of Gaussian distributions.
This GMMClassifier will contain 5 different GaussianMixModel instances. Each instance attempts to learn a separate class from the training data. Each prediction will be combined into a set of classification logic that GMMClassifier will use to make predictions.
First you need to make a small modification to the original GaussianMixModel and change the output from return -gmm.log_prob(x) to return gmm.log_prob(x). Because we are not directly trying to reduce this value in the training loop, it is used as the logits for our classification assignment.
The new model becomes…
class GaussianMixModel(torch.nn.Module): def __init__(self, n_features, n_components=2): super().__init__() self.init_scale = np.sqrt( 6 / n_features) # What is the best scale to use? self.n_features = n_features self.n_components = n_components weights = torch.ones( n_components) means = torch.randn( n_components, n_features) * self.init_scale stdevs = torch.rand( n_components, n_features) * self.init_scale # # Our trainable Parameters self.blend_weight = torch.nn.Parameter(weights) self.means = torch.nn.Parameter(means) self.stdevs = torch.nn.Parameter(stdevs) def forward(self, x): blend_weight = torch.distributions.Categorical(torch.nn.functional.relu(self.blend_weight)) comp = torch.distributions.Independent(torch.distributions.Normal( self.means, torch.abs( self.stdevs)), 1) gmm = torch.distributions.MixtureSameFamily( blend_weight, comp) return gmm.log_prob(x) def extra_repr(self) -> str: info = f" n_features={self.n_features}, n_components={self.n_components}, [init_scale={self.init_scale}]" return info @property def device(self): return next(self.parameters()).device
The code of our GMMClassifier is as follows:
class GMMClassifier(torch.nn.Module): def __init__(self, n_features, n_classes, n_components=2): super().__init__() self.n_classes = n_classes self.n_features = n_features self.n_components = n_components if isinstance( n_components, list) else [n_components] * self.n_classes self.class_models = torch.nn.ModuleList( [ GaussianMixModel( n_features=self.n_features, n_components=self.n_components[i]) for i in range( self.n_classes)]) def forward(self, x, ret_logits=False): logits = torch.hstack( [ m(x).unsqueeze(1) for m in self.class_models]) if ret_logits: return logits return logits.argmax( dim=1) def extra_repr(self) -> str: info = f" n_features={self.n_features}, n_components={self.n_components}, [n_classes={self.n_classes}]" return info @property def device(self): return next(self.parameters()).device
When creating a model instance, a GaussianMixModel is created for each class. Since each class may have a different number of components for its specific Gaussian mixture, we allow n_components to be a list of int values that will be used when generating each underlying model. For example: n_components=[2,4,3,5,6] will pass the correct number of components to the class model. To simplify setting all underlying models to the same value, one can also simply provide n_components=5, which will produce [5,5,5,5,5] when generating the model.
During training, access to logits is required, so the ret_logits parameter is provided in the forward() method. After training is complete, forward() can be called without arguments to return an int value for the predicted class (it only accepts argmax() of logits).
We will also create a set of 5 independent but overlapping Gaussian mixture distributions, with a random number of Gaussian components per class.
clusters = [0, 1, 2, 3, 4] features_group = {} n_samples = 2000 min_clusters = 2 max_clusters = 10 for c in clusters: features_group[ c] = [] n_clusters = torch.randint( min_clusters, max_clusters + 1, (1,1)).item() print( f"Class: {c} Clusters: {n_clusters}") for i in range( n_clusters): mu = torch.randn( (1,2)) scale = torch.rand( (1,2)) * 0.35 + 0.05 distribution = torch.distributions.Normal(mu, scale) features_group[ c] + = distribution.expand( (n_samples//n_clusters, 2)).sample() features_group[ c] = torch.vstack( features_group[ c]) features = torch.vstack( [features_group[ c] for c in clusters]).numpy() targets = torch.vstack( [torch.ones( (features_group[ c].size(0), 1)) * c for c in clusters]).view( -1).numpy() idxs = np.arange(features.shape[0]) valid_idxs = np.random.choice(idxs, 1000) train_idxs = [i for i in idxs if i not in valid_idxs] features_valid = torch.tensor(features[valid_idxs]) targets_valid = torch.tensor( targets[ valid_idxs]) features = torch.tensor(features[train_idxs]) targets = torch.tensor( targets[train_idxs]) print(features.shape) plt.figure( figsize=(8,8)) for c in clusters: plt.scatter( features_group[c][:,0].numpy(), features_group[c][:,1].numpy(), alpha=0.2, label=c) plt.title( f"{n_samples} Samples Per Class, Multiple Clusters per Class") plt.legend()
By running the above code, we can know the number of n_component used by each class. In practice it should be a hyperparameter search process, but here we already know it, so we use it directly
Class: 0 Clusters: 3 Class: 1 Clusters: 5 Class: 2 Clusters: 2 Class: 3 Clusters: 8 Class: 4 Clusters: 4
Then create the model:
gmmc = GMMClassifier( n_features=2, n_classes=5, n_components=[3, 5, 2, 8, 4]) gmmc.to( 'cuda')
The training loop also has some modifications, since this time one wants to train the model’s classification loss provided by logit prediction. Therefore, goals need to be provided during the training process of supervised learning.
features = features.to(DEVICE) targets = targets.to(DEVICE) optim = torch.optim.Adam(gmmc.parameters(), lr=3e-2) loss_fn = torch.nn.CrossEntropyLoss() metrics = {'loss':[]} for i in range(4000): optim.zero_grad() logits = gmmc(features, ret_logits=True) loss = loss_fn(logits, targets.type(torch.long)) loss.backward() optim.step() metrics[ 'loss'].append( loss.item()) print( f"{i} ) \t {metrics[ 'loss'][-1]:0.5f}", end=f"{' '*20}\r" ) if metrics[ 'loss'][-1] < 0.1: print("---- Close enough") break print( f"Mean Loss: {np.mean( metrics[ 'loss']):0.5f}")
The data is then classified from the validation data, which is generated when the training data is created, each sample is basically a different value but from the appropriate class.
preds = gmmc( features_valid.to( 'cuda'))
Looking at the preds values, you can see that they are integers for the prediction class.
print( preds[0:10]) ____ tensor([2, 4, 2, 4, 2, 3, 4, 0, 2, 2], device='cuda:1')
Finally by comparing these values with targets_valid, the accuracy of the model can be determined.
accuracy = (targets_valid == preds).sum() / targets_valid.size(0) * 100.0 print( f"Accuracy: {accuracy:0.2f}%") ____ Accuracy: 81.50%
Also view the accuracy of predictions for each category…
class_acc = {} for c in range(5): target_idxs = (targets_valid == c) class_acc[c] = (targets_valid[ target_idxs] == preds[ target_idxs]).sum() / targets_valid[ target_idxs].size(0) * 100.0 print( f"Class: {c} \t{class_acc[c]:0.2f}%") ---- Class: 0 98.54% Class: 1 69.06% Class: 2 86.12% Class: 3 70.05% Class: 4 84.09%
You can see that it does better at predicting classes with less overlap, which makes sense. And the average accuracy of 81.5% is pretty good because all these different categories overlap. I believe there is still a lot of room for improvement. If you have suggestions or can point out mistakes I’ve made, please leave a comment.
Author: Todd Shifflett
Editor: Huang Jiyan