```python
torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True,
                            num_workers=4, pin_memory=True)
```
The `num_workers` parameter of the `DataLoader` class specifies the number of worker (child) processes used by the data loader. Increasing `num_workers` allows data to be read and preprocessed in parallel, which speeds up data loading.
Typically, raising `num_workers` improves data-loading efficiency, because loading and preprocessing run in multiple processes simultaneously. However, once `num_workers` exceeds a certain threshold, adding more processes no longer brings further improvement and may even degrade performance.
This is because increasing `num_workers` also increases the cost of inter-process communication. When `num_workers` is too large, that cost can outweigh the benefits of parallelization, resulting in worse performance.
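If worker startup and inter-process overhead are the bottleneck, two `DataLoader` options introduced in PyTorch 1.7, `persistent_workers` and `prefetch_factor`, can help amortize that cost. A minimal sketch, assuming `train_dataset` and `batch_size` are defined as in the test code below:

```python
import torch

# Keep worker processes alive across epochs and prefetch batches ahead of
# time, to amortize worker startup and inter-process communication costs.
loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=batch_size, shuffle=True,
    num_workers=4, pin_memory=True,
    persistent_workers=True,  # workers are not torn down after each epoch
    prefetch_factor=2)        # batches pre-loaded per worker (the default)
```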
Additionally, hardware limitations need to be taken into account. If your machine has only a few CPU cores, raising `num_workers` can likewise degrade performance, because each worker process competes for CPU core resources.
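A common heuristic (not a hard rule) is to cap `num_workers` at the number of CPU cores the machine reports, so each worker process can get a core of its own:

```python
import os

# Cap num_workers at the number of logical CPU cores; os.cpu_count() may
# return None on some platforms, hence the fallback to 0 (load in the main
# process only).
num_workers = min(8, os.cpu_count() or 0)
```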
Therefore, the `num_workers` setting needs to be tuned for your specific situation. A reasonable value is typically between 2 and 8, depending on factors such as your hardware configuration and the size of your dataset. In practice, the optimal setting can be found by trying different `num_workers` values.
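A minimal sketch of such a sweep, using a hypothetical `benchmark_num_workers` helper that simply iterates each loader once and times it (it assumes `dataset` is the `ImageFolder` built in the test code below):

```python
import time
import torch

def benchmark_num_workers(dataset, batch_size=4, worker_counts=(0, 2, 4, 8)):
    """Time one full pass over `dataset` for each num_workers value."""
    for n in worker_counts:
        loader = torch.utils.data.DataLoader(
            dataset, batch_size=batch_size, shuffle=True,
            num_workers=n, pin_memory=True)
        start = time.time()
        for _ in loader:  # iterate once over the whole dataset
            pass
        print(f"num_workers={n}: {time.time() - start:.2f}s per pass")
```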
In summary, when `num_workers` increases from 4 to 8, the performance difference between the two may be small, or there may even be no significant difference, if factors such as your hardware configuration and dataset size stay the same.
The test code is as follows:
```python
import time

import matplotlib.pyplot as plt
import torch
import torch.multiprocessing as mp
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.models as models

if __name__ == '__main__':
    mp.freeze_support()

    train_on_gpu = torch.cuda.is_available()
    if not train_on_gpu:
        print('CUDA is not available. Training on CPU...')
    else:
        print('CUDA is available! Training on GPU...')
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    batch_size = 4

    # Set transformations for data preprocessing
    transform = torchvision.transforms.Compose([
        torchvision.transforms.Resize((512, 512)),  # Resize the image to 512x512
        torchvision.transforms.ToTensor(),          # Convert to tensor
        torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                         std=[0.229, 0.224, 0.225])  # Normalize
    ])

    # Raw string so the backslashes in the Windows path are not treated as escapes
    dataset = torchvision.datasets.ImageFolder(
        r'C:\Users\ASUS\PycharmProjects\pythonProject1\cats_and_dogs_train',
        transform=transform)

    # Split into training and validation sets
    val_ratio = 0.2
    val_size = int(len(dataset) * val_ratio)
    train_size = len(dataset) - val_size
    train_dataset, val_dataset = torch.utils.data.random_split(dataset, [train_size, val_size])

    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size,
                                               shuffle=True, num_workers=4, pin_memory=True)
    val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=batch_size,
                                             shuffle=True, num_workers=4, pin_memory=True)

    # Frozen ResNet-18 backbone with a new two-class head. CrossEntropyLoss
    # applies log-softmax internally, so the head outputs raw logits.
    model = models.resnet18()
    num_classes = 2
    for param in model.parameters():
        param.requires_grad = False
    model.fc = nn.Sequential(
        nn.Dropout(),
        nn.Linear(model.fc.in_features, num_classes)
    )

    optimizer = optim.Adam(model.parameters(), lr=0.001)
    criterion = nn.CrossEntropyLoss().to(device)
    model.to(device)

    filename = "recognize_cats_and_dogs.pt"

    def save_checkpoint(epoch, model, optimizer, loss, filename):
        checkpoint = {
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': loss,
        }
        torch.save(checkpoint, filename)

    num_epochs = 3
    train_loss = []
    for epoch in range(num_epochs):
        running_loss = 0
        correct = 0
        total = 0
        epoch_start_time = time.time()

        model.train()
        for i, (inputs, labels) in enumerate(train_loader):
            # Move data to the device
            inputs, labels = inputs.to(device), labels.to(device)
            # Forward pass
            outputs = model(inputs)
            # Compute loss and gradients
            loss = criterion(outputs, labels)
            optimizer.zero_grad()
            loss.backward()
            # Update model parameters
            optimizer.step()
            # Record loss and accuracy
            running_loss += loss.item()
            train_loss.append(loss.item())
            _, predicted = torch.max(outputs.data, 1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
        accuracy_train = 100 * correct / total

        # Compute the loss and accuracy on the validation set
        model.eval()
        with torch.no_grad():
            running_loss_val = 0
            correct_val = 0
            total_val = 0
            for inputs, labels in val_loader:
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)
                loss = criterion(outputs, labels)
                running_loss_val += loss.item()
                _, predicted = torch.max(outputs.data, 1)
                correct_val += (predicted == labels).sum().item()
                total_val += labels.size(0)
            accuracy_val = 100 * correct_val / total_val

        # Report the loss and accuracy for each epoch
        epoch_end_time = time.time()
        epoch_time = epoch_end_time - epoch_start_time
        print("Epoch [{}/{}], Time: {:.4f}s, Train Loss: {:.4f}, Train Accuracy: {:.2f}%, "
              "Val Loss: {:.4f}, Val Accuracy: {:.2f}%"
              .format(epoch + 1, num_epochs, epoch_time,
                      running_loss / len(train_loader), accuracy_train,
                      running_loss_val / len(val_loader), accuracy_val))
        save_checkpoint(epoch, model, optimizer, loss, filename)

    plt.plot(train_loss, label='Train Loss')
    # Add legend and labels
    plt.legend()
    plt.xlabel('Iterations')  # one loss value is recorded per batch, not per epoch
    plt.ylabel('Loss')
    plt.title('Training Loss')
    # Display the plot
    plt.show()
```
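One detail worth noting about the code above: `pin_memory=True` mainly pays off when the host-to-GPU copies are made asynchronous with `non_blocking=True`; otherwise the transfers still synchronize. A small variant of the transfer step in the loop:

```python
# With pin_memory=True in the DataLoader, non_blocking=True lets the
# host-to-GPU copy overlap with computation already queued on the GPU.
inputs = inputs.to(device, non_blocking=True)
labels = labels.to(device, non_blocking=True)
```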
The results for different `num_workers` values are as follows: