Combining AI and software testing: using an LLM to generate test cases from natural language

In my spare time, I came up with an idea that could improve efficiency on the left side (the early stages) of the testing process: with the help of artificial intelligence, natural language descriptions can be automatically converted into general functional test cases, interface test cases, unit test cases, and so on. When I saw the recent explosion of LLMs, I decided to develop a dedicated model for test case generation.

Preliminary requirements analysis

User needs:
- Users can describe test requirements and conditions in natural language, such as verifying a functional module, checking specific input and output, etc.
- Users expect the system to automatically generate specific test cases from the input description, reducing the workload of writing test cases.
- Users need the generated test cases to be executable, to provide adequate coverage, and to be effective, so that software quality and functional integrity are ensured.
Functional Requirements:
- Natural language processing: The system needs to have natural language processing capabilities, be able to understand the test requirements and conditions input by the user, and extract key information.
- Test case generation: The system automatically generates test cases that meet the requirements, based on the description entered by the user and a pre-trained LLM.
- Test case conversion: The system needs to convert the generated test cases into executable code snippets or data-driven test scripts to facilitate integration into the existing test process.
- Quality assessment and screening: The system should assess the quality of the generated test cases (executability, coverage, and effectiveness), then filter and optimize them.
- Integration and deployment: The system needs to provide stable test case generation services, which can be deployed in the cloud or local servers, and integrated with existing testing tools and processes.
Non-functional requirements:
- Performance: The system needs to generate test cases efficiently and minimize user waiting time.
- Scalability: The system should have good scalability and be able to handle large-scale testing needs and concurrent requests.
- User-friendliness: The system interface should be concise and clear, facilitate user input and interaction, and provide corresponding error prompts and feedback mechanisms.
- Security: The system needs to protect the privacy and security of user data and take necessary security measures to prevent data leaks and malicious attacks.
Environmental requirements:
- Data preparation: The system needs to have sufficient software test case data sets, including various scenarios and sample data, to conduct model training and generate test cases.
- Pre-trained model: The system needs to obtain and deploy a pre-trained LLM and fine-tune it for the test case generation needs of specific fields.
- Technical support: The system should build on existing natural language processing, machine learning, and software testing technologies.
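The requirements above imply some structured representation of a test case that can be passed between the generation, conversion, and screening stages. The following is a minimal sketch of one possible shape, not a definitive schema. The class names `GeneratedCase` and `CaseStep` and all field names are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class CaseStep:
    action: str    # what the tester or script does
    expected: str  # what should be observed afterwards


@dataclass
class GeneratedCase:
    title: str
    preconditions: List[str] = field(default_factory=list)
    steps: List[CaseStep] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() converts nested dataclasses to plain dicts recursively
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, raw: str) -> "GeneratedCase":
        data = json.loads(raw)
        steps = [CaseStep(**step) for step in data.get("steps", [])]
        return cls(
            title=data["title"],
            preconditions=data.get("preconditions", []),
            steps=steps,
        )


case = GeneratedCase(
    title="Password reset by email",
    preconditions=["User account exists"],
    steps=[CaseStep("Click 'Forgot Password' on the login page",
                    "Password reset form is shown")],
)
print(case.to_json())
```

Keeping the representation serializable to JSON makes it easy to store cases as training data later and to hand them to other tools.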

Project design

Data collection and preparation:
- Collect rich and diverse software test case data, including various test scenarios, input and output samples, etc.
- Clean, label and classify data to ensure data quality and integrity.
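As a rough illustration of the cleaning and labeling step, the sketch below normalizes whitespace, removes empty and duplicate entries, and assigns a coarse category by keyword. The function names and the keyword table are hypothetical, and a real project would need a far more careful labeling scheme.

```python
import re


def clean_cases(raw_cases):
    """Normalize whitespace, drop empty entries, and remove duplicates (order kept)."""
    seen = set()
    cleaned = []
    for text in raw_cases:
        normalized = re.sub(r"\s+", " ", text).strip()
        key = normalized.lower()  # compare case-insensitively
        if normalized and key not in seen:
            seen.add(key)
            cleaned.append(normalized)
    return cleaned


# Hypothetical keyword table; a real project would define its own categories.
CATEGORY_KEYWORDS = {
    "api": ("endpoint", "request", "response", "status code"),
    "ui": ("click", "page", "button", "form"),
}


def label_case(text):
    """Return the first category whose keyword appears in the text, else 'other'."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(word in lowered for word in keywords):
            return category
    return "other"


raw = ["  Click  the login button ", "click the login button", "",
       "Send a GET request to /users"]
cleaned = clean_cases(raw)
labels = [label_case(text) for text in cleaned]
print(list(zip(cleaned, labels)))
```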
Model training:
- Start from a pre-trained LLM and fine-tune it on the collected and prepared test case data so that it adapts to the test case generation needs of specific fields.
- Techniques such as generative adversarial networks (GANs) could also be explored to improve the quality and stability of generation.
Input and output processing:
- Design a user-friendly interface that allows users to enter test requirements and conditions in natural language. For example, enter a simple description such as “Check that login functionality is working properly.”
- Convert the user’s natural language input into an intermediate form the model can process, such as token IDs and the vector representations produced by word embedding.
- Convert the intermediate results generated by the model into executable test case code, such as code snippets or data-driven test scripts.
Quality control and optimization:
- Assess and filter the generated test cases to ensure that they are executable, provide adequate coverage, and are effective.
- Design appropriate evaluation indicators or use automated testing tools to automatically execute and verify results of generated test cases to improve the quality of generation.
- Continuously collect user feedback and data feedback, and iterate and optimize the model to provide more accurate and efficient test case generation results.
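A first, cheap screening pass can be done statically before anything is executed. The sketch below assumes the generated test cases are Python source. It rejects code that does not parse, has no `test_*` function, or has a test function without an assertion. The function name `screen_test_source` is an assumption, and these checks only approximate "executable and effective"; actually running the tests would be a further step.

```python
import ast


def screen_test_source(source):
    """Return a list of problems found in generated test code (empty means it passed)."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    problems = []
    test_funcs = [
        node for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef) and node.name.startswith("test_")
    ]
    if not test_funcs:
        problems.append("no test_* function found")
    for func in test_funcs:
        # A test without any assert cannot fail, so it verifies nothing
        if not any(isinstance(node, ast.Assert) for node in ast.walk(func)):
            problems.append(f"{func.name} contains no assert")
    return problems


good = "def test_add():\n    assert 1 + 1 == 2\n"
no_assert = "def test_nothing():\n    pass\n"
broken = "def test_broken(:\n    pass\n"

for candidate in (good, no_assert, broken):
    print(screen_test_source(candidate))
```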
Deployment and integration:
- Deploy the trained model to the cloud or local server to provide stable and efficient test case generation services.
- Integrate the test case generation system with existing testing tools and processes, such as automated testing frameworks, CI/CD pipelines, etc., to improve overall testing efficiency and automation levels.
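One way to expose generation as a service is a small HTTP endpoint that other tools or a CI/CD job can call. The sketch below uses only the Python standard library, and the model call is replaced by a placeholder. The `/generate` path, the JSON field names, and every function name are assumptions, and a production deployment would need authentication, logging, and a proper web framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def generate_test_case(requirement):
    """Placeholder for the real model call (hypothetical); echoes the requirement."""
    return f"# TODO: generated test for: {requirement}"


def handle_generate(body):
    """Turn a JSON request body into an (HTTP status, JSON response bytes) pair."""
    error = json.dumps({"error": "expected JSON with a non-empty 'requirement' string"})
    try:
        requirement = json.loads(body)["requirement"]
    except (ValueError, KeyError, TypeError):
        return 400, error.encode()
    if not isinstance(requirement, str) or not requirement.strip():
        return 400, error.encode()
    result = {"test_case": generate_test_case(requirement)}
    return 200, json.dumps(result).encode()


class GenerateHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/generate":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_generate(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def make_server(port=8000):
    """Create the server; call .serve_forever() on the result to start it."""
    return HTTPServer(("127.0.0.1", port), GenerateHandler)
```

Keeping the request logic in `handle_generate`, separate from the HTTP handler, lets it be unit tested without opening a socket.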

Code implementation

Step 1: Use the transformers library with the open source GPT-2 model and PyTorch to write a rough prototype and see how far it gets without fine-tuning

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

def generate_test_case(model, tokenizer, input_text):
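    # Encode the prompt into token IDs plus an attention mask
    inputs = tokenizer(input_text, return_tensors="pt")

    # Generate a continuation; max_length counts the prompt tokens as well
    outputs = model.generate(
        **inputs,
        max_length=50,
        num_return_sequences=1,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token, so reuse EOS
    )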
    
    # Decode the generated test cases
    test_case = tokenizer.decode(outputs[0], skip_special_tokens=True)
    
    return test_case

# Load pre-trained GPT-2 model
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Enter natural language text
input_text = 'Click "Forgot Password" on the login page, and then enter your email address for verification'

# Generate test cases
test_case = generate_test_case(model, tokenizer, input_text)

# Print the generated test cases
print("Generated test case:", test_case)

Step 2: Fine-tune GPT-2 on public data sets, then repeat the test from Step 1 until the desired effect is achieved

import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import Dataset, DataLoader
from transformers import GPT2LMHeadModel, GPT2Tokenizer, get_linear_schedule_with_warmup

# Custom data set class
class CustomDataset(Dataset):
    def __init__(self, texts, tokenizer, max_length):
        self.texts = texts
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        input_text = self.texts[idx]
        input_ids = self.tokenizer.encode(input_text, add_special_tokens=True, truncation=True, max_length=self.max_length)
        return torch.tensor(input_ids)

# Build a batch: pad variable-length samples and record which positions are real tokens
def collate_fn(batch):
    input_ids = pad_sequence(batch, batch_first=True, padding_value=tokenizer.pad_token_id)
    attention_mask = torch.zeros_like(input_ids)
    for row, item in enumerate(batch):
        attention_mask[row, :len(item)] = 1
    return input_ids, attention_mask

# Define model and tokenizer
model_name = "gpt2" # You can substitute another pre-trained model as needed
output_dir = "./fine_tuned_model"
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token # GPT-2 has no pad token, so reuse EOS for padding

# Load and prepare training data
train_data = ["Test Case 1", "Test Case 2", "Test Case 3"] # Provide the training data set according to actual needs
dataset = CustomDataset(train_data, tokenizer, max_length=128) # Custom data set
dataloader = DataLoader(dataset, batch_size=8, shuffle=True, collate_fn=collate_fn) # Data loader

# Define training parameters
num_train_epochs = 3 # Number of training epochs
learning_rate = 5e-5 # Learning rate
total_steps = len(dataloader) * num_train_epochs # One optimizer step per batch
warmup_steps = int(total_steps * 0.1) # Number of warmup steps (10% of training steps)

# Switch the model to training mode and move it to the appropriate device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.train()

# Define optimizer and learning rate scheduler (linear warmup, then linear decay)
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=warmup_steps, num_training_steps=total_steps)

# Start fine-tuning
for epoch in range(num_train_epochs):
    total_loss = 0.0
    for input_ids, attention_mask in dataloader:
        input_ids = input_ids.to(device)
        attention_mask = attention_mask.to(device)

        # The model shifts the labels internally, so pass them unshifted;
        # -100 marks padded positions that the loss should ignore
        labels = input_ids.masked_fill(attention_mask == 0, -100)

        optimizer.zero_grad()

        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        loss.backward()

        optimizer.step()
        scheduler.step()

        total_loss += loss.item()

    avg_loss = total_loss / len(dataloader)
    print("Epoch:", epoch + 1, "Avg Loss:", avg_loss)

# Save the fine-tuned model
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

The main steps of this code are as follows:

- Define a custom data set class, CustomDataset, for loading and processing the training data.
- Create the model and tokenizer objects using the GPT2LMHeadModel class and a pretrained tokenizer.
- Prepare the training data, wrap it in the custom data set object, and create a data loader with DataLoader.
- Switch the model to training mode and move it to the GPU (I use an AMD card with ROCm here).
- Define the optimizer and learning rate scheduler.
- Fine-tune by iterating over the training data: forward pass, loss calculation, backpropagation, and parameter update.
- Save the fine-tuned model and tokenizer.

After completing the fine-tuning, repeat the first step and use the fine-tuned model to generate test cases.
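For the fine-tuned model to map a requirement to a test case, each training sample probably needs to contain both parts in a fixed layout, and the prompt at generation time should follow the same layout. The sketch below shows one possible format. The `### Requirement:` and `### Test case:` markers and both function names are assumptions; `<|endoftext|>` is GPT-2's end-of-text token.

```python
EOS_TOKEN = "<|endoftext|>"  # GPT-2's end-of-text token


def build_prompt(requirement):
    """Build the inference prompt, ending where the model should continue."""
    return f"### Requirement:\n{requirement.strip()}\n### Test case:\n"


def build_training_text(requirement, test_case):
    """Join one requirement and its test case into a single training string."""
    # The EOS token teaches the model where a test case ends
    return build_prompt(requirement) + test_case.strip() + EOS_TOKEN


sample = build_training_text(
    "Check that login works with a valid password",
    "def test_login_valid_password():\n    assert login('alice', 'secret') is True",
)
print(sample)
```

Strings built this way would take the place of the placeholder entries in train_data, and `build_prompt` would produce the input text for generation.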

Step 3: Design and implement the user interface

– //pending

Step 4: Integration of automated testing platform

– //pending
– //Improve the data processing process, user interface and integration methods to achieve a complete automated software test case generation system.