In this article we'll cover how to adapt a Hugging Face (HF) model to your own task: build a custom model head in PyTorch, attach it to the body of the HF model, and train the whole system end-to-end.
1. HF model head and model body
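This is what a typical HF model looks like: a body (the embeddings and transformer layers that turn tokens into hidden states) followed by a task-specific head (the layers that turn those hidden states into predictions).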
Why treat the model head and the model body separately?
Some HF models are trained for downstream tasks (such as question answering or text classification), and their weights encode knowledge about the data they were trained on.
Sometimes, especially when our own task has little data or is domain-specific (e.g. medical or sports-related), we can take a model from the Hub that was trained on another task (not necessarily the same task as ours, but in the same domain) and use the knowledge it has already acquired to improve performance on our own task.
- A very simple example: say we have a small dataset and want to classify financial statements as positive or negative. On the Hub we find many models trained on finance-related question-answering datasets, so we can reuse some of their layers for our own task.
- Another simple example: a domain-specific model was trained on a huge dataset to classify text into 5 categories. Suppose we have a similar classification task on a completely different dataset in the same domain, and we only want 2 categories instead of 5. Here too we can reuse the model body and add our own model head, carrying the domain knowledge over to our own task.
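The second example can be sketched in a few lines of PyTorch. This is only a toy illustration of the idea, not an HF model: `ToyBody`, `BodyWithHead` and the hidden size of 16 are made-up stand-ins, and the "pretrained" model here is randomly initialised.

```python
import torch
from torch import nn

HIDDEN = 16  # toy hidden size (assumption; a real body such as roberta-base uses 768)

class ToyBody(nn.Module):
    """Hypothetical stand-in for a pretrained body: token ids -> one vector per token."""
    def __init__(self, vocab_size=100, hidden=HIDDEN):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.Linear(hidden, hidden)

    def forward(self, input_ids):
        return torch.tanh(self.encoder(self.embed(input_ids)))

class BodyWithHead(nn.Module):
    """A body plus a linear classification head on the first token."""
    def __init__(self, body, num_labels):
        super().__init__()
        self.body = body
        self.head = nn.Linear(HIDDEN, num_labels)

    def forward(self, input_ids):
        hidden = self.body(input_ids)       # (batch, seq_len, hidden)
        return self.head(hidden[:, 0, :])   # (batch, num_labels)

# Stand-in for a model that was trained on a 5-class task
source_model = BodyWithHead(ToyBody(), num_labels=5)
# Reuse the very same body, but attach a fresh 2-class head
target_model = BodyWithHead(source_model.body, num_labels=2)

input_ids = torch.randint(0, 100, (4, 7))   # 4 made-up sequences of 7 token ids
source_logits = source_model(input_ids)
target_logits = target_model(input_ids)
print(source_logits.shape, target_logits.shape)
```

Only the head changes between the two models; the body, and whatever it has learned, is shared.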
This is a diagram of what we’re going to do:
2. Customized HF model head
Our task is simple: sarcasm detection on this dataset from Kaggle.
You can view the complete code here. In the interest of time, I have not included the preprocessing and some training details below, so be sure to check out the entire code notebook.
I will use a model trained on a large number of tweets, with 5 classification outputs for different sentiment types. We will extract the model body, add a custom layer (2 labels: sarcastic / not sarcastic) in PyTorch, and train the new model.
Note: You can use any model in this example (not necessarily one trained for classification) as we will only use the model body and remove the model head.
This is our workflow:
I’ll skip the data preprocessing step and jump directly to the main class, but you can view the entire code at the link at the beginning of this section.
3. Tokenization and dynamic padding
Use the following code to convert the text into tokens and pad them dynamically:
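from transformers import AutoTokenizer, DataCollatorWithPadding

checkpoint = "cardiffnlp/twitter-roberta-base-emotion"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.model_max_length = 512

def tokenize(batch):
    # Truncate only; padding is left to the data collator
    return tokenizer(batch["headline"], truncation=True, max_length=512)

# `data` is the DatasetDict built in the preprocessing step
tokenized_dataset = data.map(tokenize, batched=True)
print(tokenized_dataset)

tokenized_dataset.set_format("torch", columns=["input_ids", "attention_mask", "label"])
# Pads each batch to the length of its longest sequence
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)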
The result is as follows:
DatasetDict({
    train: Dataset({
        features: ['headline', 'label', 'input_ids', 'attention_mask'],
        num_rows: 22802
    })
    test: Dataset({
        features: ['headline', 'label', 'input_ids', 'attention_mask'],
        num_rows: 2851
    })
    valid: Dataset({
        features: ['headline', 'label', 'input_ids', 'attention_mask'],
        num_rows: 2850
    })
})
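To make "dynamic padding" concrete: instead of padding every example to 512 tokens up front, each batch is padded only to the length of its longest member. The following is a pure-Python sketch of that idea, not the actual DataCollatorWithPadding implementation. The helper `pad_batch` and the token ids are made up for illustration, and `pad_id=1` is assumed to be the padding id of the RoBERTa tokenizer.

```python
def pad_batch(sequences, pad_id=1):
    """Pad every sequence to the length of the longest one in this batch."""
    longest = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (longest - len(seq)) for seq in sequences]
    # 1 marks a real token, 0 marks padding the model should ignore
    attention_mask = [[1] * len(seq) + [0] * (longest - len(seq)) for seq in sequences]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Two made-up tokenized headlines of different lengths
batch = pad_batch([[0, 31, 2], [0, 31, 45, 67, 2]])
print(batch["input_ids"])       # [[0, 31, 2, 1, 1], [0, 31, 45, 67, 2]]
print(batch["attention_mask"])  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```

A batch of short headlines therefore costs far less compute than one padded to the global maximum length.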
4. Extract the model body and add our own layers
The code is as follows:
import torch.nn as nn
from transformers import AutoConfig, AutoModel
from transformers.modeling_outputs import TokenClassifierOutput

class CustomModel(nn.Module):
    def __init__(self, checkpoint, num_labels):
        super().__init__()
        self.num_labels = num_labels

        # Load the model for the given checkpoint and keep only its body
        self.model = AutoModel.from_pretrained(
            checkpoint,
            config=AutoConfig.from_pretrained(
                checkpoint, output_attentions=True, output_hidden_states=True
            ),
        )
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(768, num_labels)  # new head; 768 = hidden size of the body

    def forward(self, input_ids=None, attention_mask=None, labels=None):
        # Extract outputs from the body
        outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)

        # Add custom layers
        sequence_output = self.dropout(outputs[0])  # outputs[0] = last hidden state
        # Classify from the hidden state of the first token
        logits = self.classifier(sequence_output[:, 0, :].view(-1, 768))

        # Calculate the loss
        loss = None
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))

        return TokenClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
As you can see, we first subclass nn.Module in PyTorch and use AutoModel (from the transformers library) to load the specified checkpoint and extract the model body.
Note that the forward() method returns a TokenClassifierOutput, which keeps the format of our output consistent with that of HF pre-trained models.
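It can help to trace the tensor shapes through the custom head. The sketch below replaces the body's output with a random tensor of the same shape, so it runs without downloading a model. The batch size, sequence length and labels are made-up values, and 768 is the hidden size used in the class above.

```python
import torch
from torch import nn

batch_size, seq_len, hidden_size, num_labels = 4, 12, 768, 2

# Stand-in for outputs[0], the last hidden state returned by the body
last_hidden_state = torch.randn(batch_size, seq_len, hidden_size)

# Keep only the vector of the first token of every sequence
first_token = last_hidden_state[:, 0, :]   # (batch_size, hidden_size)

classifier = nn.Linear(hidden_size, num_labels)
logits = classifier(first_token)           # (batch_size, num_labels)

labels = torch.tensor([0, 1, 1, 0])        # made-up labels: 1 = sarcastic
loss = nn.CrossEntropyLoss()(logits, labels)

print(first_token.shape)  # torch.Size([4, 768])
print(logits.shape)       # torch.Size([4, 2])
print(loss.dim())         # 0
```

Since the first-token slice already has shape (batch_size, 768), the extra `.view(-1, 768)` in the class above does not change it.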
5. End-to-end training of new models
The code is as follows:
import torch
from torch.utils.data import DataLoader
from tqdm.auto import tqdm

# model, optimizer, lr_scheduler, metric, device, the dataloaders, num_epochs
# and num_training_steps are defined earlier in the notebook
progress_bar_train = tqdm(range(num_training_steps))
progress_bar_eval = tqdm(range(num_epochs * len(eval_dataloader)))

for epoch in range(num_epochs):
    # Training pass
    model.train()
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()

        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar_train.update(1)

    # Validation pass after every epoch
    model.eval()
    for batch in eval_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        with torch.no_grad():
            outputs = model(**batch)

        logits = outputs.logits
        predictions = torch.argmax(logits, dim=-1)
        metric.add_batch(predictions=predictions, references=batch["labels"])
        progress_bar_eval.update(1)

    print(metric.compute())

# Final evaluation on the held-out test split
model.eval()

test_dataloader = DataLoader(
    tokenized_dataset["test"], batch_size=32, collate_fn=data_collator
)

for batch in test_dataloader:
    batch = {k: v.to(device) for k, v in batch.items()}
    with torch.no_grad():
        outputs = model(**batch)

    logits = outputs.logits
    predictions = torch.argmax(logits, dim=-1)
    metric.add_batch(predictions=predictions, references=batch["labels"])

metric.compute()
The result is as follows:
0%| | 0/2139 [00:00<?, ?it/s]
0%| | 0/270 [00:00<?, ?it/s]
{'f1': 0.9335347432024169}
{'f1': 0.9360090874668686}
{'f1': 0.9274912756882513}
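For readers unfamiliar with the reported metric: for a binary task, F1 is the harmonic mean of precision and recall on the positive class. Here is a small pure-Python sketch of that definition. The helper `binary_f1` and the labels are made up for illustration; this is not the metric object used in the notebook.

```python
def binary_f1(predictions, references, positive=1):
    """F1 score of the positive class, computed from raw label lists."""
    pairs = list(zip(predictions, references))
    tp = sum(p == positive and r == positive for p, r in pairs)  # true positives
    fp = sum(p == positive and r != positive for p, r in pairs)  # false positives
    fn = sum(p != positive and r == positive for p, r in pairs)  # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

predictions = [1, 0, 1, 1, 0, 0]  # made-up model outputs (1 = sarcastic)
references = [1, 0, 0, 1, 1, 0]   # made-up ground truth
score = binary_f1(predictions, references)
print(round(score, 4))  # 0.6667
```

In this toy case there are 2 true positives, 1 false positive and 1 false negative, so precision and recall are both 2/3 and so is F1.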
As you can see, we achieved decent performance using this approach. Keep in mind that the purpose of this blog is not to analyze performance on this specific dataset, but to learn how to use a pretrained body and add a custom head.
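The progress-bar totals in the output above (2139 and 270) can be reproduced from the split sizes printed earlier. This sketch assumes the training and validation loaders use the same batch size of 32 as the test loader, and that num_epochs is 3, which would match the three F1 lines. Neither value is shown in the excerpt, so treat both as assumptions.

```python
import math

def steps_per_epoch(num_rows, batch_size):
    """Number of batches needed to cover a split once (the last batch may be smaller)."""
    return math.ceil(num_rows / batch_size)

batch_size = 32  # assumption: same as the test DataLoader
num_epochs = 3   # assumption: inferred from the three F1 lines

train_steps = num_epochs * steps_per_epoch(22802, batch_size)
eval_steps = num_epochs * steps_per_epoch(2850, batch_size)
print(train_steps, eval_steps)  # 2139 270
```

Under those assumptions, the first number corresponds to num_training_steps and the second to num_epochs * len(eval_dataloader).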
6. Conclusion
In this article we saw how to add custom layers on top of a pre-trained HF model.
Some takeaways:
- This technique is particularly useful in situations where we have a domain-specific dataset and want to leverage a model trained on the same domain (task-agnostic) to enhance performance on a small dataset.
- We can choose a model that has been trained on a downstream task different from our own and still use the knowledge of the model body.
- If your dataset is large and general enough, this may not be needed at all, in which case you can use AutoModelForSequenceClassification, or the equivalent ready-made class for any other task that can be solved using BERT. In fact, if that's the case, I strongly recommend against building your own model head.
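For comparison, here is a minimal sketch of the ready-made route from the last takeaway. To keep it runnable offline, it builds a tiny, randomly initialised RoBERTa from a config; all the sizes and token ids are made up. With a real checkpoint you would instead call AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2), likely adding ignore_mismatched_sizes=True when the checkpoint's original head has a different number of labels.

```python
import torch
from transformers import AutoModelForSequenceClassification, RobertaConfig

# Tiny random model so the sketch needs no download (assumption: illustrative sizes only)
config = RobertaConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=2,  # sarcastic / not sarcastic
)
model = AutoModelForSequenceClassification.from_config(config)
model.eval()

input_ids = torch.tensor([[0, 11, 12, 13, 2]])  # made-up token ids
labels = torch.tensor([1])

with torch.no_grad():
    outputs = model(input_ids=input_ids, labels=labels)

print(outputs.logits.shape)  # torch.Size([1, 2])
print(outputs.loss is not None)
```

The class supplies both the classification head and the loss computation, which is why a hand-written head is unnecessary in this case.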
Original link: HF custom model head – BimAnt