1. Introduction
Code address: https://github.com/luca-medeiros/lang-segment-anything
https://github.com/IDEA-Research/GroundingDINO
lang-segment-anything is an algorithm for segmenting objects in images from natural-language text prompts. It combines two major models, GroundingDINO and segment-anything, and works well for semi-automatic annotation: it can detect and segment objects without any training, matching text phrases to the corresponding objects in the image.
Note: in testing, lang-segment-anything deployed successfully on Ubuntu 18.04 but failed on Windows, so deployment on Ubuntu is recommended.
2. Local deployment
2.1 Download the code and related files
In addition to the lang-segment-anything code itself, you also need to download some configuration files and weights for GroundingDINO and segment-anything.
First, download all of the lang-segment-anything files from the official GitHub repository, either via git clone or as a zip package.
Then download the weight file required by segment-anything. There are three options, vit_l, vit_b, and vit_h; only vit_h, which performs best, is downloaded here.
You also need to download the GroundingDINO weight file and its corresponding configuration file; there are two options, swinb and swint.
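The download steps above can be sketched as a short shell script. The URLs below are assumptions based on the projects' official release pages, so verify them before use; the network commands are printed as a dry run, and removing the leading echo executes them for real:

```shell
# Sketch of fetching the code and weights into the directory layout used
# later in this post. URLs are assumptions -- confirm them on the official
# segment-anything and GroundingDINO release pages before downloading.
mkdir -p sam_weight grounding_weight/config

SAM_URL="https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth"
DINO_URL="https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth"

# Printed as a dry run; remove 'echo' to actually execute.
echo git clone https://github.com/luca-medeiros/lang-segment-anything.git
echo wget -c "$SAM_URL" -P sam_weight
echo wget -c "$DINO_URL" -P grounding_weight
```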
GroundingDINO uses a BERT model. Because downloading the model over the network often fails, it is best to download it locally in advance: only the five files of the bert-base-uncased model are needed (typically the config, vocabulary, tokenizer files, and the PyTorch weights).
The above files can be obtained from the Baidu Netdisk link:
https://pan.baidu.com/s/1iqFjmTdJrja1ilSoxnWw6w?pwd=ek6i
After the download is complete, copy the three sets of files into the lang-segment-anything-main directory.
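Since every weight and config must be present locally, a quick sanity check before launching can save a confusing startup failure. This is a sketch; the paths mirror the layout used in this post (sam_weight/, grounding_weight/, a local bert-base-uncased directory) and may differ in your own tree:

```python
import os

# Paths follow the layout used in this post; adjust to your own tree.
REQUIRED_FILES = [
    "sam_weight/sam_vit_h_4b8939.pth",
    "grounding_weight/groundingdino_swinb_cogcoor.pth",
    "grounding_weight/config/GroundingDINO_SwinB_cfg.py",
]
REQUIRED_DIRS = [
    "bert-base-uncased",  # local BERT files for offline loading
]

def missing_assets(files=REQUIRED_FILES, dirs=REQUIRED_DIRS):
    """Return the list of required weight/config paths that are absent."""
    missing = [f for f in files if not os.path.isfile(f)]
    missing += [d for d in dirs if not os.path.isdir(d)]
    return missing

if __name__ == "__main__":
    gone = missing_assets()
    if gone:
        print("Missing assets:", gone)
    else:
        print("All weights and configs found.")
```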
2.2 Environment Configuration
Basically follow the instructions on GitHub. Use conda to create a virtual environment from the yml file: conda env create -f environment.yml. This mainly installs torch, segment-anything, and groundingdino. Note that you need to change groundingdino in environment.yml to groundingdino-py, and that the lang-sam package cannot be found on PyPI and does not need to be installed. torch can be installed with the following command:
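The environment.yml change can look like the following. This is a hypothetical excerpt, not the full file; the surrounding entries and version pins will differ by release:

```yaml
# environment.yml (excerpt, hypothetical) -- replace the unresolvable
# 'groundingdino' entry with the PyPI package 'groundingdino-py',
# and drop 'lang-sam', which is not published on PyPI.
dependencies:
  - python=3.10
  - pip
  - pip:
      - groundingdino-py   # was: groundingdino
      - segment-anything   # or install from the facebookresearch GitHub repo
      # - lang-sam         # removed: not installable from PyPI
```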
pip install torch torchvision torchmetrics --index-url https://download.pytorch.org/whl/cu118
2.3 Code modification
First modify the lang_sam.py file under lang_sam, mainly adding build_groundingdino_local and load_model_local methods that load the GroundingDINO model from local files. Note that the weight file and configuration file must correspond to each other (SwinB weights with the SwinB config).
class LangSAM():
    def __init__(self, sam_type="vit_h", ckpt_path="sam_weight/sam_vit_h_4b8939.pth"):
        self.sam_type = sam_type
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.build_groundingdino_local()
        self.build_sam(ckpt_path)

    def build_groundingdino_local(self):
        # The weight file and config file must match (SwinB with SwinB cfg).
        ckpt_filename = "grounding_weight/groundingdino_swinb_cogcoor.pth"
        ckpt_config_filename = "grounding_weight/config/GroundingDINO_SwinB_cfg.py"
        self.groundingdino = load_model_local(ckpt_config_filename, ckpt_filename)
def load_model_local(model_config_path, model_checkpoint_path, device="cuda"):
    try:
        args = SLConfig.fromfile(model_config_path)
        args.device = device
        model = build_model(args)
        checkpoint = torch.load(model_checkpoint_path, map_location="cpu")
        model.load_state_dict(clean_state_dict(checkpoint["model"]), strict=False)
        model.eval()
        model.to(device)
    except Exception as e:
        print(str(e))
        raise  # re-raise so callers never receive an undefined model
    return model
In addition, you need to edit the groundingdino.util.get_tokenlizer module installed by pip (the misspelling is in the library itself), changing the loading of the tokenizer and BertModel to offline loading. In an IDE you can type groundingdino.util.get_tokenlizer in the code, then hold Ctrl and left-click get_tokenlizer to jump straight to its source.
from transformers import AutoTokenizer, BertModel, BertTokenizer, RobertaModel, RobertaTokenizerFast
import os

def get_tokenlizer(text_encoder_type):
    if not isinstance(text_encoder_type, str):
        # print("text_encoder_type is not a str")
        if hasattr(text_encoder_type, "text_encoder_type"):
            text_encoder_type = text_encoder_type.text_encoder_type
        elif text_encoder_type.get("text_encoder_type", False):
            text_encoder_type = text_encoder_type.get("text_encoder_type")
        elif os.path.isdir(text_encoder_type) and os.path.exists(text_encoder_type):
            pass
        else:
            raise ValueError(
                "Unknown type of text_encoder_type: {}".format(type(text_encoder_type))
            )
    print("final text_encoder_type: {}".format(text_encoder_type))
    # tokenizer = AutoTokenizer.from_pretrained(text_encoder_type)
    tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
    return tokenizer

def get_pretrained_language_model(text_encoder_type):
    if text_encoder_type == "bert-base-uncased" or (os.path.isdir(text_encoder_type) and os.path.exists(text_encoder_type)):
        # return BertModel.from_pretrained(text_encoder_type)
        return BertModel.from_pretrained('bert-base-uncased')
    if text_encoder_type == "roberta-base":
        return RobertaModel.from_pretrained(text_encoder_type)
    raise ValueError("Unknown text_encoder_type {}".format(text_encoder_type))
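The only behavioral change above is the argument passed to from_pretrained (a local path instead of a hub name); the branch that resolves text_encoder_type is untouched. To see what that branch accepts, here is a standalone re-implementation of just the resolution step (the function name is mine, for illustration; it has no transformers dependency):

```python
def resolve_text_encoder_type(text_encoder_type):
    """Mirror of the resolution logic in get_tokenlizer(): accept a
    plain string, an object with a .text_encoder_type attribute, or a
    dict-like with a 'text_encoder_type' key."""
    if not isinstance(text_encoder_type, str):
        if hasattr(text_encoder_type, "text_encoder_type"):
            return text_encoder_type.text_encoder_type
        if text_encoder_type.get("text_encoder_type", False):
            return text_encoder_type.get("text_encoder_type")
        raise ValueError(
            "Unknown type of text_encoder_type: {}".format(type(text_encoder_type))
        )
    return text_encoder_type
```

For example, a plain string passes through unchanged, while a config dict like {"text_encoder_type": "bert-base-uncased"} resolves to the same string.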
2.4 Demo demonstration
When you run the following command, you will first be asked whether to update lightning. Select No; the updated version does not work.
lightning run app app.py
After the command runs, a window opens in the browser. Upload an image and you can segment it with text prompts; for example, entering the prompt person segments every person in the image.
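Besides the web demo, the model can also be called from Python. The snippet below is a sketch based on the repository's documented API (LangSAM().predict(image, prompt)); the exact return signature may differ between versions, so treat it as a guide rather than a definitive implementation:

```python
def segment_with_prompt(image_path, text_prompt="person"):
    """Run text-prompted segmentation. Imports happen inside the
    function so this file can be imported without the heavy deps."""
    from PIL import Image
    from lang_sam import LangSAM  # API per the project README; may vary by version

    model = LangSAM()  # loads the SAM and GroundingDINO weights
    image_pil = Image.open(image_path).convert("RGB")
    # One mask/box/phrase/logit per detected instance.
    masks, boxes, phrases, logits = model.predict(image_pil, text_prompt)
    return masks, boxes, phrases, logits
```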