Combining Label Studio with a YOLO model for automatic dataset labeling

Recently I have been training models for object detection in images. Training a model depends heavily on the dataset, and building an object detection dataset is more involved than building one for image classification, so a simple, easy-to-use labeling tool is needed. After comparing a number of tools, I chose Label Studio, mainly for the following reasons:

  • Data security: the labeled data is for internal use only and should not be exposed to the outside world.
  • Convenience: annotation runs as a self-deployed web service, so other team members can join in and annotate later.
  • Automatic/semi-automatic annotation: a trained model can be plugged in to assist annotation, and the model's detection results can also be visualized and checked.

Label Studio meets all of these requirements, so I set it up and combined it with a YOLO model to label the dataset automatically/semi-automatically. This post records the whole process.

Environment setup

Basic environment setup

> conda create -n label-studio python=3.10
> conda activate label-studio

Front-end service

> cd /data/code/github/label-studio
# Run database migrations
> python label_studio/manage.py migrate
> python label_studio/manage.py collectstatic
# Start the server in development mode at http://localhost:8080
> python label_studio/manage.py runserver
# Or start it listening on all interfaces on port 8081
> python label_studio/server.py --host=0.0.0.0 --port=8081

Data is saved under the following path:

~/.local/share/label-studio/

Its contents look like this:

(label-studio) # ls -lrt ~/.local/share/label-studio/
total 1212
drwxr-xr-x 2 xxxx xxxx 4096 Oct 20 10:36 test_data
drwxr-xr-x 4 xxxx xxxx 4096 Oct 20 11:02 media
drwxr-xr-x 2 xxxx xxxx 4096 Oct 20 12:05 export
-rw-r--r-- 1 xxxx xxxx 1228800 Oct 20 15:30 label_studio.sqlite3
  • Project interface
  • Create a new project
  • Basic settings

The label classes for the objects to annotate are configured under Labeling Interface.

  • Backend inference

Under Machine Learning you can connect a backend inference service to automate annotation; see label-studio-ml-backend. For detection with YOLOv8, see label-studio-yolov8-backend. For SAM-assisted labeling, see the article "Segment Anything? A step-by-step guide to deploying SAM + Label Studio for automatic labeling".

Retrieve predictions when loading a task automatically: when enabled, imported images are automatically sent to the prediction service.

  • Final effect

The number shown next to each label is its keyboard shortcut.

Annotation results can be cleaned up:

  • Delete Tasks: deletes the images and their tasks.
  • Delete Annotations: deletes the manual annotation results.
  • Delete Predictions: deletes the prediction results (this requires a connected backend prediction service). If you are not satisfied with the predictions, delete them and have the backend generate them again.

An annotated task is stored as JSON like this:

{
  "id": 50,
  "data": {
    "image": "/data/upload/7/437ac54a-csm_eb24002_58f4fb61dd.jpg"
  },
  "annotations": [
    {
      "id": 5,
      "created_username": " [email protected], 2",
      "created_ago": "0 minutes",
      "completed_by": {
        "id": 2,
        "first_name": "",
        "last_name": "",
        "avatar": null,
        "email": "[email protected]",
        "initials": "li"
      },
      "result": [
        {
          "original_width": 1366,
          "original_height": 768,
          "image_rotation": 0,
          "value": {
            "x": 10.10689990281827,
            "y": 27.483448736637506,
            "width": 23.420796890184647,
            "height": 37.68170958859735,
            "rotation": 0,
            "rectanglelabels": [
              "Car"
            ]
          },
          "id": "8oMV4M5avz",
          "from_name": "label",
          "to_name": "image",
          "type": "rectanglelabels",
          "origin": "manual"
        }
      ],
      "was_cancelled": false,
      "ground_truth": false,
      "created_at": "2023-10-22T09:21:27.660411Z",
      "updated_at": "2023-10-22T09:21:27.660435Z",
      "draft_created_at": "2023-10-22T09:19:22.672990Z",
      "lead_time": 114.60199999999999,
      "import_id": null,
      "last_action": null,
      "task": 50,
      "project": 7,
      "updated_by": 2,
      "parent_prediction": null,
      "parent_annotation": null,
      "last_created_by": null
    }
  ],
  "predictions": []
}
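
The "value" fields in the JSON above are percentages of the image size, with x and y marking the top-left corner of the box. As a minimal sketch of reading such a task back into pixel coordinates (the helper names result_to_pixel_box and task_to_pixel_boxes are hypothetical, not part of Label Studio):

```python
def result_to_pixel_box(result):
    """Convert one rectangle result to (label, x_min, y_min, x_max, y_max) in pixels."""
    value = result["value"]
    img_w = result["original_width"]
    img_h = result["original_height"]
    # Label Studio stores x, y, width, height as percentages (0-100) of the image size.
    x_min = value["x"] / 100.0 * img_w
    y_min = value["y"] / 100.0 * img_h
    x_max = x_min + value["width"] / 100.0 * img_w
    y_max = y_min + value["height"] / 100.0 * img_h
    label = value["rectanglelabels"][0]
    return label, x_min, y_min, x_max, y_max


def task_to_pixel_boxes(task):
    """Collect pixel boxes from every rectangle annotation of one exported task."""
    boxes = []
    for annotation in task.get("annotations", []):
        for result in annotation.get("result", []):
            if result.get("type") == "rectanglelabels":
                boxes.append(result_to_pixel_box(result))
    return boxes
```

Calling task_to_pixel_boxes on the task shown above would yield a single "Car" box expressed in pixels of the 1366x768 image.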

Backend prediction service

Deploy a backend prediction service to enable automatic annotation:

> git clone https://github.com/HumanSignal/label-studio-ml-backend.git
> cd label-studio-ml-backend
> pip install -e .
# Environment verification
> label-studio-ml create env_verify
> label-studio-ml start env_verify -p 9091
  • Create the production environment
> label-studio-ml create ui_element
> label-studio-ml start ui_element -p 9091
> nohup label-studio-ml start ui_element -p 9091 > ./ui_element/label.log 2>&1 &
> nohup label-studio-ml start ui_element_sam -p 9092 > ./ui_element_sam/label.log 2>&1 &
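
Whatever detector runs inside the backend, its predictions have to come back in the same percentage-based shape as the annotation JSON shown earlier. The following is only a sketch of that conversion step, assuming the detector yields (label, score, x1, y1, x2, y2) tuples in pixels; the function name detections_to_prediction, the model_version string, and the default from_name/to_name (taken from the example JSON) are assumptions that must match your own labeling configuration.

```python
def detections_to_prediction(detections, img_w, img_h,
                             from_name="label", to_name="image",
                             model_version="yolo-sketch"):
    """Build one Label Studio style prediction dict from pixel-space detections.

    detections: iterable of (label, score, x1, y1, x2, y2), coordinates in pixels.
    """
    results = []
    scores = []
    for label, score, x1, y1, x2, y2 in detections:
        results.append({
            "from_name": from_name,   # must match the control tag name in the labeling config
            "to_name": to_name,       # must match the image tag name in the labeling config
            "type": "rectanglelabels",
            "original_width": img_w,
            "original_height": img_h,
            "image_rotation": 0,
            "score": score,
            "value": {
                # Pixel coordinates become percentages of the image size.
                "x": x1 / img_w * 100.0,
                "y": y1 / img_h * 100.0,
                "width": (x2 - x1) / img_w * 100.0,
                "height": (y2 - y1) / img_h * 100.0,
                "rotation": 0,
                "rectanglelabels": [label],
            },
        })
        scores.append(score)
    # Use the mean detection confidence as the overall prediction score.
    mean_score = sum(scores) / len(scores) if scores else 0.0
    return {"model_version": model_version, "score": mean_score, "result": results}
```

In a real backend this dict would be returned from the model's predict step for each task; how the image is loaded and how YOLO is invoked is left out here on purpose.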

Export annotation results

Exporting writes out all annotation results in the project; currently there is no way to export only the annotations of selected images. Several formats are available: the YOLO format is the usual choice for YOLO training, and COCO and VOC formats are also supported.

The exported YOLO dataset is laid out as follows:

  • images: the annotated images

  • labels: the label data, saved as .txt files; each line is one labeled box and contains 5 columns

    • Column 1: the class index, matching the class order in classes.txt below
    • Column 2: x coordinate of the box center (normalized: x / image width)
    • Column 3: y coordinate of the box center (normalized: y / image height)
    • Column 4: width of the box (normalized: width / image width)
    • Column 5: height of the box (normalized: height / image height)

    Example label lines:
0 0.7914317925591883 0.3476714942329373 0.3066516347237879 0.1781053970456451
0 0.7931228861330326 0.5379758910762293 0.30552423900789166 0.16834619720752755
0 0.7931228861330328 0.7221807880206979 0.30777903043968435 0.1756655970861157
  • classes.txt: the list of annotated class names
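
The five-column layout described above can be checked with a small round trip. This is a sketch under the assumption that boxes start out in Label Studio's percentage form (top-left corner plus size); the helper names ls_box_to_yolo_line and yolo_line_to_pixel_box are hypothetical.

```python
def ls_box_to_yolo_line(value, class_names):
    """Turn a Label Studio rectangle "value" dict into one YOLO label line."""
    class_index = class_names.index(value["rectanglelabels"][0])
    # Label Studio: top-left corner and size in percent. YOLO: center and size in 0-1.
    x_center = (value["x"] + value["width"] / 2.0) / 100.0
    y_center = (value["y"] + value["height"] / 2.0) / 100.0
    width = value["width"] / 100.0
    height = value["height"] / 100.0
    return f"{class_index} {x_center} {y_center} {width} {height}"


def yolo_line_to_pixel_box(line, img_w, img_h):
    """Parse one YOLO label line into (class_index, x_min, y_min, x_max, y_max) in pixels."""
    parts = line.split()
    class_index = int(parts[0])
    x_center, y_center, width, height = (float(p) for p in parts[1:5])
    x_min = (x_center - width / 2.0) * img_w
    y_min = (y_center - height / 2.0) * img_h
    x_max = (x_center + width / 2.0) * img_w
    y_max = (y_center + height / 2.0) * img_h
    return class_index, x_min, y_min, x_max, y_max
```

Here class_names is expected to hold the lines of classes.txt in order, so that the index in column 1 resolves to the same class name on both sides.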
