Generating image data sets for models with the open source software FFmpeg

This article is about FFmpeg, the open source software that has powered countless video applications, online video sites, and CDN and cloud service vendors.

I'll share how to use it to convert video and movie files into image data sets of tens of thousands of frames (and wallpaper collections), so that the model program in the next article has something to cook with. The method is especially well suited to "wallpaper-grade" films like those of Hayao Miyazaki and Makoto Shinkai.

Foreword

This article is part of the material I prepared for the sharing event "Using a Vector Database to Quickly Build a Local Lightweight Image Search Engine" hosted by Machine Heart. Since the topic my friend gave me is "image search engine", sharing it also requires some interesting and meaningful image data.

In fact, a year ago I shared some Milvus practice, including "Vector Database Pitfalls: Using Docker and Milvus to Quickly Build a Local Lightweight Image Search Engine." In that article, however, I used only the first page of Genshin Impact wallpapers returned by a search engine: barely more than 60 images.

To really show what this image search engine can do, we need far more images this time.

If you write a spider script to "crawl" images from the Internet, the efficiency is low, and without an image-safety audit model it is easy to pull in questionable content. That is not a result I want to see, and it is probably the result the friend who invited me to share is most afraid of.

If you use a generative model such as Stable Diffusion or Midjourney to create the data set, throughput cannot be guaranteed either: even on a fast card like the 4090, by the time I had generated enough pictures the event would long be over. (When helping a friend out, you also have to mind the cost; you can't go off training models and stacking cards and cause trouble for the organizer, haha.)

So using ffmpeg to extract frames from movies, TV series, and short videos to build a data set for learning and experimentation may be a very cost-effective route.

Note: this article only discusses how to build valuable, high-quality image data sets with FFmpeg. I still recommend watching movies properly and appreciating the works of your favorite authors and actors, and using FFmpeg on appropriate occasions in appropriate ways. For example: take your girlfriend to the cinema with you; she watches the movie, and you watch her.

Of course, she might be watching you too.

The main reasons for using video key frames as a data set are: the data is fairly representative, the picture quality is relatively high, and it covers high-quality images of many categories. Video makes up the vast majority of Internet traffic today, and on Bilibili, various PT enthusiast sites, and resource sites such as Baidu Cloud and Alibaba Cloud, video resources are cheap to obtain and plentiful. In this article we could just as well use a science-fiction movie, a documentary, or a TV series as the example.

For this post, I've chosen one of my favorite movies of the year: The Wandering Earth 2. (Highly recommended to watch it at the Film Museum; the experience is great.)

The Wandering Earth 2

I converted this movie into 10,393 pictures at one frame per second. Compared with the 60-odd pictures used in the first sharing, that is more than two orders of magnitude larger. It is still not much, to be honest: in front of a database, this scale is tiny.
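A quick back-of-the-envelope check (my own arithmetic, not from the original log): at one frame per second, the image count should be close to the film's duration in seconds.

```shell
# duration 02:53:11 sampled at 1 frame per second: roughly one image per second
h=2; m=53; s=11
echo $(( h * 3600 + m * 60 + s ))
```

That prints 10391, which lines up nicely with the 10,393 files actually produced.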

Picture and wallpaper data set

Of course, you can also go bigger and pick movies and TV series that yield more pictures, such as the "Harry Potter" series, "Friends", or "Rush", and easily build image data sets at the scale of hundreds of thousands or even millions of images.

Or you can write a program that draws on the exquisite "cartoon wallpaper" style animations of Makoto Shinkai and Hayao Miyazaki and randomly switches your wallpaper every few seconds, which is also great fun.

Back to the topic: let's look at how to use ffmpeg to generate the data set, and the details of the process.

FFmpeg installation

FFmpeg Open Source Project

FFmpeg is open source software available on every platform; you'll find it on phones and game consoles. But since today's topic is converting videos into image data sets, we'll only look at the three mainstream desktop platforms:

On Ubuntu or macOS, a single command completes the FFmpeg installation:

# Ubuntu
apt-get install ffmpeg -y

#macOS
brew install ffmpeg

On Windows, you need to download and run the installer package; just use the official download page.

If you prefer Docker, I recommend the FFmpeg image maintained by the LinuxServer organization. Its release page provides common usage examples, so I won't repeat them here:

docker pull linuxserver/ffmpeg

Since FFmpeg is written in pure C, installation is genuinely simple and there are no special dependencies.

Basic usage

If you want to convert every frame of the video into a picture, keeping the same resolution as the video, you can use the following command:

ffmpeg -i The.Wandering.Earth.Ⅱ.mp4 ball-%03d.png

After the command starts executing, you can see the rolling log:

ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
  built with Apple clang version 15.0.0 (clang-1500.0.40.1)
  configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/6.0_1 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags='-Wl,-ld_classic' --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-audiotoolbox --enable-neon
  libavutil 58. 2.100 / 58. 2.100
  libavcodec 60. 3.100 / 60. 3.100
  libavformat 60. 3.100 / 60. 3.100
  libavdevice 60. 1.100 / 60. 1.100
  libavfilter 9. 3.100 / 9. 3.100
  libswscale 7. 1.100 / 7. 1.100
  libswresample 4. 10.100 / 4. 10.100
  libpostproc 57. 1.100 / 57. 1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'The.Wandering.Earth.Ⅱ.mp4':
  Metadata:
    major_brand :isom
    minor_version: 512
    compatible_brands: isomiso2mp41
    encoder: OurTV Multimedia Platform
  Duration: 02:53:11.54, start: 0.000000, bitrate: 6994 kb/s
  Stream #0:0[0x1](und): Video: hevc (Main 10) (hev1 / 0x31766568), yuv420p10le(tv, bt2020nc/bt2020/smpte2084), 3840x1608, 6328 kb/s, 120 fps, 120 tbr, 90k tbn (default)
...
...
Stream mapping:
  Stream #0:0 -> #0:0 (hevc (native) -> png (native))
Press [q] to stop, [?] for help
Output #0, image2, to 'ball-%03d.png':
  Metadata:
    major_brand :isom
    minor_version: 512
    compatible_brands: isomiso2mp41
    encoder: Lavf60.3.100
  Stream #0:0(und): Video: png, rgb48be(pc, gbr/bt2020/smpte2084, progressive), 3840x1608, q=2-31, 200 kb/s, 1 fps, 1 tbn (default)
...
frame=  116 fps= 15 q=-0.0 size=N/A time=00:00:00.95 bitrate=N/A speed=0.126x
...

As the program runs, images named ball-001.png, ball-002.png, and so on are generated in the target folder, one per frame.
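The sequence numbers come from the printf-style `%03d` pattern that ffmpeg's image2 output accepts in the filename; it zero-pads the frame counter to three digits, exactly as a quick shell check shows:

```shell
# %03d zero-pads the frame counter, producing ball-001.png, ball-002.png, ...
printf 'ball-%03d.png\n' 1 2 3
```

Use `%05d` or similar if you expect more than 999 frames.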

The log above repeats heavily, but it contains some important details that affect the efficiency of our data set generation. One of them is the speed status.

In the log above, speed=0.126x means we are decoding and saving images at roughly one-eighth of normal playback speed. In other words, each minute of video takes nearly eight minutes to process. If the video is particularly long, this slowdown becomes very painful.
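In concrete terms (simple arithmetic, assuming the speed figure stays constant over the run):

```shell
# at speed=0.126x, each minute of footage needs 60 / 0.126 seconds of wall time
awk 'BEGIN { printf "%.0f seconds per minute of video\n", 60 / 0.126 }'
```

That works out to about 476 seconds, i.e. nearly eight minutes of processing per minute of footage.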

There is also a very critical hidden detail: the disk footprint of the generated results.

We all know that video file formats are designed for "easy distribution" and "relieving storage pressure"; like a compressed archive, they have a fairly high compression ratio. If we save every single frame separately, then for high-frame-rate 4K or 8K movies with dozens of frames per second, the disk may fill up before the conversion finishes: each picture is more than 10 MB, and a full two-hour movie contains 180,000 frames even at the lowest common frame rate of 25 fps, requiring nearly 2 TB of storage.
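The arithmetic behind that estimate, using the assumptions above (25 fps, roughly 10 MB per PNG, a 2-hour runtime):

```shell
frames=$(( 2 * 60 * 60 * 25 ))      # 2 hours at 25 fps -> 180000 frames
size_gb=$(( frames * 10 / 1024 ))   # ~10 MB per frame -> gigabytes
echo "$frames frames, roughly $size_gb GB"
```

That is about 1757 GB, i.e. nearly 2 TB.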

Unless you are a team like director Guo Fan's on the "ball series" movies, needing to go back and review the work frame by frame, there is really no need to do this: it is too uneconomical and environmentally unfriendly.

Improve conversion performance: reduce the number of converted images

As mentioned above, frame rates are kept relatively high so that movies and videos look smooth. But adjacent frames rarely differ much; moreover, CPU and disk usage during saving is genuinely high, and subsequent processing of all those pictures takes extra time too.

So the first way to improve conversion performance is to avoid generating unnecessary images in the first place.

If you want to extract just one picture per second, keeping the same resolution as the video, you can use the following command:

ffmpeg -i The.Wandering.Earth.Ⅱ.mp4 -r 1 ball-%03d.png

After the command starts executing, you can see the rolling log:

ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
... (same banner, build configuration, and input stream info as above) ...
Stream mapping:
  Stream #0:0 -> #0:0 (hevc (native) -> png (native))
Press [q] to stop, [?] for help
Output #0, image2, to 'ball-%03d.png':
  Stream #0:0(und): Video: png, rgb48be(pc, gbr/bt2020/smpte2084, progressive), 3840x1608, q=2-31, 200 kb/s, 1 fps, 1 tbn (default)
...
frame=    7 fps=0.8 q=-0.0 size=N/A time=00:00:06.00 bitrate=N/A dup=0 drop=868 speed=0.656x
...

Because only one image per second is stored, frames that need no processing are simply discarded (the drop counter in the log). You can clearly see that the processing speed has changed qualitatively, rising from the earlier sub-real-time crawl to speed=0.656x, and the pressure on the hard disk and on subsequent processing drops a lot as well.

However, if we open a random picture, we find that its file size is still startling.

A pretty big picture, and every picture is like this

Images this large put a lot of unnecessary pressure on our subsequent processing programs and model programs. Of course, if what you want is a super-high-definition wallpaper data set, then this article has already covered everything you need: just pick the video material you like and build your wallpaper set.

However, if you are like me and plan to use video material to verify model programs, or to run capability checks on something like an image search engine, you can continue optimizing the data set.

Improve conversion performance: reduce image calculation size

By default, converted images keep the video's resolution; in this article's example, that is 3840 × 1608.

With a bit of simple arithmetic, pick a resolution that suits your purpose. For example, I chose to shrink the pictures down to 320×240 (the width is one-twelfth of the original), which lets my model program process them much faster. In a real scenario, adjust the image size to fit your own needs.

ffmpeg -i The.Wandering.Earth.II.mp4 -r 1 -vf scale=320:240 ball-=.png

The command is very simple: just add -vf scale=width:height before the output file name.
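One caveat worth knowing (my own note, not from the original workflow): 320×240 does not preserve this film's 3840×1608 proportions; dividing both dimensions by 12 would give 320×134. ffmpeg's scale filter also accepts -1 (or -2, which additionally keeps the value even, as some encoders require) for one dimension to preserve the aspect ratio automatically, e.g. -vf scale=320:-2. The proportional height is easy to verify:

```shell
# height that preserves 3840x1608 proportions at a width of 320
echo $(( 1608 * 320 / 3840 ))
```

That prints 134, so scale=320:-2 would produce 320×134 images.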

Then you will find that the conversion speed has also been greatly enhanced:

frame=   14 fps=1.7 q=-0.0 Lsize=N/A time=00:00:13.00 bitrate=N/A dup=0 drop=1416 speed=1.55x

Use GPU for conversion acceleration

The operations above all run on the CPU. Many cloud media services work the same way: pure CPU computation.

The advantage of CPU computing is cost: compared with GPUs, the hardware cost, cooling cost, power consumption, and data center requirements are all much lower.

However, if your laptop or desktop actually has a usable graphics card, FFmpeg can process even faster. Of course, if the CPU itself is powerful enough, the point of using the GPU becomes reducing CPU load so that the machine can run more tasks at the same time (for example, with the 13900KF + 4090 I use, the CPU is actually the stronger option in terms of conversion efficiency).

Of course, when using graphics cards or accelerator cards for this kind of work, we first need to confirm that our ffmpeg build supports it. Run ffmpeg -hwaccels to see which hardware decoding acceleration methods are supported:

# ffmpeg -hwaccels

ffmpeg version 6.0-6ubuntu1 Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 13 (Ubuntu 13.2.0-2ubuntu1)
  configuration: --prefix=/usr --extra-version=6ubuntu1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librist --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --disable-sndio --enable-libjxl --enable-pocketsphinx --enable-librsvg --enable-libvpl --disable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-libplacebo --enable-librav1e --enable-shared
  libavutil 58. 2.100 / 58. 2.100
  libavcodec 60. 3.100 / 60. 3.100
  libavformat 60. 3.100 / 60. 3.100
  libavdevice 60. 1.100 / 60. 1.100
  libavfilter 9. 3.100 / 9. 3.100
  libswscale 7. 1.100 / 7. 1.100
  libswresample 4. 10.100 / 4. 10.100
  libpostproc 57. 1.100 / 57. 1.100
Hardware acceleration methods:
vdpau
cuda
vaapi
qsv
drm
opencl
vulkan

The “Hardware acceleration methods” list in the log output indicates which acceleration methods are available.
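If you want to test for a specific method from a script, here is a small sketch. The list below is hard-coded to mirror the log output above for illustration; in real use you would pipe the output of ffmpeg -hwaccels instead, and your build's list may differ:

```shell
# sample output of `ffmpeg -hwaccels` (hard-coded here for illustration)
methods='vdpau
cuda
vaapi
qsv
drm
opencl
vulkan'

# grep -x matches whole lines, -q suppresses output; the exit status tells us
# whether cuda is in the list
if echo "$methods" | grep -qx cuda; then
  echo "cuda supported"
fi
```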

Verifying hardware acceleration is actually very simple. First, let's run the command we used above, without any special acceleration parameters, to collect a performance baseline:

ffmpeg -i The.Wandering.Earth.II.mp4 -r 1 -vf scale=320:240 ball-=.png

The processing speed is roughly like this:

frame= 17 fps=3.6 q=-0.0 Lsize=N/A time=00:00:16.00 bitrate=N/A dup=0 drop=1836 speed=3.38x

At this point the CPU is nearly maxed out. With higher-resolution, higher-frame-rate video, this machine would struggle to do anything else at the same time.

Next, let’s use hardware acceleration to process this video:

ffmpeg -hwaccel cuda -i The.Wandering.Earth.Ⅱ.mp4 -r 1 -vf scale=320:240 ball-=.png

Add the -hwaccel parameter with your acceleration method and run the command again. The throughput here is comparable, but the decoding work has moved off the CPU:

frame= 101 fps=3.2 q=-0.0 Lsize=N/A time=00:01:40.00 bitrate=N/A dup=0 drop=11771 speed=3.17x

If you watch system activity again, you will see that the CPU load has dropped considerably.

Finally

Okay, let's stop here for this article. In the next one, I will continue with vector database related content.

-EOF

We have a small tinkering group, where some friends who enjoy tinkering gather and are candid with each other.

We talk about software and hardware, HomeLab, programming, life, and career topics together, and occasionally share technical information in the group.

Regarding the standards for making friends, please refer to the following article:

To new friends: Vote for life and keep looking for better friends

Of course, when adding friends via the article below, please include your real name, company or school, state where you found us and why you are joining, and cherish each other's time.

Things about tossing people into groups

This article is published under the "Attribution 4.0 International (CC BY 4.0)" license. You are welcome to reprint, adapt, and reuse it, but please credit the source: Attribution 4.0 International (CC BY 4.0)

Author of this article: Su Yang

Creation time: November 14, 2023
Statistical word count: 11057 words
Reading time: 23 minutes to read
Link to this article: https://soulteary.com/2023/11/14/ffmpeg-generates-models-using-image-datasets.html
