How to Precisely Trim Videos Using FFmpeg

1. Problem description

1.1 Background

Previously, I did secondary development on top of FFmpeg, implementing common video-processing functions behind an ffmpeg-style command-line wrapper. On top of this, a transcoding access and scheduling system provides the service externally. One required feature works like this: quickly cut a sub-video covering a given time range out of a specified video, with two requirements: 1. be fast, not as time-consuming as a full transcode; 2. be precise, so the caller can specify exactly at which second the clip starts and at which second it ends.

1.2 Difficulties

Cutting a short clip out of a long video with ffmpeg is easy. For example, the command ffmpeg -i input.mp4 -ss 00:10:03 -t 00:03:00 -vcodec copy -acodec copy output.mp4 cuts a 3-minute clip starting at 00:10:03 of input.mp4 and saves it as output.mp4. The options -vcodec copy -acodec copy copy the audio and video streams of the original video directly, with no decoding or encoding. Although this method is very convenient, it has a fatal flaw: the picture freezes at the beginning (while the sound plays normally the whole time), and only starts to move after a few seconds. The video below is an example.
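As a small convenience when driving this command programmatically, the HH:MM:SS strings passed to -ss and -t can be produced from plain second counts. The helper below is my own illustration, not part of ffmpeg:

```cpp
#include <cstdio>
#include <string>

// Format a duration in whole seconds as the HH:MM:SS string accepted by
// ffmpeg's -ss and -t options.
std::string toTimestamp(int seconds)
{
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%02d:%02d:%02d",
                  seconds / 3600, (seconds % 3600) / 60, seconds % 60);
    return std::string(buf);
}
```

For instance, toTimestamp(603) produces the "00:10:03" start time used in the example command.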

2. Cause analysis

The cause is that the clip's start time falls in the middle of a GOP in the source video rather than on an I frame. Anyone with even a little knowledge of video coding has heard of I, B, and P frames. Put simply, an I frame is a complete image; a P frame is differentially coded against the preceding I or P frame; a B frame is differentially coded against the preceding and following I, P, or B frames. In other words, an I frame carries complete content while P and B frames do not, so if the I frame is missing, the P and B frames that depend on it cannot be decoded. Generally, the first frame of a GOP is an I frame, followed by a number of P and B frames, and a GOP can span as much as 10 seconds. The picture below shows the I, B, and P frames of a real video, with I frames marked in red; you can see that the two I frames are far apart (10 seconds apart, in fact).
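To make this dependency concrete, here is a minimal FFmpeg-free C++ sketch (the frame-type list is hypothetical, not the FFmpeg API): when packets are copied without re-encoding, decoding can only begin at an I frame, so every frame between the cut point and the next I frame is undecodable and the picture stays frozen there.

```cpp
#include <vector>

// Frame types of a stream given as 'I', 'P', 'B' in display order.
// Returns the index of the first frame at or after cutIndex that can
// actually be decoded when packets are copied verbatim.
int firstDecodableFrame(const std::vector<char> &types, int cutIndex)
{
    for (int i = cutIndex; i < static_cast<int>(types.size()); ++i)
        if (types[i] == 'I')
            return i;   // decoding (and picture motion) resumes here
    return -1;          // no I frame at or after the cut point
}
```

With a GOP such as I P B B P B B P B B followed by the next I frame, cutting at the fourth frame leaves the picture frozen until that next I frame, exactly the freeze described above. With a 10-second GOP, that freeze is several seconds long.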

From the above analysis it follows that when the clip's start time does not fall on an I frame, the missing I frame makes the following P and B frames undecodable, and the picture freezes. This analysis assumes the video content is copied directly without decoding and encoding. If instead you decode the frames one by one and re-encode only the frames inside the requested time range, the cut can be perfectly accurate. But doing so takes far too long: encoding and decoding the whole clip consumes a lot of CPU.

3. Solution

There is still a way out: decode and re-encode only the first GOP that overlaps the requested range, and copy the content of the subsequent GOPs directly into the target video. First, because the first GOP is re-encoded, a fresh I frame is generated at the new start, so the clip plays from the first frame. Second, the subsequent GOPs are copied as-is, so almost no CPU is consumed and performance remains excellent. As shown below:
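The scheme above can be sketched (again FFmpeg-free, with a hypothetical Pkt struct standing in for AVPacket) as a split of the requested range at the first keyframe after the cut point:

```cpp
#include <vector>

// Hypothetical stand-in for AVPacket: presentation timestamp + keyframe flag.
struct Pkt { long pts; bool keyframe; };

// Packets with startPts <= pts < (returned value) form the first, partial
// GOP and must be decoded and re-encoded so the clip starts on a fresh
// I frame; packets from the returned pts onwards can be copied verbatim.
long firstCopyablePts(const std::vector<Pkt> &pkts, long startPts)
{
    for (const Pkt &p : pkts)
        if (p.pts > startPts && p.keyframe)
            return p.pts;   // first keyframe strictly after the cut point
    return -1;              // requested range ends inside the first GOP
}
```

Only the packets before the returned pts go through the codec; everything after it is a straight copy, which is where the speed comes from.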

Of course, there are still a few pitfalls here, so let's fill them in one by one.

3.1 Stitching

The source video might well ask in surprise: how can you decode frames that I encoded with my own parameters? Generally speaking, decoding depends on the SPS and PPS, and the SPS/PPS of the source video and of the re-encoded target will differ, so directly copied packets cannot be decoded correctly. In an mp4 file, the SPS and PPS normally live in the file header, and a file has only one header, so two different SPS/PPS pairs cannot both be stored there. To decode the target video correctly, the SPS and PPS of the source video must be carried along somehow. If they cannot go in the file header, where can they go? Could they be placed in front of the copied frames? How? I was at a loss, with nowhere to start, until one day I remembered that, while fixing another problem, I had traced the implementation of h264_mp4toannexb, whose job is precisely to copy the SPS and PPS in front of each frame (more precisely, each AVPacket). So let's review what h264_mp4toannexb actually does: it prepends the start code 0x000001 or 0x00000001 to every AVPacket, and inserts the SPS and PPS in front of every I frame. In other words, h264_mp4toannexb can correctly insert the SPS and PPS required for decoding into the video stream. It is also quite simple to use; the code is as follows:

AVBSFContext* initBSF(const std::string &filter_name, const AVCodecParameters *codec_par, AVRational tb)
{
    // Look up the bitstream filter by name, e.g. "h264_mp4toannexb".
    const AVBitStreamFilter *filter = av_bsf_get_by_name(filter_name.c_str());

    AVBSFContext *bsf_ctx = nullptr;
    av_bsf_alloc(filter, &bsf_ctx);

    // The filter needs the input stream's codec parameters (which carry
    // the SPS/PPS) and its time base.
    avcodec_parameters_copy(bsf_ctx->par_in, codec_par);
    bsf_ctx->time_base_in = tb;

    av_bsf_init(bsf_ctx);
    return bsf_ctx;
}

AVPacket* feedPacket(AVBSFContext *bsf_ctx, AVPacket *packet)
{
    av_bsf_send_packet(bsf_ctx, packet);

    AVPacket *dst_packet = av_packet_alloc();
    av_bsf_receive_packet(bsf_ctx, dst_packet);

    return dst_packet;
}

void test()
{
    // video_stream and readVideoPacket() are application-specific here:
    // the AVStream of the source video and a function that returns its
    // next video AVPacket.
    AVBSFContext *bsf_ctx = initBSF("h264_mp4toannexb", video_stream->codecpar, video_stream->time_base);
    AVPacket *packet = readVideoPacket();
    AVPacket *dst_packet = feedPacket(bsf_ctx, packet);
}

Note: when joining the re-encoded first GOP with the copied subsequent GOPs of the original video, the timestamps must be handled carefully, otherwise the video may stutter during playback.
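One simple way to handle those timestamps (a sketch, under the assumption that the re-encoded part and the copied part already share the same time base) is to rebase every copied packet so that the timeline continues seamlessly after the re-encoded GOP:

```cpp
// Shift a copied packet's timestamp so the target timeline is continuous:
// the re-encoded first GOP occupies [0, encodedDuration), so the first
// copied packet must land exactly at encodedDuration. The same shift is
// applied to both pts and dts.
long rebaseTimestamp(long srcTs, long firstCopiedTs, long encodedDuration)
{
    return srcTs - firstCopiedTs + encodedDuration;
}
```

If the two halves use different time bases, the source timestamps must first be rescaled (e.g. with av_rescale_q) before this shift is applied.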

3.2 Picture corruption at the end

Think it’s over? No!! You will find that some videos show a corrupted picture during the last second.

The cause of the corruption is not hard to guess: the last frame is a B frame, and its forward reference has been cut away. Since the last frame of a clip is not always a B frame, the corruption does not always appear. Knowing that B frames are the cause, the fix is clear: make sure the last frame kept is a P frame, even if that slightly overshoots the requested end time (the audio stream should then also overshoot a little to stay in sync with the video). However, since you cannot tell directly from an AVPacket whether a frame is a P frame, the last GOP must also be decoded (no re-encoding needed): record the pts of the first P frame past the time limit, and when copying GOPs later, stop once that pts has been copied.
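That search can be sketched as follows (again FFmpeg-free, with a hypothetical Frame struct carrying the picture type reported by the decoder, in presentation order):

```cpp
#include <vector>

// Hypothetical decoded frame: presentation timestamp + picture type.
struct Frame { long pts; char type; };

// Return the pts of the first P frame at or past the requested end time;
// packet copying stops once this pts has been written, so the clip never
// ends on a B frame whose forward reference would be missing.
long findStopPts(const std::vector<Frame> &frames, long endPts)
{
    for (const Frame &f : frames)
        if (f.pts >= endPts && f.type == 'P')
            return f.pts;
    // No P frame past the limit: fall back to the last decoded frame.
    return frames.empty() ? -1 : frames.back().pts;
}
```

The clip then runs slightly past the requested end time, which is the trade-off accepted above in exchange for a clean final picture.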

4. Summary

At first I thought the problem was hard to solve; after all, every clip produced by the plain ffmpeg command line had the issue. But the approach is always the same: start from the cause of the problem, work toward a solution step by step, and break down the obstacles along the way one by one. Remember: only by understanding the underlying principle can the problem truly be solved.

