Sorting out the FFmpeg source code structure based on the player process

Background

A long time ago I went through the FFmpeg source code in detail, reading Thor's FFmpeg source code analysis blog posts at the time. Recently I have often used FFmpeg for encoding, decoding, demuxing, and so on, but I only vaguely remember FFmpeg's internal structure design, the roles of its structs, and how its APIs are used, so I prepared this post. Analyzing FFmpeg's structural design on its own would be very dry; combining it with the FFplay player process should make the effort twice as effective.

My knowledge is limited and the FFmpeg project is large and complex, so my understanding may contain mistakes; corrections are warmly welcome.

Player process overview

Here I divide the player into 9 modules: file reading, file format decapsulation (demuxing), video decoding, audio decoding, video format conversion, audio resampling, audio/video synchronization, video rendering and playback, and audio playback. The following briefly introduces what each of these 9 modules does from the player's perspective.

Simple structure diagram of the player: (figure from the original post omitted)

File reading module

The player as a whole can be compared to an assembly line in a factory. The file reading module is the first stage of that pipeline, and it has two jobs:

  1. Continuously read data from a local file or the network
  2. Shield the upper layers from the differences between data sources (local file, http, tcp, udp, ...) and pass the data stream on to the next module

File format decapsulation module

The main functions of the decapsulation (demux) module are:

  1. Parse the container format information of the media file
  2. Parse the information of each media stream
  3. Compute timestamps for the media streams

The container format and stream information are usually determined by reading the first part of the file. For example, in the FLV container format the first 3 bytes are "FLV". The container format can also be guessed from the file name suffix. The timestamps used later for audio/video synchronization are also computed in this module.
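To make the probing step concrete, here is a minimal sketch of how a demuxer's read_probe callback can recognize the FLV container by its magic bytes. It is simplified from the idea behind FFmpeg's flv_probe, not the exact code, and my_flv_probe is a hypothetical name:

#include <libavformat/avformat.h>

/* Sketch of a read_probe-style callback: recognize FLV by its first bytes.
 * Simplified illustration, not FFmpeg's exact flv_probe. */
static int my_flv_probe(const AVProbeData *p)
{
    const uint8_t *d = p->buf;

    /* An FLV file starts with the magic "FLV" followed by a version byte. */
    if (p->buf_size >= 4 && d[0] == 'F' && d[1] == 'L' && d[2] == 'V' && d[3] < 5)
        return AVPROBE_SCORE_MAX;
    return 0;
}

The demuxer whose probe returns the highest score is chosen to decapsulate the stream.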

Video decoding module

The main function of the video decoding module is to decode a video stream in a format such as H264/H265 into YUV.

Audio decoding module

The main function of the audio decoding module is to decode an audio stream in a format such as AAC into PCM.

Video stream format conversion module

The YUV output by the video decoding module often cannot be rendered directly by the display system and needs to be converted into a format the display system supports.

Audio stream resampling module

The resampling module serves the same purpose as the video format conversion module. Because the audio rendering module constrains the channel count, sample rate, and sample format, the decoded audio must likewise be converted into a format the audio rendering module supports.

Audio and video synchronization module

Ideally, with no device or network constraints, video plays at its specified frame rate, audio plays at its sample rate, and the two stay in sync by default. In reality this situation does not exist, especially when playing video over a network: because of network uncertainty, rules must be imposed between audio and video so that they render in sync.

The common approach is to play the audio normally according to its own sample rate, channel count, and sample format, while recording the timestamp of the current audio frame. When a video frame is to be rendered, its timestamp is compared with the audio clock to decide whether to speed up rendering, delay rendering, or render normally.
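A hedged sketch of that decision, in C; the names (decide_render, SYNC_THRESHOLD) are hypothetical, and real players such as ffplay use thresholds that depend on the frame duration:

/* Sketch of the sync decision described above; timestamps are in seconds. */
#define SYNC_THRESHOLD 0.01   /* tolerated audio/video drift, seconds */

typedef enum { RENDER_NOW, RENDER_DELAY, RENDER_CATCH_UP } RenderAction;

static RenderAction decide_render(double frame_pts, double audio_clock)
{
    double diff = frame_pts - audio_clock;

    if (diff > SYNC_THRESHOLD)
        return RENDER_DELAY;    /* video ahead of audio: wait before showing */
    if (diff < -SYNC_THRESHOLD)
        return RENDER_CATCH_UP; /* video behind audio: show at once or drop */
    return RENDER_NOW;          /* within tolerance: render normally */
}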

Video rendering and playback module

The FFplay video rendering module uses SDL, an open-source cross-platform multimedia development library. If you are interested in how a cross-platform rendering library is implemented, studying the SDL source code is a good start.

There are multiple ways to render video on Android; here I introduce 3 of them:

OpenGL rendering: convert YUV -> RGB, upload the RGB data as a texture, and draw the texture to the screen with OpenGL.

NDK ANativeWindow: obtain an ANativeWindow from the Java-layer Surface in native code, convert YUV -> RGB, and copy the RGB data into the ANativeWindow buffer (see the sketch after this list).

MediaCodec: bind a Surface to MediaCodec; after decoding, MediaCodec renders to the Surface automatically.
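As an illustration of the second approach, here is a hedged sketch of the NDK ANativeWindow path. It assumes the frame has already been converted to RGBA (e.g. with libswscale or libyuv); render_rgba_frame is a hypothetical helper and error handling is omitted:

#include <jni.h>
#include <string.h>
#include <android/native_window.h>
#include <android/native_window_jni.h>

/* Copy one already-converted RGBA frame into an ANativeWindow. */
static void render_rgba_frame(JNIEnv *env, jobject surface,
                              const uint8_t *rgba, int width, int height)
{
    ANativeWindow *win = ANativeWindow_fromSurface(env, surface);
    ANativeWindow_setBuffersGeometry(win, width, height,
                                     WINDOW_FORMAT_RGBA_8888);

    ANativeWindow_Buffer buf;
    if (ANativeWindow_lock(win, &buf, NULL) == 0) {
        /* The window stride (in pixels) may exceed the frame width,
         * so copy row by row. */
        for (int y = 0; y < height; y++)
            memcpy((uint8_t *)buf.bits + y * buf.stride * 4,
                   rgba + y * width * 4, (size_t)width * 4);
        ANativeWindow_unlockAndPost(win);
    }
    ANativeWindow_release(win);
}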

Audio playback module

The FFplay audio rendering module also uses SDL.

Audio rendering options on the Android side:

OpenSL ES

AudioTrack

AAudio

Summary

The above is roughly the player's whole process. The following sections explain several of these modules in detail from the FFmpeg perspective; the next section is the focus of this whole article.

Analyzing the player process from the FFmpeg perspective

File reading module

FFmpeg internally divides the file reading module into 3 layers:

  1. AVIOContext
  2. URLContext, URLProtocol
  3. file (FileContext), http (HTTPContext), udp (UDPContext), tcp (TCPContext)

The bottom layer of the reading module operates on file, http, udp, and tcp. Each of these backends is abstracted into a URLProtocol: one URLProtocol per concrete file or protocol type. URLContext wraps URLProtocol in another layer, and its operations simply forward to the functions of the underlying file or protocol. AVIOContext extends URLContext further by adding a buffering mechanism.

When we read or write a file, the call flow is: AVIOContext -> URLContext -> URLProtocol -> FileContext (file).

This module is mainly implemented by file.c, avio.c, aviobuf.c and other files in FFmpeg's libavformat directory.
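As a usage sketch (hypothetical helper name, error handling trimmed), reading through the buffered AVIOContext layer looks roughly like this:

#include <libavformat/avio.h>

/* Minimal sketch: read a file through the AVIOContext layer.
 * avio_open() resolves the URL to a URLProtocol (here ff_file_protocol)
 * and wraps it in a buffered AVIOContext. */
static int dump_first_bytes(const char *url)
{
    AVIOContext *ctx = NULL;
    uint8_t buf[16];
    int ret = avio_open(&ctx, url, AVIO_FLAG_READ);
    if (ret < 0)
        return ret;

    ret = avio_read(ctx, buf, sizeof(buf)); /* served from the internal buffer */
    avio_closep(&ctx);
    return ret; /* number of bytes read, or a negative error code */
}

Below are the URLProtocol tables for the file, udp, and tcp backends as defined in the FFmpeg source: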

URLProtocol ff_file_protocol = {
    .name                = "file",
    .url_open            = file_open,
    .url_read            = file_read,
    .url_write           = file_write,
    .url_seek            = file_seek,
    .url_close           = file_close,
    .url_get_file_handle = file_get_handle,
    .url_check           = file_check,
    .url_delete          = file_delete,
    .url_move            = file_move,
    .priv_data_size      = sizeof(FileContext),
    .priv_data_class     = &file_class,
    .url_open_dir        = file_open_dir,
    .url_read_dir        = file_read_dir,
    .url_close_dir       = file_close_dir,
};



URLProtocol ff_udp_protocol = {
    .name                = "udp",
    .url_open            = udp_open,
    .url_read            = udp_read,
    .url_write           = udp_write,
    .url_close           = udp_close,
    .url_get_file_handle = udp_get_file_handle,
    .priv_data_size      = sizeof(UDPContext),
    .priv_data_class     = &udp_class,
    .flags               = URL_PROTOCOL_FLAG_NETWORK,
};



URLProtocol ff_tcp_protocol = {
    .name                = "tcp",
    .url_open            = tcp_open,
    .url_accept          = tcp_accept,
    .url_read            = tcp_read,
    .url_write           = tcp_write,
    .url_close           = tcp_close,
    .url_get_file_handle = tcp_get_file_handle,
    .url_shutdown        = tcp_shutdown,
    .priv_data_size      = sizeof(TCPContext),
    .flags               = URL_PROTOCOL_FLAG_NETWORK,
    .priv_data_class     = &tcp_class,
};

FileContext, HTTPContext, and TCPContext are the context structs for these concrete file/protocol backends, holding their related parameters.

typedef struct FileContext {
    const AVClass *class;
    int fd;
    int trunc;
    int blocksize;
#if HAVE_DIRENT_H
    DIR *dir;
#endif
} FileContext;



typedef struct TCPContext {
    const AVClass *class;
    int fd;
    int listen;
    int open_timeout;
    int rw_timeout;
    int listen_timeout;
} TCPContext;

File format decapsulation module

FFmpeg internally divides the file format decapsulation module into 3 layers:

  1. AVInputFormat
  2. FLVContext, AVIContext, etc. (one private context per demuxer)
  3. AVStream

Each file format in FFmpeg corresponds to an AVInputFormat. For example, the live FLV format corresponds to the AVInputFormat named ff_live_flv_demuxer, and the AVI format to the one named ff_avi_demuxer. AVInputFormat wraps function pointers for the operations on a concrete file format: when we read a packet from an FLV file, the function actually executed is flv_read_packet.

AVInputFormat ff_live_flv_demuxer = {
    .name           = "live_flv",
    .long_name      = NULL_IF_CONFIG_SMALL("live RTMP FLV (Flash Video)"),
    .priv_data_size = sizeof(FLVContext),
    .read_probe     = live_flv_probe,
    .read_header    = flv_read_header,
    .read_packet    = flv_read_packet,
    .read_seek      = flv_read_seek,
    .read_close     = flv_read_close,
    .extensions     = "flv",
    .priv_class     = &live_flv_class,
    .flags          = AVFMT_TS_DISCONT
};



AVInputFormat ff_avi_demuxer = {
    .name           = "avi",
    .long_name      = NULL_IF_CONFIG_SMALL("AVI (Audio Video Interleaved)"),
    .priv_data_size = sizeof(AVIContext),
    .extensions     = "avi",
    .read_probe     = avi_probe,
    .read_header    = avi_read_header,
    .read_packet    = avi_read_packet,
    .read_close     = avi_read_close,
    .read_seek      = avi_read_seek,
    .priv_class     = &demuxer_class,
};

FLVContext and AVIContext are the private demuxing context structs of these concrete formats, holding their related state.

typedef struct FLVContext {
    const AVClass *class; ///< Class for private options.
    int trust_metadata;   ///< configure streams according to onMetaData
    int wrong_dts;        ///< wrong dts due to negative cts
    uint8_t *new_extradata[FLV_STREAM_TYPE_NB];
    int new_extradata_size[FLV_STREAM_TYPE_NB];
    int last_sample_rate;
    int last_channels;
    struct {
        int64_t dts;
        int64_t pos;
    } validate_index[2];
    int validate_next;
    int validate_count;
    int searched_for_end;
} FLVContext;



typedef struct AVIContext {
    const AVClass *class;
    int64_t riff_end;
    int64_t movi_end;
    int64_t fsize;
    int64_t io_fsize;
    int64_t movi_list;
    int64_t last_pkt_pos;
    int index_loaded;
    int is_odml;
    int non_interleaved;
    int stream_index;
    DVDemuxContext *dv_demux;
    int odml_depth;
    int use_odml;
#define MAX_ODML_DEPTH 1000
    int64_t dts_max;
} AVIContext;




AVInputFormat is associated with its concrete demuxing struct through the iformat and priv_data fields of AVFormatContext.

AVInputFormat is connected to the concrete audio and video codec formats through the streams field of the AVFormatContext struct. The demuxing module separates the compressed audio and video data; the information about these compressed formats is stored in streams and handed down to the audio and video decoders through streams.

This module is mainly implemented by flvdec.c, avidec.c, utils.c and other files in FFmpeg's libavformat directory.
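Put together, a minimal demuxing sketch with the public API looks roughly like this (hypothetical helper, error handling trimmed). avformat_open_input() probes and selects the AVInputFormat, read_header() fills fmt->streams, and each av_read_frame() call ends up in the demuxer's read_packet (e.g. flv_read_packet):

#include <libavformat/avformat.h>

/* Open a media file and drain its packets.
 * On older FFmpeg versions you would call av_register_all() first. */
static int demux_file(const char *url)
{
    AVFormatContext *fmt = NULL;
    AVPacket pkt;
    int ret = avformat_open_input(&fmt, url, NULL, NULL);
    if (ret < 0)
        return ret;
    avformat_find_stream_info(fmt, NULL);

    while (av_read_frame(fmt, &pkt) >= 0) {
        /* pkt.stream_index says which AVStream (audio/video) this packet
         * belongs to; hand it to the matching decoder here. */
        av_packet_unref(&pkt);
    }
    avformat_close_input(&fmt);
    return 0;
}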

Decoding module

FFmpeg internally divides the decoding module into 3 layers:

  1. AVCodecContext
  2. AVCodec
  3. H264Context, HEVCContext, etc.

Each codec in FFmpeg corresponds to an AVCodec. For example, the AVCodec for the H264 decoder is named ff_h264_decoder, and the one for the H265 (HEVC) decoder is named ff_hevc_decoder. AVCodec wraps function pointers for the concrete codec's operations: when we decode an H264 video stream, we call decode through AVCodec, but the function actually executed is h264_decode_frame.

AVCodec ff_h264_decoder = {
    .name                  = "h264",
    .long_name             = NULL_IF_CONFIG_SMALL("H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10"),
    .type                  = AVMEDIA_TYPE_VIDEO,
    .id                    = AV_CODEC_ID_H264,
    .priv_data_size        = sizeof(H264Context),
    .init                  = ff_h264_decode_init,
    .close                 = h264_decode_end,
    .decode                = h264_decode_frame,
    .capabilities          = /*AV_CODEC_CAP_DRAW_HORIZ_BAND |*/ AV_CODEC_CAP_DR1 |
                             AV_CODEC_CAP_DELAY | AV_CODEC_CAP_SLICE_THREADS |
                             AV_CODEC_CAP_FRAME_THREADS,
    .flush                 = flush_dpb,
    .init_thread_copy      = ONLY_IF_THREADS_ENABLED(decode_init_thread_copy),
    .update_thread_context = ONLY_IF_THREADS_ENABLED(ff_h264_update_thread_context),
    .profiles              = NULL_IF_CONFIG_SMALL(profiles),
    .priv_class            = &h264_class,
};



AVCodec ff_hevc_decoder = {
    .name                  = "hevc",
    .long_name             = NULL_IF_CONFIG_SMALL("HEVC (High Efficiency Video Coding)"),
    .type                  = AVMEDIA_TYPE_VIDEO,
    .id                    = AV_CODEC_ID_HEVC,
    .priv_data_size        = sizeof(HEVCContext),
    .priv_class            = &hevc_decoder_class,
    .init                  = hevc_decode_init,
    .close                 = hevc_decode_free,
    .decode                = hevc_decode_frame,
    .flush                 = hevc_decode_flush,
    .update_thread_context = hevc_update_thread_context,
    .init_thread_copy      = hevc_init_thread_copy,
    .capabilities          = AV_CODEC_CAP_DR1 | AV_CODEC_CAP_DELAY |
                             AV_CODEC_CAP_SLICE_THREADS | AV_CODEC_CAP_FRAME_THREADS,
    .profiles              = NULL_IF_CONFIG_SMALL(profiles),
};

H264Context and HEVCContext are the internal parameter structs of these concrete codecs. Since the H264 and H265 decoders require a great many parameters, the structs are not reproduced here.

AVCodecContext is associated with a concrete codec through its AVCodec *codec and priv_data fields; priv_data points to the context matching the AVCodec, such as H264Context or HEVCContext.

This module mainly implements video decoding through h264.h, h264.c, hevc.h, hevc.c and other files in FFmpeg's libavcodec directory.
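As a usage sketch (hypothetical helpers, minimal error handling), opening and driving an H264 decoder with the current send/receive API looks roughly like this; avcodec_open2() is where codec->priv_data (here an H264Context) gets allocated and AVCodec's init (ff_h264_decode_init) is called. Note the struct listings above are from an FFmpeg version that still used the older .decode callback internally:

#include <libavcodec/avcodec.h>

/* Set up an H264 decoder from stream parameters (e.g. stream->codecpar). */
static AVCodecContext *open_h264_decoder(const AVCodecParameters *par)
{
    const AVCodec *codec = avcodec_find_decoder(AV_CODEC_ID_H264);
    AVCodecContext *ctx = avcodec_alloc_context3(codec);
    avcodec_parameters_to_context(ctx, par); /* copy extradata, size, etc. */
    if (avcodec_open2(ctx, codec, NULL) < 0) {
        avcodec_free_context(&ctx);
        return NULL;
    }
    return ctx;
}

static int decode_packet(AVCodecContext *ctx, const AVPacket *pkt, AVFrame *frame)
{
    int ret = avcodec_send_packet(ctx, pkt);
    if (ret < 0)
        return ret;
    /* One packet can yield zero or more frames. */
    while ((ret = avcodec_receive_frame(ctx, frame)) >= 0) {
        /* frame->data[0..2] now hold the decoded YUV planes */
    }
    return ret == AVERROR(EAGAIN) ? 0 : ret;
}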

Video stream format conversion module

The video format conversion module that FFmpeg provides (libswscale) is relatively inefficient, so many projects use the libyuv library for video format conversion instead; I won't cover it here. Those interested can look into libyuv on their own.
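For completeness, here is a hedged sketch of the conversion step using FFmpeg's own libswscale (the module this section refers to); in a real project you might replace this with libyuv as suggested above. yuv420p_to_rgba is a hypothetical helper:

#include <libavutil/frame.h>
#include <libswscale/swscale.h>

/* Convert one decoded YUV420P frame to RGBA. In production, cache the
 * SwsContext instead of recreating it per frame. */
static void yuv420p_to_rgba(const AVFrame *src, uint8_t *rgba, int rgba_stride)
{
    struct SwsContext *sws = sws_getContext(
        src->width, src->height, AV_PIX_FMT_YUV420P,
        src->width, src->height, AV_PIX_FMT_RGBA,
        SWS_BILINEAR, NULL, NULL, NULL);

    uint8_t *dst_data[4]  = { rgba, NULL, NULL, NULL };
    int dst_linesize[4]   = { rgba_stride, 0, 0, 0 };

    sws_scale(sws, (const uint8_t * const *)src->data, src->linesize,
              0, src->height, dst_data, dst_linesize);
    sws_freeContext(sws);
}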

Audio stream resampling module

Audio resampling can be done with SwrContext. It is mostly a matter of API usage, with little structural design, so I won't expand on it here.
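A minimal sketch of that API (hypothetical helper; uses the older channel-layout API that matches the FFmpeg version quoted in this post):

#include <libavutil/channel_layout.h>
#include <libswresample/swresample.h>

/* Build a resampler converting decoded audio to the format an SDL- or
 * AudioTrack-style renderer typically wants: stereo, 44.1 kHz, signed 16-bit. */
static SwrContext *make_resampler(int in_rate, int64_t in_layout,
                                  enum AVSampleFormat in_fmt)
{
    SwrContext *swr = swr_alloc_set_opts(NULL,
            AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_S16, 44100, /* output */
            in_layout, in_fmt, in_rate,                    /* input  */
            0, NULL);
    if (swr && swr_init(swr) < 0)
        swr_free(&swr);
    return swr;
}
/* Then per decoded frame:
 * swr_convert(swr, out_bufs, out_samples,
 *             (const uint8_t **)frame->data, frame->nb_samples); */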

Audio and video rendering module

Audio and video rendering is basically implemented with platform-specific APIs or cross-platform solutions and has no direct connection with FFmpeg itself. The specific approaches were outlined above; the details are left for you to explore.

Here, as a reference, is the FFmpeg decoding flowchart drawn by Thor (figure from the original post omitted). Although FFmpeg's API has changed a lot, the internal structure is essentially unchanged: after studying Thor's FFmpeg process analysis, you will find your way around the new FFmpeg source code very quickly.

To deepen your understanding of the FFmpeg source code, here are 3 questions to think about. Once you have worked through them, you will understand FFmpeg's code structure from compilation to runtime, and be clear about how to build FFmpeg and modify its source.

Questions to think about

1. When building a trimmed-down FFmpeg, you must specify the required protocols (Protocol), muxing formats (Muxer), demuxing formats (Demuxer), and codecs (Codec); everything else is cut out. How are the selected protocols and formats loaded and looked up when FFmpeg runs?

2. How does FFmpeg connect AVFormatContext, AVIOContext, URLContext, URLProtocol, AVInputFormat, AVCodecContext, and AVCodec?

3. When we operate on a file through FFmpeg, in what order are the modules and contexts inside FFmpeg called?
