[Audio and Video | Ogg] RFC 3533: The Ogg Encapsulation Format Version 0

Blog homepage: https://blog.csdn.net/wkd_007
Blog content: Embedded development, Linux, C, C++, data structures, audio and video
Content of this article: Introduction to the Ogg encapsulation format
Quote of the day: Confucius said: "The gentleman is open and at ease; the petty man is always anxious." (The Analects, Shu Er). It means that a gentleman is open-minded and calm, while a petty person is preoccupied and worries about gains and losses.

Contents

  • 1. Introduction
  • 2. Definition
  • 3. Requirements for universal packaging format
  • 4. Ogg bitstream format
  • 5. Packaging process
  • 6. Ogg page format

1. Introduction

The Ogg bitstream format was developed as part of a larger project aimed at creating a set of components for encoding and decoding multimedia content (codecs) that are freely available and can be freely re-implemented, in both software and hardware, by the computing community at large, including the Internet community. The intention of the Ogg developers, represented by Xiph.Org, is that the format be usable without intellectual property concerns.

This document describes the Ogg bitstream format and how to use it to encapsulate one or several media bitstreams created by one or more encoders. The Ogg transport bitstream is designed to provide framing, error protection and seeking structure for higher-level codec streams that consist of raw, unencapsulated data packets, such as the Vorbis audio codec or the upcoming Tarkin and Theora video codecs. It is capable of interleaving different binary media and other time-continuous data streams that are prepared by an encoder as a sequence of data packets. Ogg provides enough information to properly separate the data back into these encoder-created packets at the original packet boundaries, without relying on decoding to find the boundaries.

Please note that the MIME type application/ogg is registered with IANA.

2. Definition

Describing the Ogg encapsulation process relies on a set of terms whose meaning needs to be well understood. The most fundamental terms are therefore defined here, before the requirements for a generic media stream encapsulation format, the encapsulation process, and the concrete format of the Ogg bitstream are described. For a more complete glossary, see the appendix of RFC 3533.

The result of Ogg encapsulation is called the “physical (Ogg) bitstream”. It encapsulates a bitstream created by one or more encoders, called a “logical bitstream”. The logical bitstream provided to the Ogg encapsulation process has a structure in that it is split into a series of so-called “packets”. Packets are created by the encoder of this logical bitstream and represent meaningful entities only for that encoder (e.g. an uncompressed stream may use video frames as packets). They contain no boundary information – strung together, they look like a stream of random bytes with no landmarks.

Please note that the term “packet” in this document is not used to refer to an entity transmitted over a network.

3. Requirements for universal packaging format

The design idea behind Ogg is to provide a generic, linear media transport format that enables both file-based storage and stream-based transmission of one or several interleaved media streams, independent of the encoding format of the media data. Such an encapsulation format needs to provide:

  • Framing of logical bit streams.
  • Interleaving of different logical bit streams.
  • Detection of corruption.
  • Recapture (resynchronization) after a parsing error.
  • Position landmarks for direct random access to arbitrary positions in the bitstream.
  • Streaming capability (i.e. no seeking is needed to build a 100% complete bitstream).
  • Small overhead (i.e. no more than approximately 1-2% of the bitstream bandwidth used for packet boundary marking, high-level framing, synchronization, and seeking).
  • Simplicity for fast parsing.
  • A simple concatenation mechanism for several physical bitstreams.

Ogg takes all of these design considerations into account. It supports framing and interleaving of logical bitstreams, seeking landmarks, corruption detection, and stream resynchronization after a parsing error, with no more than approximately 1-2% overhead. It is a generic framework for encapsulating time-continuous bitstreams. It does not know any details of the codec data it encapsulates and is therefore independent of any media codec.

4. Ogg bitstream format

A physical Ogg bitstream consists of multiple logical bitstreams interleaved in so-called “pages”. Whole pages are taken in order from the multiple logical bitstreams and multiplexed at the page level. A logical bitstream is identified by a unique serial number in the header of each of its pages in the physical bitstream. This serial number is created randomly and has no connection to the content or encoder of the logical bitstream it represents. Pages of all logical bitstreams are interleaved concurrently, but they need not follow a regular order; they only have to be consecutive within their own logical bitstream. Ogg demultiplexing reconstructs the original logical bitstreams from the physical bitstream by taking the pages in order from the physical bitstream and redirecting them to the appropriate logical decoding entity.
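
Page-level demultiplexing can be sketched in a few lines. This is only an illustration under assumptions: the pages are taken to be already parsed into (serial number, page bytes) tuples, and the function name `demux_pages` and that tuple layout are made up for this example, not defined by the RFC.

```python
from collections import defaultdict

def demux_pages(pages):
    """Group pages by bitstream serial number, preserving arrival order.

    `pages` is an iterable of (serial_number, page_bytes) tuples; this
    layout is an assumption made for the sketch.
    """
    streams = defaultdict(list)
    for serial, page in pages:
        streams[serial].append(page)
    return dict(streams)

# Pages of two logical streams (serials 0x1111 and 0x2222), interleaved.
physical = [(0x1111, b"A0"), (0x2222, b"B0"), (0x1111, b"A1"), (0x2222, b"B1")]
logical = demux_pages(physical)
```

Because each page carries its own serial number, the demultiplexer never has to look inside the page body to decide where the page belongs.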

Each Ogg page contains only one type of data because it belongs to only one logical bitstream. Pages are of variable size and have a header containing encapsulation and error recovery information. Each logical bitstream in a physical Ogg bitstream starts with a special start page (bos = beginning of stream) and ends with a special end page (eos = end of stream).

The bos page contains information that uniquely identifies the codec type and may contain information to set up the decoding process. The bos page should also contain information about the encoded media; for audio, for example, it should include the sample rate and number of channels. By convention, the first bytes of the bos page contain magic data that uniquely identifies the required codec. It is the responsibility of anyone deploying a new codec to make sure it can be reliably distinguished from all other codecs in use. There is no fixed way to detect the end of the codec identification marker. The format of the bos page depends on the codec and must therefore be given in the encapsulation specification of that logical bitstream type.

Ogg also allows, but does not require, secondary header packets after the bos page of a logical bitstream; these must also precede any data packets in any logical bitstream. These subsequent header packets are framed into an integral number of pages, which will not contain any data packets. A physical bitstream therefore begins with the bos pages of all logical bitstreams, each containing one initial header packet, followed by the secondary header packets of all streams, followed by pages containing data packets.

The encapsulation specification for one or more logical bitstreams is called a “media mapping”. An example of a media mapping is “Ogg Vorbis”, which uses the Ogg framework to encapsulate Vorbis-encoded audio data for stream-based storage (such as files) and transport (such as TCP streams or pipes). Ogg Vorbis provides the name and revision of the Vorbis codec, the audio rate and the audio quality on the Ogg Vorbis bos page. It also uses two additional header pages per logical bitstream. The Ogg Vorbis bos page starts with the byte 0x01, followed by “vorbis” (a total of 7 bytes of identifier).
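
As a small illustration of codec identification, the first packet of a bos page can be compared byte for byte against the identifier 0x01 followed by the ASCII string "vorbis" (lowercase, as in the RFC). The names `VORBIS_MAGIC` and `looks_like_vorbis` are hypothetical, and the sketch assumes the packet has already been extracted from the page.

```python
VORBIS_MAGIC = b"\x01vorbis"  # 0x01 followed by "vorbis": 7 identifier bytes

def looks_like_vorbis(first_packet: bytes) -> bool:
    """Return True if the first packet of a bos page starts with the Vorbis identifier."""
    return first_packet.startswith(VORBIS_MAGIC)
```

A real demultiplexer would hold one such check per supported codec and try them in turn on every bos page.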

Ogg knows about two types of multiplexing: concurrent multiplexing (called “Grouping”) and sequential multiplexing (called “Chaining”). Grouping defines how to interleave multiple logical bit streams page by page in the same physical bit stream. For example, interleaving a video stream with several synchronized audio tracks using different codecs in different logical bitstreams requires grouping. Chaining, on the other hand, is defined as providing a simple mechanism to join physical Ogg bitstreams, which is often required by streaming applications.

In grouping, all bos pages of all logical bitstreams must appear together at the beginning of the Ogg bitstream. The media mapping specifies the order of the initial pages. For example, the grouping of a specific Ogg video and Ogg audio bitstream may specify that the physical bitstream must begin with the bos page of the logical video bitstream, followed by the bos page of the audio bitstream. Unlike bos pages, eos pages of the logical bitstreams need not all occur contiguously. Eos pages may be “nil” pages, that is, pages containing no content but simply a page header with position information and the eos flag set. Each grouped logical bitstream must have a unique serial number within the scope of the physical bitstream.

In chaining, complete logical bitstreams are concatenated. The bitstreams do not overlap, i.e. the eos page of a given logical bitstream is immediately followed by the bos page of the next. Each chained logical bitstream must have a unique serial number within the scope of the physical bitstream.

Groups of concurrently multiplexed bitstreams can themselves be chained one after another. When unchained, each group must stand on its own as a valid concurrently multiplexed bitstream. The figure below shows a schematic example of such a physical bitstream that obeys all the rules of both grouped and chained multiplexed bitstreams.

               physical bitstream with pages of
          different logical bitstreams grouped and chained
      -------------------------------------------------------------
      |*A*|*B*|*C*|A|A|C|B|A|B|#A#|C|...|B|C|#B#|#C#|*D*|D|...|#D#|
      -------------------------------------------------------------
       bos bos bos             eos           eos eos bos       eos

In this example, two physical bitstreams are chained. The first is a grouped stream of three logical bitstreams A, B and C. The second physical bitstream, consisting of the single logical bitstream D, is chained after the end of the grouped bitstream, which ends after the last eos page of all its grouped logical bitstreams. As can be seen, grouped bitstreams begin together: all of their bos pages must appear before any data pages. It can also be seen that pages of concurrently multiplexed bitstreams need not follow a regular order, and that one grouped bitstream can end long before the other bitstreams in the group.
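
The grouping and chaining rules can be checked against the figure with a short sketch. It assumes each page is reduced to a (name, is_bos, is_eos) tuple; `split_chain` and `page` are hypothetical helpers written for this example, and the token string simply spells out the pages of the figure without the "..." gaps.

```python
def split_chain(pages):
    """Split a page sequence into chain links.

    `pages` is a list of (name, is_bos, is_eos) tuples (a layout assumed for
    this sketch). A link ends when every stream opened in it has seen eos.
    Returns one list of stream names per link, in bos order.
    """
    links, current, open_streams = [], [], set()
    for name, is_bos, is_eos in pages:
        if is_bos:
            current.append(name)
            open_streams.add(name)
        if is_eos:
            open_streams.discard(name)
            if not open_streams:
                links.append(current)
                current = []
    return links

def page(token):
    """Parse a figure token: *X* is a bos page, #X# an eos page, X a data page."""
    return (token.strip("*#"), token.startswith("*"), token.startswith("#"))

figure = "*A* *B* *C* A A C B A B #A# C B C #B# #C# *D* D #D#"
links = split_chain([page(t) for t in figure.split()])
```

Here `links` holds two entries: the group A, B, C and the single stream D that is chained after it.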

Ogg does not know any specifics about the codec data, except that each logical bitstream belongs to a different codec, that the data from the codec comes in order, and that it has position markers (so-called “granule positions”). Ogg has no concept of “time”: it only knows about sequentially increasing, unitless position markers. An application can only get temporal information through higher layers that have access to the codec APIs to assign and convert granule positions or time.
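
To make the split between Ogg and the codec layer concrete, here is a minimal sketch of the conversion a higher layer might perform. It assumes an audio mapping in which the granule position counts PCM samples, so dividing by the sample rate yields seconds; other mappings define granule positions differently, and `granule_to_seconds` is a hypothetical name.

```python
def granule_to_seconds(granule_position: int, sample_rate: int) -> float:
    """Convert a granule position to seconds, assuming it counts PCM samples."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return granule_position / sample_rate

# One second of audio at an assumed sample rate of 44100 Hz.
seconds = granule_to_seconds(44100, 44100)
```

The sample rate itself is codec information taken from the stream's header packets, which is exactly why Ogg alone cannot perform this conversion.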

A specific definition of a media mapping using Ogg may put further constraints on its specific use of the Ogg bitstream format. For example, a specific media mapping may require that all eos pages of all grouped bitstreams occur in direct sequence. An example of a media mapping is the “Ogg Vorbis” specification. Another example is the upcoming “Ogg Theora” specification, which encapsulates Theora-encoded video data and usually comes multiplexed with a Vorbis stream, giving an Ogg file with synchronized audio and video. Since Ogg does not specify the temporal relationship between encapsulated concurrently multiplexed bitstreams, the temporal synchronization between the audio and video streams will be specified in this media mapping. To enable streaming, pages from the various logical bitstreams will typically be interleaved in chronological order.

5. Packaging process

The process of multiplexing different logical bitstreams occurs at the page level as described above. However, the bitstream provided by the encoder is handed over to Ogg as so-called “packets”, with packet boundaries depending on the encoding format. The process of encapsulating data packets into pages will now be described.

From Ogg’s perspective, packets can be of any size. A specific media mapping will define how to group or break up packets from a specific media encoder. Since Ogg pages have a maximum size of approximately 64 kBytes, a packet sometimes has to be spread over several pages. To simplify this process, Ogg divides each packet into 255-byte chunks plus a final shorter chunk. These chunks are called “Ogg segments”. They are only a logical construct and do not have a header of their own.

A group of contiguous segments is wrapped into a variable-length page preceded by a page header. A segment table in the page header gives the “lacing value” (size) of each segment contained in the page. A flag in the page header tells whether the page contains a packet continued from the previous page. Note that a lacing value of 255 implies that another lacing value follows for the same packet, while a lacing value of less than 255 marks the end of the packet after that many additional bytes. A packet of 255 bytes (or a multiple of 255 bytes) is therefore terminated by a lacing value of 0. Note also that a “nil” (zero-length) packet is not an error; it consists of nothing more than a lacing value of zero in the page header.
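
The lacing rules can be written down as a pair of small functions. This is a sketch rather than a reference implementation: the names `lacing_values` and `packet_lengths` are made up here, and the decoder assumes the segment table holds only complete packets (no packet continuing onto the next page).

```python
def lacing_values(packet_length):
    """Lacing values for one packet: 255 per full segment, then a final value < 255."""
    if packet_length < 0:
        raise ValueError("packet_length must not be negative")
    full, rest = divmod(packet_length, 255)
    return [255] * full + [rest]

def packet_lengths(lacing):
    """Recover packet lengths from a segment table holding only complete packets."""
    lengths, current = [], 0
    for value in lacing:
        current += value
        if value < 255:          # a value below 255 terminates the packet
            lengths.append(current)
            current = 0
    return lengths

# A 600-byte packet, a nil packet and a 255-byte packet in one segment table.
table = lacing_values(600) + lacing_values(0) + lacing_values(255)
```

Decoding `table` with `packet_lengths` gives back the three original lengths, including the zero-length packet and the exact multiple of 255 that needs a terminating 0.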

The encoding is optimized for speed and for the expected case that most packets are between 50 and 200 bytes long. This is a design justification rather than a recommendation. The encoding both avoids imposing a maximum packet size and imposes only minimal overhead on small packets. In contrast, a scheme that simply used two bytes at the head of every packet, with a maximum packet size of 32 kBytes, would always penalize small packets (< 255 bytes, the typical case) with twice the segmentation overhead. With lacing values as described, small packets see the minimum possible byte-aligned overhead (1 byte), and large packets (> 512 bytes) see a fairly constant overhead of about 0.5% of the encoding space.
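
The overhead figures can be reproduced with a quick calculation. The sketch below counts only the lacing bytes per packet and ignores the fixed page header; `lacing_overhead` is a hypothetical helper and the packet sizes are arbitrary sample values.

```python
def lacing_overhead(packet_length):
    """Number of lacing bytes needed to describe one packet."""
    return packet_length // 255 + 1

# Lacing bytes per packet for a few sample packet sizes.
overhead = {size: lacing_overhead(size) for size in (50, 200, 512, 4096)}

# The same overhead as a fraction of the packet size.
relative = {size: count / size for size, count in overhead.items()}
```

Packets below 255 bytes cost a single lacing byte, half of what a fixed two-byte length prefix would cost, while for large packets the ratio settles at roughly one lacing byte per 255 payload bytes.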

The following diagram shows a schematic example of a media mapping using Ogg and grouped logical bitstreams:

          logical bitstream with packet boundaries
 -----------------------------------------------------------------
 > |       packet_1             | packet_2         | packet_3 |  <
 -----------------------------------------------------------------

                     |segmentation (logically only)
                     v

      packet_1 (5 segments)          packet_2 (4 segs)    p_3 (2 segs)
     ------------------------------ -------------------- ------------
 ..  |seg_1|seg_2|seg_3|seg_4|s_5 | |seg_1|seg_2|seg_3|| |seg_1|s_2 | ..
     ------------------------------ -------------------- ------------

                     | page encapsulation
                     v

 page_1 (packet_1 data)   page_2 (pket_1 data)   page_3 (packet_2 data)
------------------------  ----------------  ------------------------
|H|------------------- |  |H|----------- |  |H|------------------- |
|D||seg_1|seg_2|seg_3| |  |D|seg_4|s_5 | |  |D||seg_1|seg_2|seg_3| | ...
|R|------------------- |  |R|----------- |  |R|------------------- |
------------------------  ----------------  ------------------------

                    |
pages of            |
other    --------|  |
logical         -------
bitstreams      | MUX |
                -------
                   |
                   v

              page_1  page_2          page_3
      ------  ------  -------  -----  -------
 ...  ||   |  ||   |  ||    |  ||  |  ||    |  ...
      ------  ------  -------  -----  -------
              physical Ogg bitstream

In this example, we take a snapshot of the encapsulation process of one logical bitstream. Part of the bitstream provided by the codec is shown, already divided into packets. The Ogg encapsulation process splits each packet into segments. The packets in this example are rather large: packet 1 is split into 5 segments, 4 of which hold 255 bytes while the last one is smaller. Packet 2 is split into 4 segments, 3 of which hold 255 bytes while the last one is very small. Packet 3 is split into two segments. The encapsulation process then creates pages, which are quite small in this example. Page 1 consists of the first three segments of packet 1, page 2 contains the remaining two segments of packet 1, and page 3 contains the first three segments of packet 2. Finally, the pages of this logical bitstream are multiplexed with the pages of other logical bitstreams into a physical Ogg bitstream.

6. Ogg page format

The physical Ogg bitstream consists of a sequence of concatenated pages. Pages are of variable size, usually 4-8 kB, with a maximum of 65307 bytes. The page header contains all the information needed to demultiplex the logical bitstreams out of the physical bitstream, to perform basic error recovery, and to provide landmarks for seeking. Each page is a self-contained entity, so the page decoding mechanism can recognize, verify, and handle single pages at a time without requiring the overall bitstream.
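
The maximum page size of 65307 bytes follows directly from the field widths: a 27-byte fixed header, at most 255 entries in the segment table, and at most 255 bytes per segment. The constant names below are chosen for this sketch only.

```python
HEADER_FIXED = 27    # fixed part of the page header, in bytes
MAX_SEGMENTS = 255   # the segment count is stored in a single byte
MAX_LACING = 255     # each lacing value is stored in a single byte

# Fixed header + full segment table + the largest possible page body.
max_page_size = HEADER_FIXED + MAX_SEGMENTS + MAX_SEGMENTS * MAX_LACING
```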

The Ogg page header has the following format:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| capture_pattern: Magic number for page start "OggS"           | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| version       | header_type   | granule_position              | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               | bitstream_serial_number       | 12-15
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               | page_sequence_number          | 16-19
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               | CRC_checksum                  | 20-23
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |page_segments  | segment_table | 24-27
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ...                                                           | 28-
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The LSb (least significant bit) is the first bit in the byte. Fields longer than one byte are encoded LSB (least significant byte) first.
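
In practice this means multi-byte header fields are little-endian. A minimal sketch with an arbitrary 4-byte sample value:

```python
raw = bytes([0x78, 0x56, 0x34, 0x12])            # bytes in stream order
value = int.from_bytes(raw, byteorder="little")  # least significant byte first

# Encoding goes the other way round.
encoded = value.to_bytes(4, byteorder="little")
```

Here `value` is 0x12345678: the byte that appears first in the stream contributes the lowest 8 bits.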

The fields in the page header have the following meanings:

1. capture_pattern: A 4-byte field indicating the beginning of the page. It contains 4 characters: O, g, g, S. It helps the decoder find page boundaries and regain synchronization after parsing a corrupted stream. Once a capture pattern is discovered, the decoder verifies page synchronization and integrity by calculating and comparing checksums.

2. stream_structure_version: 1 byte, indicating the version number of the Ogg file format used in the stream (this document specifies version 0).

3. header_type_flag: The bits in this 1-byte field identify the specific type of the page.

bit 0x01
Set: The page contains data of a packet continued from the previous page.
Not set: The page starts with a fresh packet.

bit 0x02
Set: This is the first page of the logical bitstream (bos).
Not set: This page is not a first page.

bit 0x04
Set: This is the last page of the logical bitstream (eos).
Not set: This page is not a last page.
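
Decoding the flag byte is a matter of masking the three bits. The constant names and the `describe_header_type` helper are assumptions made for this sketch.

```python
CONTINUED = 0x01  # page continues a packet from the previous page
BOS = 0x02        # first page of the logical bitstream
EOS = 0x04        # last page of the logical bitstream

def describe_header_type(flags):
    """Decode the header_type_flag byte into named booleans."""
    return {
        "continued": bool(flags & CONTINUED),
        "bos": bool(flags & BOS),
        "eos": bool(flags & EOS),
    }

# A page that continues a packet and also ends its stream.
last_page = describe_header_type(0x05)
```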

4. granule_position: An 8-byte field containing position information. For an audio stream, for example, it may contain the total number of PCM samples encoded once all frames finished on this page are included. For a video stream, it may contain the total number of video frames encoded after this page. This is a hint for the decoder and gives it some timing and position information. Its meaning depends on the codec of this logical bitstream and is specified in the specific media mapping. The special value -1 (in two’s complement) indicates that no packet finishes on this page.

5. bitstream_serial_number: A 4-byte field containing a unique serial number through which the logical bitstream is identified.

6. page_sequence_number: A 4-byte field containing the sequence number of the page, allowing the decoder to identify page loss. This sequence number is incremented separately for each logical bitstream.

7. CRC_checksum: A 4-byte field containing a 32-bit CRC checksum of the whole page, header and body, computed with the CRC field itself set to zero. The generator polynomial is 0x04c11db7.
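
A checksum along these lines can be sketched as below. Beyond the polynomial, it assumes the direct (non-reflected) algorithm with an initial value and final XOR of zero, which is how RFC 3533 describes the CRC as far as I can tell; the function names are made up, and bytes 22-25 are the CRC field according to the field sizes listed above.

```python
POLY = 0x04C11DB7  # generator polynomial

def _make_table():
    """Build the 256-entry lookup table for an MSB-first CRC-32."""
    table = []
    for byte in range(256):
        reg = byte << 24
        for _ in range(8):
            if reg & 0x80000000:
                reg = ((reg << 1) ^ POLY) & 0xFFFFFFFF
            else:
                reg = (reg << 1) & 0xFFFFFFFF
        table.append(reg)
    return table

CRC_TABLE = _make_table()

def ogg_crc(data):
    """CRC-32 with initial value 0, no bit reflection and no final XOR (assumed)."""
    crc = 0
    for byte in data:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ CRC_TABLE[((crc >> 24) & 0xFF) ^ byte]
    return crc

def page_checksum(page):
    """CRC of a whole page with the CRC field (bytes 22-25) treated as zero."""
    zeroed = page[:22] + b"\x00\x00\x00\x00" + page[26:]
    return ogg_crc(zeroed)
```

To verify a page, a decoder would compare `page_checksum(page)` with the little-endian value stored in bytes 22-25. Note that these parameters differ from the CRC-32 in Python's `zlib.crc32`, so the two are not interchangeable.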

8. number_page_segments: 1 byte, giving the number of segment entries encoded in the segment table.

9. segment_table: The size is number_page_segments bytes. Contains the lacing value of all segments in this page. Each byte contains a lacing value.

The page header size in bytes (total header size) is given by:

header_size = number_page_segments + 27 [Byte]

The total page size in bytes is the page header size plus the sum of all lacing values:

page_size = header_size + sum(lacing_values: 1..number_page_segments)[Byte]
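
Putting the field list and the two size formulas together, a page header parser might look like the sketch below. The function name, the returned dictionary keys and the hand-built example header are all assumptions made for illustration; the CRC is read but not verified here.

```python
import struct

# capture_pattern, version, header_type, granule_position (signed),
# serial number, page sequence number, CRC, number of segments: 27 bytes.
HEADER_FORMAT = "<4sBBqIIIB"

def parse_page_header(data):
    """Parse an Ogg page header from the start of `data`."""
    (capture, version, header_type, granule, serial,
     sequence, crc, n_segments) = struct.unpack_from(HEADER_FORMAT, data, 0)
    if capture != b"OggS":
        raise ValueError("capture pattern not found")
    lacing = list(data[27:27 + n_segments])
    if len(lacing) != n_segments:
        raise ValueError("segment table is truncated")
    header_size = 27 + n_segments
    return {
        "version": version,
        "header_type": header_type,
        "granule_position": granule,   # -1 means no packet ends on this page
        "serial": serial,
        "sequence": sequence,
        "crc": crc,
        "lacing_values": lacing,
        "header_size": header_size,
        "page_size": header_size + sum(lacing),
    }

# Hand-built header: bos page with two segments of 255 and 10 bytes.
example = struct.pack(HEADER_FORMAT, b"OggS", 0, 0x02, 0, 0x1234, 0, 0, 2) + bytes([255, 10])
info = parse_page_header(example)
```

For this example the header is 29 bytes long and the whole page would be 294 bytes, so the next capture pattern is expected 294 bytes after this one.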



References:
The Ogg Encapsulation Format Version 0