Video streaming relies on a behind-the-scenes process that ensures seamless delivery of video content to your device. It involves a range of essential elements, from fragments and chunks, which break data into smaller pieces for easier processing, to frames and samples, which define the quality of video and audio. Various protocols also play a critical role in governing how this data is transmitted. In this guide, we'll explore the key concepts you need to know to better understand the complex world of video encoding and streaming.
- Fragments: In video encoding, fragments are the smaller pieces into which a video file is divided for easier processing and transmission. They are typically used in adaptive streaming protocols like HTTP Live Streaming (HLS) or Dynamic Adaptive Streaming over HTTP (DASH).
- Chunks: Chunks are similar to fragments. They are small pieces of data that are part of a larger file. The video file is divided into chunks to be processed and transmitted separately. Each chunk can be encoded at a different bit rate, which allows for adaptive streaming.
- Frames: A frame in video encoding is a single still image in a sequence of images that make up a video. The number of frames per second (fps) in a video determines its smoothness. The higher the fps, the smoother the video will appear.
- Samples: In the context of video and audio encoding, a sample is a single measurement of a signal at a given point in time. The audio sample rate is the number of samples taken per second; common rates are 44.1 kHz and 48 kHz.
- Bitrate: Bitrate is the amount of data processed per unit of time in a video file, typically measured in kilobits per second (kbps) or megabits per second (Mbps). A higher bitrate generally means higher video quality and a larger file size; the short arithmetic sketch after this group of core terms shows how bitrate translates into file size.
- Codec: A codec is software (or dedicated hardware) that compresses and decompresses a digital media file, such as a song or video. Examples of codecs include H.264, H.265 (HEVC), and VP9.
- Container: A container is a file format that holds various types of data: compressed video, audio, and metadata such as subtitles, chapter details, or even 3D rendering information. Examples of containers include MP4, AVI, FLV, and MOV.
- Compression: Compression in video encoding is the process of reducing the size of a video file while trying to maintain as much quality as possible. There are two types of compression: lossless (no quality loss) and lossy (some quality loss).
- Keyframes: In video encoding, keyframes are frames in which a complete image is stored in the data stream. In between keyframes, only the changes in the video are stored.
- GOP (Group of Pictures): GOP is a group of successive pictures within a coded video stream. It starts with a keyframe (I-frame), followed by P-frames and B-frames. The length of the GOP impacts the quality and size of the video.
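Before moving on to protocols, the sketch below makes the numbers behind bitrate, frame rate, sample rate, and GOP length concrete. It is a minimal Python example; every input value (the 10-minute duration, 5,000 kbps video, 128 kbps audio, 30 fps, 48 kHz audio, 2-second keyframe interval) is a hypothetical illustration, not a recommendation.

```python
# Back-of-the-envelope figures for a hypothetical 10-minute clip.
# All input values below are illustrative, not recommendations.

duration_s = 10 * 60           # 10 minutes of video
video_bitrate_kbps = 5_000     # 5,000 kbps (5 Mbps) video
audio_bitrate_kbps = 128       # 128 kbps audio
fps = 30                       # frames per second
audio_sample_rate_hz = 48_000  # audio samples per second
keyframe_interval_s = 2        # one keyframe every 2 seconds

# Approximate file size: bitrate (bits/s) * duration, converted to megabytes.
total_kbps = video_bitrate_kbps + audio_bitrate_kbps
size_mb = total_kbps * 1_000 * duration_s / 8 / 1_000_000
print(f"Approximate size: {size_mb:.0f} MB")                    # ~385 MB

# Frame and sample counts follow directly from the rates.
print(f"Video frames: {fps * duration_s}")                      # 18,000 frames
print(f"Audio samples: {audio_sample_rate_hz * duration_s}")    # 28,800,000 samples

# GOP length in frames = keyframe interval * frame rate.
print(f"GOP length: {keyframe_interval_s * fps} frames")        # 60 frames
```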
- Protocols: In the context of video encoding, protocols are sets of rules for how data is transmitted over a network. Different protocols are used for different types of media and network conditions. Some commonly used video streaming protocols include:
- HTTP Live Streaming (HLS): A protocol developed by Apple for streaming live and on-demand video content to devices over the internet. HLS breaks the video stream into a sequence of small HTTP-based file downloads, each one carrying a short chunk of a potentially unbounded overall stream; a segmentation sketch follows this list of protocols.
- Dynamic Adaptive Streaming over HTTP (DASH): Also known as MPEG-DASH, this is an adaptive bitrate streaming technique that enables high-quality streaming of media content over the internet, delivered from conventional HTTP web servers; a minimal sketch of picking a rendition by available bandwidth also follows this list.
- Real-Time Messaging Protocol (RTMP): Developed by Adobe Systems, RTMP is a protocol for streaming audio, video, and data over the internet, originally between a Flash player and a server. Today it is most commonly used to ingest live streams from encoders into streaming platforms.
- Real-Time Streaming Protocol (RTSP): This network protocol is designed for use in entertainment and communications systems to control streaming media servers. It’s used in conjunction with Real-time Transport Protocol (RTP) and with Real-time Control Protocol (RTCP) for media stream delivery.
- Secure Reliable Transport (SRT): This is an open-source protocol, developed by Haivision, which allows for the delivery of high-quality and secure, low-latency video across the public internet.
- WebRTC: This is a free, open-source project that provides web browsers and mobile applications with real-time communication via simple application programming interfaces. It’s used for live streaming and low-latency interactive video communication.
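To see how a file actually gets cut into the chunks these protocols deliver, here is a minimal sketch that shells out to FFmpeg's HLS muxer. It assumes an ffmpeg binary is available on the PATH; the input filename, segment duration, and playlist name are hypothetical examples.

```python
import subprocess

# Split a hypothetical input.mp4 into ~6-second HLS segments plus a playlist.
cmd = [
    "ffmpeg",
    "-i", "input.mp4",            # example source file
    "-codec", "copy",             # no re-encoding, just re-packaging
    "-f", "hls",                  # HLS muxer
    "-hls_time", "6",             # target segment duration in seconds
    "-hls_playlist_type", "vod",  # write a complete, finished playlist
    "playlist.m3u8",              # playlist; segments are written alongside it
]
subprocess.run(cmd, check=True)
```

Because the streams are copied rather than re-encoded, segment boundaries can only fall on existing keyframes, which is one reason keyframe interval and chunk duration are usually chosen together.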
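Adaptive bitrate streaming in HLS and DASH relies on the player choosing among several renditions of the same content. The sketch below parses the BANDWIDTH attributes from a minimal, made-up HLS-style master playlist and picks the highest rendition that fits a given (also made-up) measured throughput; real players use more sophisticated heuristics.

```python
import re

# A made-up HLS master playlist advertising three renditions of the same video.
MASTER_PLAYLIST = """\
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
low/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
mid/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
high/playlist.m3u8
"""

def pick_rendition(playlist: str, throughput_bps: int) -> str:
    """Return the URI of the highest-bandwidth rendition that fits the throughput."""
    renditions = []  # (bandwidth, uri) pairs
    lines = playlist.splitlines()
    for i, line in enumerate(lines):
        match = re.search(r"#EXT-X-STREAM-INF:.*BANDWIDTH=(\d+)", line)
        if match and i + 1 < len(lines):
            renditions.append((int(match.group(1)), lines[i + 1]))
    fitting = [r for r in renditions if r[0] <= throughput_bps]
    # Fall back to the lowest rendition if even that exceeds the throughput.
    chosen = max(fitting) if fitting else min(renditions)
    return chosen[1]

print(pick_rendition(MASTER_PLAYLIST, throughput_bps=3_000_000))  # mid/playlist.m3u8
```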
- NAL Unit: NAL stands for Network Abstraction Layer. In the context of video encoding, a NAL unit is a packet of data that carries either a slice of coded video (part or all of a frame) or the parameters and metadata needed to decode it. NAL units are used in several video coding standards, including H.264 (MPEG-4 AVC) and H.265 (HEVC). There are several types of NAL units (a short sketch that identifies NAL unit types in a byte stream follows this list), including:
- SPS (Sequence Parameter Set): Contains information about the sequence of frames, such as frame rate, resolution, and aspect ratio.
- PPS (Picture Parameter Set): Contains information necessary for decoding individual pictures within a sequence, including details about the use of deblocking filters, entropy coding mode, and more.
- IDR (Instantaneous Decoder Refresh) Frames: These are complete frames (like keyframes) that can be decoded independently of any other frames. They are used to mark points where the decoder can start decoding the video data after seeking, or if a packet is lost.
- Non-IDR Frames: These are frames that require information from other frames to be decoded. They refer to data in other frames (either past or future) and help reduce the amount of data that needs to be transmitted.
- SEI (Supplemental Enhancement Information): Contains additional information that can enhance the use of the video data but is not required to decode the video images. Examples include messages for 3D video, color characteristics, display orientation, and more.
- AUD (Access Unit Delimiter): An optional NAL unit that can be used to indicate the boundary of access units (primary coded picture and its associated coded data).
- End of Sequence: Used to indicate the end of the coded video sequence.
- Filler Data: Contains no information and can be removed by the network without affecting the video decoding process. It’s used to maintain a constant bit rate in the encoded video.
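To connect the NAL unit types above to actual bytes, here is a minimal sketch that walks an H.264 Annex B byte stream (the kind delimited by 0x000001 start codes) and names each unit it finds. The sample bytes are fabricated for illustration; a real stream would come from a file or a network packet.

```python
# H.264 nal_unit_type values (the low 5 bits of the byte after the start code).
NAL_TYPES = {
    1: "non-IDR slice",
    5: "IDR slice",
    6: "SEI",
    7: "SPS",
    8: "PPS",
    9: "AUD",
    10: "end of sequence",
    12: "filler data",
}

def list_nal_units(stream: bytes):
    """Yield (offset, type name) for each NAL unit after a 00 00 01 start code."""
    i = 0
    while i < len(stream) - 3:
        # Both 3-byte (00 00 01) and 4-byte (00 00 00 01) start codes are matched,
        # since the 4-byte form ends with the same 3-byte pattern.
        if stream[i:i + 3] == b"\x00\x00\x01":
            nal_type = stream[i + 3] & 0x1F  # low 5 bits of the NAL header byte
            yield i, NAL_TYPES.get(nal_type, f"type {nal_type}")
            i += 3
        else:
            i += 1

# Fabricated example: SPS, PPS, then an IDR slice header (payloads truncated).
sample = (b"\x00\x00\x00\x01\x67\x64\x00\x1f"
          b"\x00\x00\x00\x01\x68\xee\x3c\x80"
          b"\x00\x00\x00\x01\x65\x88\x84\x00")
for offset, name in list_nal_units(sample):
    print(offset, name)
```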