In this article, we will explore the two leading video codecs, the well-established VP9 and its promising successor AV1, in terms of compression efficiency, video quality, hardware support, and adoption.
But first, why is video compression needed?
Demand for HD (High-Definition) and UHD (Ultra-High-Definition) video streaming, video conferencing, and digital television broadcasting has surged in recent years.
These HD/UHD videos generally have a considerable size. Storing or transmitting them over the Internet in uncompressed form is impractical. For example,
- 1 HD frame (1920x1080) takes ~8MB (assuming full colour at 4 bytes per pixel), which means 1 second of HD video at 30fps would take ~250MB.
- 1 minute of HD video would take 15GB.
- A 30-minute video call would take 450GB, or a 2-hour movie would take 1.8TB.
Therein lies the need for video codecs. A video codec serves two purposes. First, it reduces the video's size using various compression techniques (video encoding); the compressed video is then sent along with metadata over the network, thus reducing bandwidth consumption. Second, it decodes and reconstructs the video for playback on the end-user device while preserving visual fidelity to the best possible extent.
Therefore, the correct choice of video codec will make or break your application.
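The raw-size figures quoted above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes 4 bytes per pixel (which is what the ~8MB per frame figure implies) and decimal units (1GB = 10^9 bytes); the function name `raw_video_bytes` is made up for illustration.

```python
def raw_video_bytes(width, height, fps, seconds, bytes_per_pixel=4):
    """Size in bytes of uncompressed video (4 bytes per pixel is an assumption)."""
    return width * height * bytes_per_pixel * fps * seconds


# One 1920x1080 frame, then 1 second, 1 minute, 30 minutes and 2 hours at 30fps.
print(f"1 frame:    {raw_video_bytes(1920, 1080, 1, 1) / 1e6:.1f} MB")        # 8.3 MB
print(f"1 second:   {raw_video_bytes(1920, 1080, 30, 1) / 1e6:.1f} MB")       # 248.8 MB
print(f"1 minute:   {raw_video_bytes(1920, 1080, 30, 60) / 1e9:.1f} GB")      # 14.9 GB
print(f"30 minutes: {raw_video_bytes(1920, 1080, 30, 1800) / 1e9:.1f} GB")    # 447.9 GB
print(f"2 hours:    {raw_video_bytes(1920, 1080, 30, 7200) / 1e12:.2f} TB")   # 1.79 TB
```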
Important Keywords
First, let's start with the essential keywords used throughout the article.
- Quality, in video encoding, refers to how closely the compressed video matches the source material. Higher quality indicates less loss of detail and fidelity in the compressed video. It is usually assessed using objective metrics like Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), etc., or subjective techniques involving humans.
- Bitrate refers to the amount of data processed per unit of time in video encoding, typically measured in kilobits per second (kbps) or megabits per second (Mbps). It directly affects the file size and quality of the video: higher bitrates generally result in better quality but larger file sizes, while lower bitrates lead to smaller file sizes but potentially lower quality.
- Compression efficiency refers to how effectively data is compressed without significantly losing quality. Higher compression efficiency means more data can be represented in fewer bits, resulting in smaller file sizes without compromising quality.
- Encoding speed refers to how quickly a codec can encode video data. Faster encoding speeds are desirable for efficient processing, especially in real-time applications or when encoding large volumes of video content.
All these are interlinked, and adjusting one parameter can affect the others. For example:
- Increasing bitrate generally improves quality, but it also increases file size.
- Improving compression efficiency allows for smaller file sizes without sacrificing quality.
- Higher compression efficiency generally comes from more complex algorithms, so encoding speed tends to decrease as efficiency increases.
The success of a video codec lies in striking the right balance between these factors.
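To make two of these keywords concrete, here is a small sketch of how PSNR is computed from pixel values and how bitrate translates into file size. It treats a frame as a flat list of 8-bit samples for brevity; the helper names `psnr` and `file_size_bytes` are hypothetical and not part of any codec library.

```python
import math


def psnr(original, compressed, max_value=255):
    """Peak Signal-to-Noise Ratio in dB between two equally sized sample lists."""
    if len(original) != len(compressed):
        raise ValueError("both inputs must contain the same number of samples")
    # Mean squared error between source and compressed samples.
    mse = sum((a - b) ** 2 for a, b in zip(original, compressed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals: no distortion at all
    return 10 * math.log10(max_value ** 2 / mse)


def file_size_bytes(bitrate_kbps, seconds):
    """Approximate stream size for a constant bitrate (1 kbps = 1000 bits/s)."""
    return bitrate_kbps * 1000 * seconds / 8


source = [100, 100, 100, 100]
decoded = [101, 99, 101, 99]          # every sample is off by one
print(f"PSNR: {psnr(source, decoded):.2f} dB")
print(f"60 s at 2500 kbps: {file_size_bytes(2500, 60) / 1e6:.2f} MB")
```

A higher PSNR means the decoded samples are closer to the source; doubling the bitrate doubles the file size for the same duration.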
Basic encoding mechanism
Here is a very brief overview of the encoding mechanism generally used by modern, open-source codecs:
A video is a sequence of still images or frames. The encoding process starts with every frame being split into multiple smaller, more manageable parts (called blocks) of different sizes in a process called frame partitioning.
Then, the encoder tries to predict the pixel data for each block either based on neighbouring pixels (Intra-frame prediction) or preceding frames (Inter-frame prediction) using various techniques. Intra-frame prediction is useful in reducing spatial redundancy within a frame, whereas inter-frame prediction is used in estimating motion and, thereby, is useful in reducing temporal redundancy among frames.
The difference between predictions and original samples (residuals) is then transformed using advanced mathematical techniques and quantised. Quantisation reduces the number of bits needed to represent the transformed data, thereby sacrificing some detail. This data is further compressed by employing the entropy coding technique.
Loop filters and other post-processing steps are applied to smooth out artifacts introduced during compression, especially at block boundaries, and improve visual quality. The encoded video data, along with metadata such as frame rate, resolution, and audio information, are multiplexed into a container format like MP4 or WebM, etc., for transmission or storage purposes.
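The predict, subtract, and quantise steps described above can be sketched on a single tiny block. This is a deliberately simplified illustration, not how libvpx or libaom are implemented: it uses a DC-style intra prediction (the mean of neighbouring pixels), quantises the residual directly, and skips the transform, entropy coding, and loop filtering. All function names are made up for the example.

```python
def dc_predict(top, left):
    """Predict every pixel of the block as the mean of its neighbouring pixels."""
    neighbours = list(top) + list(left)
    return round(sum(neighbours) / len(neighbours))


def encode_block(block, top, left, step):
    """Return the prediction and the quantised residual levels for one block."""
    pred = dc_predict(top, left)
    # Residual = original - prediction; quantisation divides by the step size
    # and rounds, which is where detail is lost.
    levels = [[round((pixel - pred) / step) for pixel in row] for row in block]
    return pred, levels


def decode_block(pred, levels, step):
    """Reconstruct the block from the prediction and the quantised levels."""
    return [[pred + level * step for level in row] for row in levels]


block = [[52, 55], [53, 59]]          # original 2x2 block
top, left = [50, 52], [51, 55]        # already-decoded neighbouring pixels
pred, levels = encode_block(block, top, left, step=4)
print("prediction:", pred)
print("quantised residual:", levels)
print("reconstruction:", decode_block(pred, levels, step=4))
```

A larger `step` produces smaller numbers to store but a reconstruction that drifts further from the original, which is the bitrate versus quality trade-off in miniature.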
Basics of VP9 and AV1 codecs
VP9
VP9 is a royalty-free, state-of-the-art video codec released by Google on June 17, 2013. It was designed to reduce bit rates by 50% compared to its predecessor, VP8, while maintaining video quality and competing with other codecs such as H.265/HEVC.
Since its release, VP9 has gained adoption in various applications, including online video streaming platforms like YouTube and Netflix and WebRTC for real-time communication applications. libvpx is the reference implementation of VP9.
ffmpeg is a widely recognised cross-platform command-line tool to process media files, supporting both AV1 and VP9.
ffmpeg command to encode video in VP9 codec:
ffmpeg -i input.mp4 -c:v libvpx-vp9 output.webm
AV1
The Alliance for Open Media (AOMedia) was established in 2015 by several major tech companies, including Amazon, Google, and Meta. Its purpose was to create a next-generation, royalty-free codec in response to the licensing uncertainty and rising cost of HEVC. The result was AOMedia Video 1 (AV1), initially launched in 2018.
libaom is the reference implementation developed by AOMedia. Other implementations, such as SVT-AV1 and rav1e, are also available.
ffmpeg command to encode video in AV1 codec:
ffmpeg -i input.mp4 -c:v libaom-av1 output.webm
AV1 vs VP9 - Practical differences for web developers
Compression Efficiency
AV1 provides better compression efficiency than VP9, leading to reduced file size and lower bitrates without any discernible difference in quality. It offers up to 30% better compression than VP9, making it more efficient for streaming high-quality video. YouTube has been using AV1 for 8K videos in recent years.
Both libvpx and libaom, the reference implementations of VP9 and AV1, respectively, have various parameters to tweak bitrate, CPU utilization while encoding, quality, etc.
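As an illustration of such parameters, the sketch below builds an ffmpeg command line for a constant-quality encode with either encoder. It relies on the `-crf`, `-b:v 0`, `-cpu-used`, and `-row-mt` options that ffmpeg exposes for `libvpx-vp9` and `libaom-av1`; the specific values are illustrative assumptions rather than recommendations, and the helper `build_encode_command` is hypothetical. The script only prints the command and does not run ffmpeg.

```python
import shlex

# Map a codec name to the ffmpeg encoder used earlier in this article.
ENCODERS = {"vp9": "libvpx-vp9", "av1": "libaom-av1"}


def build_encode_command(src, dst, codec, crf=32, cpu_used=4):
    """Return the argument list for a constant-quality ffmpeg encode."""
    if codec not in ENCODERS:
        raise ValueError(f"unsupported codec: {codec}")
    return [
        "ffmpeg", "-i", src,
        "-c:v", ENCODERS[codec],
        "-crf", str(crf),            # quality target: lower means better quality, larger file
        "-b:v", "0",                 # no bitrate cap, so -crf alone drives quality
        "-cpu-used", str(cpu_used),  # speed/efficiency trade-off: higher is faster
        "-row-mt", "1",              # row-based multithreading
        dst,
    ]


for codec in ("vp9", "av1"):
    print(shlex.join(build_encode_command("input.mp4", f"output_{codec}.webm", codec)))
```

Encoding the same source with both commands and comparing file size, encoding time, and a quality metric is a simple way to see the trade-offs discussed in this section on your own content.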
Encoding speed and compute requirements
AV1 uses more sophisticated mechanisms at every step of the encoding/decoding process to maximise compression and quality, which makes it more complex and resource-intensive than VP9. This causes higher CPU usage and a significant increase in encoding time on devices without hardware support for it.
VP9 is relatively more balanced in terms of bitrate, quality, and encoding time. The longer encoding time of AV1 makes it less suitable for live video streaming for the time being.
Quality
AV1 provides better quality than VP9 at lower bitrates owing to its better frame partitioning, advanced prediction, better filters, etc., which introduce fewer unwanted artifacts.
HDR support
While both AV1 and VP9 support High Dynamic Range (HDR) and Wide Color Gamut (WCG) for more vibrant and realistic colours, AV1 provides more reliable support than VP9 by integrating HDR metadata into the video bitstream in addition to including it in the container (a wrapper that holds data streams such as video, audio, subtitles, and metadata together in a single file, e.g., WebM, MP4).
Support and Adoption
Although AV1 offers better compression efficiency and playback quality, its slow encoding speed and limited compatibility with some hardware and software still need to improve before it can achieve widespread adoption.
On the other hand, thanks to Google's efforts, VP9 decoding is very well established and compatible with more than 2 billion devices, including browsers like Chrome, Opera, Edge, Firefox, and platforms such as Android, as well as millions of smart TVs. Android has supported VP9 since version 4.4 KitKat and iOS/iPadOS added VP9 support in iOS/iPadOS 14.
Support in WebRTC
The WebRTC API enables the creation of websites and applications that facilitate real-time communication among users, allowing for the exchange of audio and video, along with optional data and additional information.
VP9 is supported in WebRTC in Chrome (version 48 onwards) and Firefox. In contrast, AV1 support in WebRTC is still in its early stages, with experimental encoding available in Chrome (version 90 Beta onwards) at the time of writing this article.
Implementation differences between AV1 and VP9
AV1 builds on VP9's capabilities, offering additional features and improved efficiency. Here are some key differences in implementation:
Better Partitioning
As mentioned above, the first step in the encoding process is partitioning the frame into smaller, more manageable parts, a process known as block decomposition.
VP9 starts with a 64x64 block, called a super-block (SB), and allows it to be split recursively using 4 partition options (the number of unique ways to split). AV1 increases the SB size to 128x128 and the number of options to 10. This added flexibility gives better predictions in the following steps of the encoding process and, ultimately, better compression and quality.
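Recursive partitioning can be sketched as a quadtree: a block is kept whole if it is uniform enough, and split into four otherwise. The sketch below models only the four-way split option and uses pixel variance as the split criterion, which is an assumption made for illustration; real encoders weigh many more partition shapes using rate-distortion cost.

```python
def block_variance(frame, x, y, size):
    """Variance of the pixel values inside the size x size block at (x, y)."""
    pixels = [frame[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def partition(frame, x, y, size, min_size=2, threshold=0.0):
    """Return a list of (x, y, size) leaf blocks covering the given square."""
    # Stop splitting when the block is small enough or flat enough.
    if size <= min_size or block_variance(frame, x, y, size) <= threshold:
        return [(x, y, size)]
    half = size // 2
    blocks = []
    for dy in (0, half):
        for dx in (0, half):
            blocks += partition(frame, x + dx, y + dy, half, min_size, threshold)
    return blocks


# An 8x8 "super-block" that is flat except for one bright pixel in the corner.
frame = [[0] * 8 for _ in range(8)]
frame[0][0] = 255
for block in partition(frame, 0, 0, 8):
    print(block)
```

Only the detailed corner is split down to the smallest block size, while the flat areas stay as large blocks that are cheap to describe.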
Improved Prediction modes
Better prediction leads to reduced residual (difference between prediction and original), which is what is ultimately encoded and compressed.
AV1 saw a significant upgrade in both Intra-frame and Inter-frame prediction techniques.
In Intra-frame prediction, AV1 has a total of 69 prediction modes compared to the 10 modes in VP9.
The AV1 inter-frame prediction extends the number of reference frames (the frames used for comparison) to up to 7, while VP9 uses 3. It has also introduced various other advanced techniques for better motion estimation in addition to the relatively simple ones used in VP9.
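The benefit of having more reference frames can be illustrated with a toy search: the encoder compares the current block against the co-located block in each candidate reference and keeps the one that leaves the smallest residual. This sketch uses the sum of absolute differences (SAD) as the cost and ignores motion vectors entirely, so it is a simplification and not the search used by either codec.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized sample lists."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def best_reference(block, references):
    """Return (index, cost) of the reference block with the lowest SAD."""
    costs = [sad(block, ref) for ref in references]
    index = min(range(len(costs)), key=costs.__getitem__)
    return index, costs[index]


current = [10, 20, 30, 40]
references = [
    [0, 0, 0, 0],        # an unrelated frame
    [10, 20, 30, 41],    # nearly identical content
    [12, 22, 32, 42],    # similar but slightly brighter
]
print(best_reference(current, references))
```

With more candidates to choose from, the chance of finding a close match goes up, which leaves less residual data to encode.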
Diverse transforms and superior entropy encoding
AV1 provides more diverse transforms that can be combined in 16 ways against the 4 provided by VP9.
Entropy encoding is a way to compress data in a lossless fashion.
AV1 employs a symbol-to-symbol adaptive multi-symbol arithmetic coder, which is relatively more efficient, adaptable and built for parallelism, while VP9 employs a tree-based Boolean non-adaptive binary arithmetic encoder.
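Why adaptation helps can be shown with the ideal code length of a symbol, which is -log2(p) bits for a symbol of probability p. The sketch below compares a fixed probability model with one that updates its estimate after every symbol. It is a conceptual illustration using binary symbols and a simple counting estimator, and is not the arithmetic coder of either VP9 or AV1.

```python
import math


def static_cost(symbols, p_one=0.5):
    """Ideal cost in bits when the probability of a 1 never changes."""
    return sum(-math.log2(p_one if s else 1 - p_one) for s in symbols)


def adaptive_cost(symbols):
    """Ideal cost in bits when the probability is re-estimated after each symbol."""
    ones, total = 1, 2  # start from an even prior: one imaginary 0 and one imaginary 1
    bits = 0.0
    for s in symbols:
        p_one = ones / total
        bits += -math.log2(p_one if s else 1 - p_one)
        ones += s       # update the counts with the symbol just coded
        total += 1
    return bits


# A skewed source: 90 zeros followed by 10 ones.
symbols = [0] * 90 + [1] * 10
print(f"static model:   {static_cost(symbols):.1f} bits")
print(f"adaptive model: {adaptive_cost(symbols):.1f} bits")
```

The adaptive model quickly learns that zeros dominate and spends far less than one bit on each of them, which is where the saving comes from.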
Film grain
Film grain is widely present in different digital media, such as old movies, TV shows, and videos recorded in low light.
It is random (equivalent to noise) and, therefore, hard to predict, making it generally challenging for codecs to preserve and compress. However, it might be necessary as a stylistic choice or for creative intent.
AV1 provides support for artificially generating this noise. Before the input is fed to the encoder, the film grain parameters can be determined and the grain removed. The de-noised input is then encoded and the parameters are passed along with it, and the decoder later uses them to produce synthetic film grain at decode time.
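The idea can be sketched in two steps: measure how strong the grain is by comparing the source with its de-noised version, then regenerate similar noise on the decoder side from that measurement. This sketch reduces the grain model to a single standard deviation and a random seed, which is a heavy simplification of what AV1 actually signals; the function names are made up for illustration.

```python
import random
import statistics


def estimate_grain_strength(noisy, denoised):
    """Encoder side: standard deviation of the removed noise."""
    return statistics.pstdev([n - d for n, d in zip(noisy, denoised)])


def synthesise_grain(decoded, strength, seed):
    """Decoder side: add pseudo-random noise of the signalled strength."""
    rng = random.Random(seed)  # same seed gives the same grain on every playback
    return [min(255, max(0, round(p + rng.gauss(0, strength)))) for p in decoded]


noisy = [10, 12, 10, 12]       # grainy source samples
denoised = [11, 11, 11, 11]    # what the encoder actually compresses
strength = estimate_grain_strength(noisy, denoised)
print("signalled grain strength:", strength)
print("playback samples:", synthesise_grain(denoised, strength, seed=7))
```

The synthetic grain does not match the original noise sample for sample, but it looks statistically similar while costing only a few parameters to transmit.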
More flexible tiles
Tiling is a feature provided by a video codec that breaks up a video frame into smaller, individually decodable sections known as tiles. This makes it possible to process encoding and decoding data in parallel, which results in better resource utilisation. It also improves error resilience and adaptation to network conditions. Both VP9 and AV1 support tiling.
AV1 offers flexibility in terms of tile size compared to the uniform tile size in VP9 within a frame, allowing for smaller tile sizes in regions with higher complexity. This helps to balance the workload among threads and minimise frame coding latency.
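The effect of non-uniform tiles on thread balance can be sketched by assigning an estimated coding cost to each super-block column and comparing two ways of cutting the frame into tiles. The cost values and the greedy balancing heuristic below are assumptions made for illustration; they are not taken from any encoder.

```python
def uniform_tiles(costs, n_tiles):
    """Split columns into equally wide tiles (assumes an even division)."""
    width = len(costs) // n_tiles
    return [costs[i * width:(i + 1) * width] for i in range(n_tiles)]


def balanced_tiles(costs, n_tiles):
    """Greedily close a tile once it holds roughly its share of the total cost."""
    target = sum(costs) / n_tiles
    tiles, current = [], []
    for i, cost in enumerate(costs):
        current.append(cost)
        columns_left = len(costs) - i - 1
        tiles_left = n_tiles - len(tiles) - 1
        # Close the tile when it is full enough, or when every remaining
        # tile needs one of the remaining columns.
        if len(tiles) < n_tiles - 1 and (sum(current) >= target or columns_left == tiles_left):
            tiles.append(current)
            current = []
    tiles.append(current)
    return tiles


# Estimated cost per super-block column: the busy region sits right of centre.
costs = [1, 1, 1, 1, 8, 8, 1, 1]
print("uniform tile loads: ", [sum(t) for t in uniform_tiles(costs, 2)])
print("balanced tile loads:", [sum(t) for t in balanced_tiles(costs, 2)])
```

With one thread per tile, the frame is finished only when the slowest tile is done, so lowering the heaviest load shortens the time to code the frame.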
How to choose between VP9 or AV1
Both codecs are open-source: they offer transparency and customisation options, and they benefit from a strong community that contributes to their improvement and support.
Selecting the correct video codec for your application involves balancing several factors. Here's a structured breakdown to help you make an informed decision:
Target Platform Compatibility:
- Check VP9 and AV1 support across relevant browsers, devices, and platforms for your application and target audience.
- Assess whether backward compatibility is needed. For example, if supporting older devices or browsers is crucial, determine if additional codec support (e.g., VP8, H.264) could be used.
Focus on your specific application requirements:
The optimal codec choice balances considerations of bandwidth, compression efficiency, quality, and encoding speed based on the specific needs of the application.
- For services where bandwidth may be limited, or where Ultra HD videos need to be delivered without real-time processing, such as on-demand streaming (e.g., Netflix), AV1's high compression efficiency offers good quality at lower bitrates.
- For Real-Time Communication services, prioritise low latency. Codecs like VP8 or H.264 are well suited for this task. Also, all WebRTC-compliant browsers must support the VP8 and H.264 codecs.
- Evaluate your infrastructure and resource allocation capabilities and whether your hardware and software environment can efficiently handle the encoding and decoding requirements of VP9 or AV1.
Encoding videos with ImageKit
Detecting device capability for AV1 and VP9, combined with correctly configuring the content delivery network (CDN) to cache multiple copies of the same content and serve the correct one, can be tricky. You can use ImageKit's video API and CDN delivery to offload the heavy lifting and serve the video in the most optimal format.
ImageKit is a media management and delivery platform for high-growth teams. It provides real-time video processing APIs to process, transform, and stream videos across devices directly from the video's URL without worrying about intricate architecture details around encoding and browser support. You can use the forever-free plan and create your video application quickly.
Future Outlook
As hardware support for AV1 encoding and decoding expands in newer GPUs and CPUs, some of the initial challenges associated with AV1, such as increased complexity, high CPU usage, and longer encoding times due to software-dependent encoding/decoding, will be addressed, and its support will grow. For example, Apple's M3 processors include AV1 hardware-accelerated decoding capabilities.
Improved software-based AV1 decoders, such as VideoLAN's libdav1d (known for its speed advantage over Google's libgav1), are emerging, expanding AV1 playback support in devices with limited processing power.
Despite the rise of AV1, VP9 will likely remain a viable option for the foreseeable future, considering its compatibility with existing hardware and its role in ensuring smooth playback on a broader range of devices.