ecodis :: Efficient Video Codecs
Video codecs which I worked on:
MPEG-I VVC / ITU-T H.266
Versatile broadcast/streaming codec
In signal processing, data compression, source coding, or bit-rate reduction involves encoding information using fewer bits than the original representation. Compression can be either lossy or lossless. [...] The process of reducing the size of a data file is often referred to as data compression. In the context of data transmission, it is called source coding (encoding done at the source of the data before it is stored or transmitted) in opposition to channel coding.
Wikipedia page on data compression, 2017
Storing or transmitting contemporary ultra-high-definition (UHD) digital video content in uncompressed form is virtually impossible due to the extremely high data rates; only one second of HDR video with 3840×2160 pixels at 50 frames per second would fit onto a CD. Therefore, efficient lossy coding with very good visual quality even at very low data rates is even more important than in audio applications. This implies that a maximum of redundant and irrelevant information must be removed during the coding.
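The CD comparison above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming 10-bit samples, 4:2:0 chroma subsampling (1.5 samples per pixel on average), and a 700 MiB CD; none of these three values is stated in the text, so they are assumptions chosen for illustration.

```python
# Picture format taken from the text above.
WIDTH, HEIGHT = 3840, 2160
FRAMES_PER_SECOND = 50

# Assumptions (not stated in the text): 10-bit HDR samples, 4:2:0 subsampling,
# and a CD capacity of 700 MiB.
BITS_PER_SAMPLE = 10
SAMPLES_PER_PIXEL = 1.5          # 1 luma + 2 * 1/4 chroma samples per pixel
CD_CAPACITY_BYTES = 700 * 1024 * 1024


def uncompressed_bytes_per_second(width, height, fps, bits_per_sample, samples_per_pixel):
    """Return the raw data rate of tightly packed video in bytes per second."""
    bits_per_frame = width * height * samples_per_pixel * bits_per_sample
    return bits_per_frame * fps / 8


rate = uncompressed_bytes_per_second(
    WIDTH, HEIGHT, FRAMES_PER_SECOND, BITS_PER_SAMPLE, SAMPLES_PER_PIXEL
)
seconds_on_cd = CD_CAPACITY_BYTES / rate

print(f"raw data rate: {rate / 1e6:.1f} MB/s")
print(f"video duration fitting onto one CD: {seconds_on_cd:.2f} s")
```

Under these assumptions, the raw stream needs several hundred megabytes per second, so slightly less than one second of video fits onto the disc. With 4:4:4 sampling or 12-bit samples, the duration would be shorter still.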
On this page, the four most efficient newest-generation video coding standards are introduced. The first one, whose specification I am involved in, has just been finalized.
MPEG-I Versatile Video Coding (VVC), also Standardized as ITU-T H.266 2016–2020
VVC, also known as H.266, is a flexible general-purpose video coding specification recently standardized by ISO/IEC and ITU-T [ source]. Developed by the Joint Video Experts Team (JVET), the final VVC version is intended to exceed all existing standards (most notably, the three mentioned below) in compression performance at the same subjective reconstruction quality, with only a moderate increase in decoding workload.
Since the VVC standard is still new (it was completed in July 2020), I cannot present any comparisons between VVC and other state-of-the-art video codecs. However, I can report from the April 2018 JVET meeting in San Diego that some proposals submitted for standardization in February 2018 achieved notable compression efficiency gains near 40 % over VVC's predecessor (see below) at acceptable decoder complexities of 3–4 times that of the state of the art [ source]. Therefore, at that time I expected the finalized version of the standard to provide bit-rate savings of at least one third at a decoding workload even closer to that of current video coding standards (factor of 2).
Update June 2019 A formally performed subjective test indicates that, for HD and UHD standard dynamic range content, even in its by then unfinished state, VVC already offered the same visual coding quality as its predecessor, HEVC/H.265, with 36–40 % lower bit-rate. The fundamental compression principle applied in Versatile Video Coding, block-based transform coding of spatiotemporal prediction residuals (also called hybrid block transform coding), is identical to that used in HEVC but improved in many details:
partitioning now allows for rectangular in addition to HEVC's square-shape blocks,
prediction includes affine, triangular, intra-matrix, and luma-to-chroma predictors,
residual transformation was extended to more types and a two-stage technique,
quantization is now trellis-based and configurable over a wider bit-rate range,
entropy coding was improved for lossy, lossless, and partial-lossless applications,
in-loop filtering for deblocking was improved, adaptive Wiener filtering was added.
Moreover, decoder-side motion compensation refinement tools were included to further increase the coding efficiency especially for videos containing strongly moving objects.
During the first stages of the VVC development, I proposed an encoder optimization algorithm improving the subjective video coding quality for some content and presented my work in Macau, Ljubljana, and Marrakech. At the Ljubljana meeting, I also suggested adding support for a new 10- and 12-bit packed YUV/RGB image and video storage format to the VVC code base [ report], a description of which is given below. Later during the standardization, I contributed to better in-loop filtering and the joint transform coding of the chroma components in color images and videos [ reports].
VTM, the VVC reference encoding/decoding software, is publicly accessible here, and the draft specification text (currently version 10) is freely available via this link. Since May 2019, VVC achieves objective efficiency gains (in terms of Bjøntegaard delta rate over its predecessor, the HM reference software) of about 24 % for still pictures (1.8x decoder runtime of HM) and more than 34 % for random-access videos (1.7x decoder runtime of HM) [ table]. The stable version, due July 2020, will improve upon this only by another 2–3 percent [ source] but may encode and decode a bit faster. (Update June 2020 According to my last contribution to the VVC standardization, a crosscheck report, the final version of the VVC standard provides around 39 % rate reduction over HEVC in the random-access use case, thanks to a last-minute change.) I'll update this section once the results of the VVC verification test have become available.
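To illustrate what a Bjøntegaard delta rate expresses, here is a simplified sketch. It is not the calculation used by JVET: the original method fits a third-order polynomial to the logarithmic rate as a function of PSNR, whereas this version interpolates piecewise linearly to stay dependency-free. The function names and the sample rate/PSNR points are hypothetical.

```python
import math


def _interp_log_rate(points, quality):
    """Linearly interpolate log10(rate) at a quality value.

    points: list of (rate, quality) pairs with distinct quality values.
    """
    pts = sorted(points, key=lambda p: p[1])
    for (r0, q0), (r1, q1) in zip(pts, pts[1:]):
        if q0 <= quality <= q1:
            t = (quality - q0) / (q1 - q0)
            return (1 - t) * math.log10(r0) + t * math.log10(r1)
    raise ValueError("quality lies outside the range of the curve")


def bd_rate_piecewise(anchor, test, steps=1000):
    """Average rate difference in percent of `test` relative to `anchor`.

    Negative values mean that `test` needs less rate for the same quality.
    """
    lo = max(min(q for _, q in anchor), min(q for _, q in test))
    hi = min(max(q for _, q in anchor), max(q for _, q in test))
    if hi <= lo:
        raise ValueError("the two curves do not overlap in quality")
    total = 0.0
    for i in range(steps):  # midpoint rule over the common quality range
        q = lo + (i + 0.5) * (hi - lo) / steps
        total += _interp_log_rate(test, q) - _interp_log_rate(anchor, q)
    avg_log_diff = total / steps
    return (10 ** avg_log_diff - 1) * 100


# Hypothetical (rate in kbit/s, PSNR in dB) points, for illustration only.
anchor_curve = [(1000, 32.0), (2000, 35.0), (4000, 38.0), (8000, 41.0)]
test_curve = [(rate / 2, psnr) for rate, psnr in anchor_curve]
print(f"BD-rate: {bd_rate_piecewise(anchor_curve, test_curve):.1f} %")
```

Averaging in the logarithmic rate domain means that the result is a mean relative (not absolute) rate difference, which is why a single percentage can summarize a whole rate-distortion curve.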
Update Oct. 2020 The results of the SDR UHD verification test (3840×2160 pixels and 30 or 60 fps) have now been published. Two VVC encoders, VTM and VVenC, were evaluated against HM on 5 representative 10-second video sequences and at 5 similarly spaced quality points. The pooled quality scores (averaged per encoder and quality/rate point) are illustrated below. They confirm the VVC-vs-HEVC bit-rate savings observed in the 2019 visual test (see above) and show that excellent quality (MOS > 8) is reached
above 8.1 Mbit/s with HM or a bit less than that with a visually optimized encoder,
above 5.0 Mbit/s with VTM for average bit-rate savings of roughly 40 % over HM,
above 3.8 Mbit/s with VVenC for 53 % rate savings over HM and 24 % over VTM
on typical high-resolution video content. Especially with regard to VVenC, this is a great achievement since this encoder is also the fastest of the three. Besides contributing to VTM, I helped develop VVenC in 2020 and implemented perceptual optimizations based on the XPSNR model. The 43 % rate savings of VTM over HM (when averaging across the whole quality scale) underline the successful completion of the VVC standard. A press release mentioning this part of the verification test is archived here.
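The savings quoted for the MOS > 8 threshold follow directly from the three bit-rates listed above. A minimal sketch of that arithmetic (the function and variable names are mine, chosen for illustration):

```python
# Bit-rates in Mbit/s above which excellent quality (MOS > 8) is reached,
# as listed above.
HM_RATE = 8.1
VTM_RATE = 5.0
VVENC_RATE = 3.8


def rate_saving_percent(anchor_rate, test_rate):
    """Return the bit-rate saving of the test encoder over the anchor in percent."""
    return 100.0 * (1.0 - test_rate / anchor_rate)


vtm_over_hm = rate_saving_percent(HM_RATE, VTM_RATE)
vvenc_over_hm = rate_saving_percent(HM_RATE, VVENC_RATE)
vvenc_over_vtm = rate_saving_percent(VTM_RATE, VVENC_RATE)

print(f"VTM over HM:    {vtm_over_hm:.1f} %")
print(f"VVenC over HM:  {vvenc_over_hm:.1f} %")
print(f"VVenC over VTM: {vvenc_over_vtm:.1f} %")
```

Note that these figures apply to a single quality point only; the 43 % mentioned for VTM over HM is an average across the whole quality scale and cannot be derived from the three values above.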
Figure 1. Results of the VVC verification test for SDR UHD content, JVET, 2020. Blue/lower-right: HEVC (HM enc. 16.22), orange/in-between: VVC (VTM enc. 10.0), gray/upper-left: VVenC 0.1. MOS scale: 0-2 bad, 2-4 poor, 4-6 fair, 6-8 good, 8-10 excellent quality. The thin gray line shows the expected quality for later versions of VVenC. (Values pooled from the per-sequence MOS data as they are illustrated in doc. JVET-T0097, which are courtesy of M. Wien)
The latest stable, tested VVenC release (version 1.12.0 from June 2024), which, like VTM, is publicly available under a clear open license, can be accessed via the following links. As of version 0.2.1, VVenC features a decently performing single-pass or two-pass rate control mode. (Update In version 1.8, the single-pass rate control mode was improved further, and in version 1.11, a tentative rate-capped constant quality factor (CQF) mode was added.) The VVC specification text has been published not only as ITU-T standard H.266 (freely downloadable) but also as ISO/IEC international standard 23090-3.
An optimized, fully standard-compliant open-source decoding counterpart of VVenC, called VVdeC, is also available. It's also worth noting that the commercial availability of another optimized VVC software decoder implementation, offering real-time decoding of VVC streams with up to 8K resolution, was announced by Sharp in December 2020.
MPEG-5 Essential Video Coding (EVC), an Alternative 4K Video Codec 2018–2020
Around the April 2018 JVET meeting in San Diego, where the first draft of the VVC reference software and specification text was agreed upon (see above), the Moving Picture Experts Group (MPEG) decided to initiate work on a separate, more constrained (in terms of development duration and included technology) video coding solution, to be completed and standardized in mid-2020 under the name MPEG-5 Essential Video Coding (EVC).
More details and the motivation behind this approach are given here.
In November 2018, two solutions were submitted to MPEG in response to its Call for Proposals (CfP) on new video coding technology with «simplified coding structure and an accelerated development time of 12 months» [ source]. Two months later, at its January 2019 meeting, MPEG evaluated both proposals [ report] and selected the one by Samsung, Huawei, and Qualcomm (a description of which is provided here) as the starting point of the EVC standardization. The relevant EVC documents are provided on the following web pages. Note that an MPEG user name and password are required to access these pages, which implies that this standardization is essentially closed-source.
In its first revision, MPEG-5 EVC roughly matches the joint MPEG-H/ITU-T HEVC in both objective (PSNR) and subjective (visual quality) performance when operated in its so-called Baseline profile setup, at least according to the CfP evaluation. Its Main profile configuration, however, was verified to already deliver 24 % better coding efficiency than HEVC, which may increase by a few percent until the end of EVC's development.
(Update Oct. 2019 ETM 3.0 provides about 26 % rate reduction over HEVC [ source], which should be very close to the final performance that this coding standard will offer.) Note that this value remains roughly 13 percentage points short of the latest results for MPEG-I VVC. The next few years will show which of these two codecs will achieve a wider market adoption. I will update this section once the EVC codec software has been published. By then, the specification text will have become available as ISO/IEC standard 23094-1.
MPEG-H High Efficiency Video Coding, also Standardized as ITU-T H.265 2010–2015
Employing the finalized High Efficiency Video Coding (HEVC) specification in ISO/IEC 23008-2 and ITU-T H.265 currently is the most efficient way to compress moving pictures, especially high-resolution HD and UHD video. Developed mainly between 2010 and 2013, with some screen content and 3D coding extensions added after 2013, HEVC achieves an increase of about 50 % in perceptual compression efficiency over previous coding standards like H.264 [ source]. In other words, averaged across several coded video sequences, HEVC provides roughly the same subjective video quality as the older coding formats, and it does so using encoded bit-streams which are only half as large. This performance boost is what allowed, for the first time, the delivery of high-quality UHD video to consumers via broadcasting, streaming, and UHD Blu-Ray disc.
As of 2018, hardware-based HEVC decoding is supported by most TVs, set-top boxes, video players, computers, tablets, and even smartphones. The best freely and publicly usable HEVC encoder is maintained by the x265 project team and is located here:
Of course, the HEVC reference encoding/decoding software is also publicly available. HEVC, as described above, is the predecessor of the VVC standard, and most of its underlying technology can still be found in the current draft of the VVC specification. In fact, all visual codecs discussed on this page use the exact same algorithmic building blocks which define a modern hybrid block transform video codec like HEVC. These are
a partitioner segmenting each component of the input into nonoverlapping blocks,
a prediction stage attempting intra- or inter-picture prediction of each input block,
a residual transform converting the prediction error into a spectral representation,
a quantizer mapping the residual transform values to a smaller set of coefficients,
an entropy coder applying lossless compression to the quantized coefficients, and
a few postfilters reducing blocking, denoising and ringing artifacts upon decoding.
Some codecs also add encoder-side prefilters to complement the decoder's postfilters. Note that modern audio codecs employ the same elements and that, in both audio and video coding, a second prediction stage may be used before or after the quantizer.
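The building blocks listed above can be sketched in a toy example. This is a deliberately minimal illustration on a 1-D signal, not how HEVC or VVC actually operate: the block size, the predictor (the last decoded sample), and the uniform quantizer are assumptions made for brevity, and the entropy coder and postfilters are omitted.

```python
import math

BLOCK_SIZE = 8  # arbitrary choice for this sketch


def dct(block):
    """Orthonormal DCT-II of a 1-D block."""
    n = len(block)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(block))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct(), i.e. the orthonormal DCT-III."""
    n = len(coeffs)
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / n) * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs)
        )
        for i in range(n)
    ]


def code_signal(samples, step):
    """Encode and decode `samples` block by block.

    Returns (levels, reconstruction), where levels are the quantized
    coefficients that a real codec would pass to its entropy coder.
    """
    levels, recon = [], []
    for start in range(0, len(samples), BLOCK_SIZE):       # partitioner
        block = samples[start:start + BLOCK_SIZE]
        pred = recon[-1] if recon else 128.0               # prediction stage
        residual = [x - pred for x in block]
        coeffs = dct(residual)                             # residual transform
        block_levels = [round(c / step) for c in coeffs]   # quantizer
        levels.append(block_levels)
        # Decoder side: dequantize, inverse transform, add the prediction.
        decoded_residual = idct([lv * step for lv in block_levels])
        recon.extend(pred + r for r in decoded_residual)
    return levels, recon


signal = [100 + 20 * math.sin(i / 5) for i in range(32)]
levels, recon = code_signal(signal, step=4.0)
max_error = max(abs(a - b) for a, b in zip(signal, recon))
print(f"maximum reconstruction error: {max_error:.2f}")
```

The prediction uses the reconstructed rather than the original samples, so encoder and decoder stay synchronized; a larger quantizer step yields smaller levels (cheaper to entropy-code) at the cost of a larger reconstruction error.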
AOMedia AV1: A General-Purpose Codec for Internet Video Streaming 2015–2020
The AV1 video codec, jointly developed by the Alliance for Open Media ( AOMedia) between 2015 and 2018, is a general-purpose format for Internet streaming based on well-known technology. The video compression capability of AV1 is realized primarily with coding techniques derived from VP9/VP10, Daala, and Thor [ source]. Inside a WebM container, audio compression support is added through the Opus codec [ source].
The IETF is expected to adopt AV1 as the Internet Video Coding (NetVC) standard in late 2018 [ source] alongside the Opus codec, which was already standardized as RFC 6716 in 2012. I anticipate broad hardware decoding support for AV1 in late 2020. Note that software decoding is already provided, even on Windows [ source], and work on a BSD-licensed optimized decoder called dav1d has progressed as well. The current versions of the AV1 specification and software are available at these pages:
Surprisingly, the subjective performance of the AV1 codec in comparison to its latest competitors, VVC and HEVC, is relatively inconsistent. In some independent tests, AV1 matched the coding efficiency of HEVC [ source], while in others, the codec was objectively and subjectively inferior to HEVC [ source]. This can be attributed to the use of different encoder speed presets in the evaluations: for HEVC-like performance, a very slow AV1 encoder preset must be used [ source]. Moreover, the default encoder configuration for random-access scenarios is a bit different from that of other codecs, making direct codec comparisons difficult. However, since there is clear evidence that the precursor to VVC, known as Joint Exploration Model ( JEM), outperforms both AV1 and HEVC and also encodes faster than AV1 [ source 1, source 2], it is obvious that VVC will, indeed, become the most efficient video codec during the next few years. Some online documentation of AV1's most interesting and innovative coding tools can be found here and here. An overview of all coding tools is given in this paper.
Update Nov. 2018 As this image indicates, the speed of the AV1 reference encoder has recently been increased by at least a factor of 60 without a significant reduction in coding efficiency [1.6 %, source], so it seems that a runtime and coding gain similar to that of the HEVC reference encoder will, indeed, become possible with AV1 soon. This observation is also supported by a May 2019 test by the BBC, summarized here. Still, the efficiency of the VVC reference software clearly remains out of reach for AV1. This shortcoming will be addressed by AV2, whose standardization will begin in a few years when support for AV1 has been widely deployed to consumer devices [ source].
Summary: More Coding Gain Possible but Hard to Implement Efficiently Nov. 2018
The current VVC standardization (see above) indicates that further gains in image and video coding performance are still possible. However, given the order of magnitude increases in encoding runtime of both VVC and AV1, I feel that we are rapidly leaving the path of reasonable gain-efficiency tradeoff followed for so long: with next-generation standards, coding of a single 4K image on one processor takes up to half an hour, and the hardware requirements especially for moving-picture coding are substantial. For this reason, most of the often promising but experimental coding tools in JEM (like FRUC, for example) won't make it into VVC: their algorithmic complexity and/or fast memory demands are just too high for usable implementations in both hardware and software. It's true that encoding can now be performed highly parallelized in the cloud, but spending a year of aggregate computing power on one hour of UHD video is clearly not efficient and, so far, not environmentally friendly [ article]. Don't forget the countless coding-decoding iterations performed by the various participating companies during experiments towards a codec's standardization itself (more and more of which need to be run due to the vanishing potential for further coding gain)! And remember that the final HEVC/H.265 reference encoder was only three times slower than its AVC/H.264 ancestor [ source]!
Therefore, aside from working on speeding up the new-generation video encoders by at least an order of magnitude, I believe it's time to reconsider the current approach in video coding development. Some experts at, e.g., Netflix share this vision and call for innovation «beyond block-based hybrid video coding» as outlined above. If that means using more extreme measures like large 4D-DCTs or CNNs, I disagree since, in my view, the computational burden for a competitive level of performance will likely be even more problematic than in today's codecs. If, however, the idea is to refrain from squeezing 9 % more coding gain (i.e., statistical redundancy) out of existing block-based schemes, and instead to further exploit the inaccuracies of human vision in the design of image and video codecs (using parametric tools as in audio, which we still hardly do), then I fully support that approach. In fact, with my current work I already do. I hope you do too.
HEIF/MIAF and AVIF: The Two Latest-Generation Still-Image Codecs 2015–2019
Still-image coding, like video coding, has come a very long way since the early days of T.81/JPEG and H.262/MPEG-2 about a quarter of a century ago. Recently, two additions to the long list of image coding specifications have emerged, namely, single-picture constrained variants of HEVC, called High-Efficiency Image Format ( HEIF), and of AV1, named AV1 Still Image File Format ( AVIF). An extension of the HEIF standard known as Multi-Image Application Format ( MIAF) is currently being finalized as well, and as if that weren't enough, the JPEG committee is also working on a novel still-picture coding standard, referred to as JPEG XL (the L means long-term), to be finalized in late 2019 [ source].
(Update Oct. 2019 This milestone has been moved to April 2020 [ source])
All these contenders have in common that they provide efficient support for high-resolution, HDR, and wide color gamut (WCG) as well as (partially) transparent image content. JPEG XL also provides means for lossless transcoding to and from legacy JPEG, PNG, and GIF compressed images, which is a very useful feature in my opinion. I will update this page with further details on each coding specification and comparative demos once evaluation software for all of these coding standards has become publicly available. For now, I can recommend this interactive, recently updated picture coding demo by Thomas Daede. See also here and here.
Figure 2. Basic evolution of still image coding in the age of the Internet. Top left: JPEG (1992), top right: JPEG 2000 (2000), bottom left: HEIF (2015) and bottom right: original picture (almost matched in visual quality by modern image codecs like AVIF, JPEG XL or VVC). 70:1 compression (768:11 kilobyte) was chosen for all illustrated coders. Notice how blocking and blurriness vanish from top left to lower right.
(Fig. Lena image)
Further, Lesser Known Video Codecs and Links to Additional Resources 2019–2020
AVS3, China's latest-generation Audio Video System standard. Technically, its almost finalized phase 2 version of the video coding part seems to be quite similar to VVC, with compression performance (efficiency, runtime) comparable to that of EVC. See this page.
A PDF with presentations about (in that order) AVS3, AV1/AV2, EVC and VVC, given during the panel discussion at the 2019 Picture Coding Symposium (PCS) in Ningbo. Has some interesting technical and statistical details about these codecs. An even more detailed presentation of VVC was held at the ICIP in late 2020.
page last modified in July 2024, updated VVenC & JVET links