Visually Optimized Two-Pass Rate Control for Video Coding Using the Low-Complexity XPSNR Model

Christian R. Helmrich, Senior Member, IEEE, Ivan Zupancic, Jens Brandenburg, Valeri George, Adam Wieckowski, and Benjamin Bross, Member, IEEE

The above paper has been accepted for presentation at the IEEE Int. Conf. on Visual Communications and Image Processing (VCIP), Munich, in December 2021. This page provides supplementary information on the publication and a visual demo.

Note that there is also a preceding introductory paper on XPSNR, presented at the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP) in May 2020, on which this publication is based. That paper is made available on this page.

An open-source implementation of the XPSNR algorithm is freely available on GitHub: XPSNR filter plug-in for FFmpeg

Contents:

Corrections to the IEEE VCIP published version of the paper as of September 2021
Demonstrations of the visual effects of XPSNR based optimization during encoding

Corrections to the IEEE VCIP published version of the paper as of September 2021

Minor details were corrected and/or clarified. The submitted version of the paper is available here: Paper manuscript
In the final paper revision, a reference to this supplementary page, containing the additional information requested by the reviewers, was added.

The following information is provided as answers to some of the reviewers' questions:
- Sec. II: The parameters and constants in equations (3) to (6) were determined experimentally, considering the accuracy-runtime tradeoff.
- Tab. III: The “PSNR” and “XPSNR” values are mean BD-rate results when measuring the distortion in PSNR and XPSNR, respectively.
- Tab. III: The UHD averages include the results for sequence Campfire. That sequence was only excluded from the values in Tabs. I & II.
- Tab. III: The blue coloring highlights BD-rate averages which exceed 3 percent. Sequence-wise results are available in this CSV table.
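
For readers unfamiliar with the BD-rate figures mentioned for Tab. III, the following is a minimal sketch of the classic Bjøntegaard delta-rate calculation (a cubic fit of log-rate over quality, integrated over the common quality range). It is an assumption that this variant matches the calculation used for the paper; the function names and the rate-quality points are hypothetical and only illustrate the arithmetic. The quality values may be PSNR or XPSNR, both in dB.

```python
import math


def _fit_cubic(xs, ys):
    """Return [c3, c2, c1, c0] of the cubic through four (x, y) points.

    Uses Gauss-Jordan elimination with partial pivoting on the
    4x4 Vandermonde system; the x values must be distinct.
    """
    n = 4
    a = [[x ** 3, x ** 2, x, 1.0, y] for x, y in zip(xs, ys)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def _integral(c, lo, hi):
    """Definite integral of the cubic with coefficients c over [lo, hi]."""
    def prim(x):
        return c[0] * x ** 4 / 4 + c[1] * x ** 3 / 3 + c[2] * x ** 2 / 2 + c[3] * x
    return prim(hi) - prim(lo)


def bd_rate(anchor, test):
    """BD-rate in percent for two curves of four (bitrate, quality_dB) points.

    Negative results mean that the test curve needs less rate than the
    anchor curve at equal quality.
    """
    q_a = [q for _, q in anchor]
    q_t = [q for _, q in test]
    lo = max(min(q_a), min(q_t))  # common quality range of both curves
    hi = min(max(q_a), max(q_t))
    integrals = []
    for pts in (anchor, test):
        # Shift the quality axis by lo to keep the linear system well conditioned.
        xs = [q - lo for _, q in pts]
        ys = [math.log10(r) for r, _ in pts]
        integrals.append(_integral(_fit_cubic(xs, ys), 0.0, hi - lo))
    avg_log_diff = (integrals[1] - integrals[0]) / (hi - lo)
    return (10.0 ** avg_log_diff - 1.0) * 100.0


# Hypothetical rate-quality points (kbit/s, dB), not taken from the paper.
anchor = [(1000.0, 32.0), (2000.0, 35.0), (4000.0, 38.0), (8000.0, 41.0)]
test = [(900.0, 32.0), (1800.0, 35.0), (3600.0, 38.0), (7200.0, 41.0)]
print(f"BD-rate: {bd_rate(anchor, test):.2f} %")
```

In this made-up example the test curve needs 10 percent less rate at every quality level, so the result is approximately -10. Feeding the same rates with PSNR values and then with XPSNR values gives the two kinds of BD-rate results that the “PSNR” and “XPSNR” columns distinguish.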

Demonstrations of the visual effects of XPSNR based optimization during encoding

The following illustrations serve as a demonstration of the visual benefit of using the perceptually optimized quantization parameter adaptation (QPA) in a transform-based still-image codec like HEVC [1]. The basic coding algorithm used for this demonstration is draft 3 of the Versatile Video Coding (VVC) specification [2], as implemented by the VTM3.0 reference software [3] into which our QPA method has been integrated. Since only single images are utilized for this demonstration, the VVC codec was configured to apply only “still-image” Intra-picture prediction.

The presented images were transcoded, with visual transparency, to high-bit-rate JPEG in order to limit the download durations for the viewers. Differences between the coded pictures are mostly visible in low-contrast regions, so viewing in low background-lighting conditions is advised.

This demonstration serves as an accurate depiction of how rate control encodings with visual QPA (i.e., using XPSNR based R-D optimization) differ perceptually from rate control encodings without such visual optimization (i.e., using traditional PSNR based least-squares optimization).

BQTerrace, uncoded input (HD, 1920×1080, lossless size: 4989 KB)

BQTerrace, VTM 3.0.1 without QPA, base QP 32 (HD, 1920×1080, coded size: 96.7 KB)

BQTerrace, VTM 3.0.1 with QPA, base QP 29 (HD, 1920×1080, coded size: 98.5 KB)

BasketballDrive, frame 68, uncoded input (HD, 1920×1080, lossless size: 5123 KB)

BasketballDrive, frame 68, VTM 3.0.1 without QPA, base QP 30 (HD, 1920×1080, coded size: 51.5 KB)

BasketballDrive, frame 68, VTM 3.0.1 with QPA, base QP 30 (HD, 1920×1080, coded size: 49.2 KB)

Kodak Image 15, uncoded input (768×512, lossless size: 755 KB)

Kodak Image 15, VTM 3.0.1 without QPA, base QP 28 (768×512, coded size: 22.2 KB)

Kodak Image 15, VTM 3.0.1 with QPA, base QP 29 (768×512, coded size: 22.4 KB)

ParkScene, uncoded input (HD, 1920×1080, lossless size: 4911 KB)

ParkScene, VTM 3.0.1 without QPA, base QP 29 (HD, 1920×1080, coded size: 95.9 KB)

ParkScene, VTM 3.0.1 with QPA, base QP 30 (HD, 1920×1080, coded size: 92.9 KB)
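
The coded sizes quoted in the captions above allow a quick check of how closely each pair of encodings (without and with QPA) is matched in file size. The short sketch below does this arithmetic; the sizes are copied from the captions, while the helper name `size_change_percent` is hypothetical.

```python
# (image, coded size in KB without QPA, coded size in KB with QPA),
# copied from the captions on this page.
pairs = [
    ("BQTerrace", 96.7, 98.5),
    ("BasketballDrive, frame 68", 51.5, 49.2),
    ("Kodak Image 15", 22.2, 22.4),
    ("ParkScene", 95.9, 92.9),
]


def size_change_percent(without_qpa_kb, with_qpa_kb):
    """Relative size change of the QPA encoding versus the non-QPA encoding."""
    return (with_qpa_kb / without_qpa_kb - 1.0) * 100.0


for name, without_qpa, with_qpa in pairs:
    print(f"{name}: {size_change_percent(without_qpa, with_qpa):+.1f} %")
```

For all four images the two coded sizes differ by less than 5 percent, so the visual comparison is made at roughly equal file size.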

Please note that the layout or content of this web page may change. If it does, the date in the following line will be updated.

Christian R. Helmrich, September 8, 2021

References

[1] ITU-T, Recommendation H.265 and ISO/IEC, Int. Standard 23008-2, “High efficiency video coding,” Geneva, Switzerland, Feb. 2018. Online: http://www.itu.int/rec/T-REC-H.265

[2] B. Bross, J. Chen, and S. Liu, “Versatile Video Coding (Draft 3),” MPEG/JVET output document m45225/L1001, Macao, China, Dec. 2018. Online: http://phenix.it-sudparis.eu/jvet/doc_end_user/current_document.php?id=4834

[3] JVET, “VVCSoftware_VTM: VVC VTM reference software,” Dec. 2018. Online: https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM/tags
