Visually Optimized Two-Pass Rate Control for Video Coding Using the Low-Complexity XPSNR Model

Christian R. Helmrich, Senior Member, IEEE, Ivan Zupancic, Jens Brandenburg, Valeri George, Adam Wieckowski, and Benjamin Bross, Member, IEEE

The above paper has been accepted for presentation at the IEEE Int. Conf. on Visual Communications and Image Processing (VCIP), Munich, in December 2021.  This page provides supplementary information on the publication and a visual demo.

Note that this publication is based on a preceding introductory paper on XPSNR, presented at the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP) in May 2020.  That paper is available on a separate sub-page.

An open-source implementation of the XPSNR algorithm is freely available on GitHub:  XPSNR filter plug-in for FFmpeg

Contents:

  1. Corrections to the IEEE VCIP published version of the paper as of September 2021

  2. Demonstrations of the visual effects of XPSNR based optimization during encoding

  1. Corrections to the IEEE VCIP published version of the paper as of September 2021

    Minor details were corrected and/or clarified.  The submitted version of the paper is available here:  Paper manuscript
    In the final paper revision, a reference to this supplementary page, containing the additional information requested by the reviewers, was added.

    The following information is provided as answers to some of the reviewers' questions:

    • Sec. II: The parameters and constants in equations (3) to (6) were determined experimentally, considering the accuracy-runtime tradeoff.

    • Tab. III: The “PSNR” and “XPSNR” values are mean BD-rate results when measuring the distortion in PSNR and XPSNR, respectively.

    • Tab. III: The UHD averages include the results for sequence Campfire.  That sequence was only excluded from the values in Tabs. I & II.

    • Tab. III: The blue coloring highlights BD-rate averages which exceed 3 percent.  Sequence-wise results are available in this CSV table.
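
For readers unfamiliar with BD-rate, the following is a minimal sketch of the commonly used Bjøntegaard calculation, assuming four rate-quality points per codec and a cubic fit of log-rate over quality. The paper's exact BD-rate tool is not specified on this page, so the function names (`lagrange_eval`, `simpson`, `bd_rate`) and the sample numbers are illustrative assumptions only. The quality axis can be PSNR or XPSNR in dB, which corresponds to the two columns described for Tab. III.

```python
import math


def lagrange_eval(xs, ys, x):
    """Evaluate the polynomial through the points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def simpson(f, a, b):
    """Simpson's rule on [a, b]; exact for polynomials up to degree 3."""
    return (b - a) / 6.0 * (f(a) + 4.0 * f((a + b) / 2.0) + f(b))


def bd_rate(anchor, test):
    """BD-rate in percent of `test` relative to `anchor`.

    Each argument is a list of four (bitrate, quality_dB) pairs with
    distinct quality values. Negative results mean bit-rate savings.
    """
    q_a = [q for _, q in anchor]
    q_t = [q for _, q in test]
    log_r_a = [math.log(r) for r, _ in anchor]
    log_r_t = [math.log(r) for r, _ in test]

    # Integrate only over the quality range covered by both curves.
    lo = max(min(q_a), min(q_t))
    hi = min(max(q_a), max(q_t))
    if hi <= lo:
        raise ValueError("rate-quality curves do not overlap")

    avg_a = simpson(lambda q: lagrange_eval(q_a, log_r_a, q), lo, hi) / (hi - lo)
    avg_t = simpson(lambda q: lagrange_eval(q_t, log_r_t, q), lo, hi) / (hi - lo)
    return (math.exp(avg_t - avg_a) - 1.0) * 100.0


# Made-up example data (kbit/s, dB), not taken from the paper:
anchor_points = [(1000.0, 32.0), (2000.0, 35.0), (4000.0, 38.0), (8000.0, 41.0)]
# A codec that needs 10 percent fewer bits at every quality level.
test_points = [(0.9 * rate, quality) for rate, quality in anchor_points]
print(round(bd_rate(anchor_points, test_points), 3))
```

With exactly four points per curve, the cubic least-squares fit reduces to interpolation, which is why a Lagrange polynomial and a single Simpson step suffice in this sketch.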

  2. Demonstrations of the visual effects of XPSNR based optimization during encoding

    The following illustrations serve as a demonstration of the visual benefit of using the perceptually optimized quantization parameter adaptation (QPA) in a transform-based still-image codec like HEVC [1].  The basic coding algorithm used for this demonstration is draft 3 of the Versatile Video Coding (VVC) specification [2], as implemented by the VTM3.0 reference software [3] into which our QPA method has been integrated. Since only single images are utilized for this demonstration, the VVC codec was configured to apply only “still-image” Intra-picture prediction.

    The presented images were transcoded, with visual transparency, to high-bit-rate JPEG in order to limit the download durations for the viewers. Differences between the coded pictures are mostly visible in low-contrast regions, so viewing in low background-lighting conditions is advised.

    This demonstration serves as an accurate depiction of how rate control encodings with visual QPA (i.e., using XPSNR based R-D optimization) differ perceptually from rate control encodings without such visual optimization (i.e., using traditional PSNR based least-squares optimization).
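
The difference between the two optimization targets can be pictured with a generic block-weighted PSNR, sketched below. This is not the XPSNR model itself: in the paper the block weights follow from a visual model, whereas here they are plain inputs with made-up values, and the function name `weighted_psnr` is an assumption for illustration. With all weights equal to 1, the measure reduces to conventional PSNR.

```python
import math


def weighted_psnr(ref_blocks, dec_blocks, weights, peak=255):
    """PSNR in dB computed from a block-weighted sum of squared errors.

    ref_blocks, dec_blocks: lists of blocks, each a flat list of samples.
    weights: one non-negative weight per block (placeholder values here).
    """
    weighted_sse = 0.0
    num_samples = 0
    for ref, dec, w in zip(ref_blocks, dec_blocks, weights):
        sse = sum((r - d) ** 2 for r, d in zip(ref, dec))
        weighted_sse += w * sse
        num_samples += len(ref)
    if weighted_sse == 0.0:
        return math.inf
    return 10.0 * math.log10(num_samples * peak ** 2 / weighted_sse)


# Made-up data: one flat block and one textured block of four samples each.
ref = [[100, 100, 100, 100], [10, 200, 30, 180]]
dec = [[102, 98, 102, 98], [14, 196, 34, 176]]

plain = weighted_psnr(ref, dec, [1.0, 1.0])     # conventional PSNR
weighted = weighted_psnr(ref, dec, [2.0, 0.5])  # hypothetical visual weights
print(round(plain, 2), round(weighted, 2))
```

In this toy case the textured block carries most of the squared error but receives a small weight, so the weighted score is higher than the plain PSNR. An encoder optimizing for such a measure would tend to spend fewer bits on textured regions and more on flat, low-contrast ones.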

    BQTerrace, uncoded input (HD, 1920×1080, lossless size: 4989 KB)

    BQTerrace, VTM 3.0.1 without QPA, base QP 32 (HD, 1920×1080, coded size: 96.7 KB)

    BQTerrace, VTM 3.0.1 with QPA, base QP 29 (HD, 1920×1080, coded size: 98.5 KB)

    BasketballDrive, frame 68, uncoded input (HD, 1920×1080, lossless size: 5123 KB)

    BasketballDrive, frame 68, VTM 3.0.1 without QPA, base QP 30 (HD, 1920×1080, coded size: 51.5 KB)

    BasketballDrive, frame 68, VTM 3.0.1 with QPA, base QP 30 (HD, 1920×1080, coded size: 49.2 KB)

    Kodak Image 15, uncoded input (768×512, lossless size: 755 KB)

    Kodak Image 15, VTM 3.0.1 without QPA, base QP 28 (768×512, coded size: 22.2 KB)

    Kodak Image 15, VTM 3.0.1 with QPA, base QP 29 (768×512, coded size: 22.4 KB)

    ParkScene, uncoded input (HD, 1920×1080, lossless size: 4911 KB)

    ParkScene, VTM 3.0.1 without QPA, base QP 29 (HD, 1920×1080, coded size: 95.9 KB)

    ParkScene, VTM 3.0.1 with QPA, base QP 30 (HD, 1920×1080, coded size: 92.9 KB)

Please note that the layout or content of this web page may change.  If it does, the date in the following line will be updated.

Christian R. Helmrich, September 8, 2021



References

[1]  ITU-T, Recommendation H.265 and ISO/IEC, Int. Standard 23008-2, “High efficiency video coding,” Geneva, Switzerland, Feb. 2018. Online:
       http://www.itu.int/rec/T-REC-H.265

[2]  B. Bross, J. Chen, and S. Liu, “Versatile Video Coding (Draft 3),” MPEG/JVET output document m45225/L1001, Macao, China, Dec. 2018. Online:
       http://phenix.it-sudparis.eu/jvet/doc_end_user/current_document.php?id=4834

[3]  JVET, “VVCSoftware_VTM: VVC VTM reference software,” Dec. 2018. Online:
       https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM/tags