A Study of the Extended Perceptually Weighted Peak Signal-to-Noise Ratio (XPSNR) for Video Compression with Different Resolutions and Bit Depths

Christian R. Helmrich, Senior Member, IEEE, Sebastian Bosse, Heiko Schwarz, Detlev Marpe, Fellow, IEEE, and Thomas Wiegand, Fellow, IEEE

The above paper has been accepted for publication in the ITU Journal: ICT Discoveries – Special Issue: The Future of Video and Immersive Media in March 2020. This page provides supplementary information on the publication and a visual demo.

Note that there is also a preceding introductory paper, presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) in May 2020, to which this page is also applicable. That paper is available on this page.

An open-source implementation of the XPSNR algorithm is freely available on GitHub: XPSNR filter plug-in for FFmpeg

Contents:

Corrections to the ITU Journal: ICT Discoveries published version of the paper as of April 18, 2020
Visual demonstrations of the WPSNR and XPSNR perceptual sensitivity weighting

Corrections to the ITU Journal: ICT Discoveries published version of the paper as of April 18, 2020

Some content was corrected and/or clarified. The submitted version of the paper is available here: Journal manuscript
The following is a corrected and expanded version of Table 1. It lists the Pearson linear correlation coefficients (PLCC) obtained after third-order polynomial mapping according to ITU-T Rec. P.1401, together with the corresponding root mean squared errors (RMSE). The CSIQ and VIF values were added recently and are not included in the paper. The two-digit references in [brackets] are the same as those in the published version of the paper; the others are listed below.

Table 1 (corrected and expanded). Each metric entry gives PLCC / RMSE.

| Dataset Name | Subset Size | Distortions | Resolutions | bpp | fps | PSNR | SSIM | MS-SSIM | VIF [5] | VMAF | WPSNR | XPSNR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Yonsei [33] | 10×12 | H264, VP9, HEVC [1] | 3840×2160 | | | 0.833 / 1.377 | 0.910 / 0.908 | 0.863 / 1.128 | 0.922 / 0.964 | 0.945 / 0.823 | 0.928 / 0.926 | 0.942 / 0.825 |
| LIVE Vid. [34] | 10×15 | H264, wl, MPEG-2, ip | 768×432 | | 25, 50 | 0.543 / 9.182 | 0.659 / 7.860 | 0.707 / 7.445 | 0.530 / 9.292 | 0.749 / 7.097 | 0.638 / 8.472 | 0.702 / 7.836 |
| IVP [35] | 10×{10, 14} | dirac, H264, MPEG-2, ip | 1920×1088 | | | 0.638 / 0.810 | 0.597 / 0.813 | 0.557 / 0.864 | 0.233* / 0.997 | 0.592 / 0.853 | 0.687 / 0.770 | 0.708 / 0.749 |
| ECVQ [36] | 8×{11, 12} | H264, MPEG-4 | 352×288 | | | 0.735 / 11.60 | 0.895 / 7.534 | 0.887 / 7.621 | 0.883 / 7.991 | 0.831 / 9.554 | 0.875 / 8.047 | 0.864 / 8.417 |
| EVVQ [36] | 8×{11, 12} | H264, MPEG-4 | 640×480 | | | 0.727 / 10.46 | 0.896 / 6.670 | 0.897 / 6.570 | 0.864 / 7.499 | 0.941 / 5.137 | 0.898 / 6.571 | 0.912 / 6.161 |
| SJTU [37] | 10×6 | HEVC | 3840×2160 | | | 0.754 / 0.328 | 0.788 / 0.312 | 0.813 / 0.304 | 0.771 / 0.325 | 0.828 / 0.293 | 0.791 / 0.316 | 0.882 / 0.232 |
| JVET CfP [20] | 10×8 | HEVC, HHI [39] | 1920×1080, 3840×2160 | | 30–60 | 0.724 / 1.662 | 0.809 / 1.402 | 0.832 / 1.208 | TBD* / TBD | 0.863 / 1.225 | 0.699 / 1.722 | 0.866 / 1.213 |
| JVET rep. [10] | 14×10 | HEVC, VTM-5 [3] | 1920×1080, 3840×2160 | 8, 10 | 30–60 | 0.732 / 10.27 | 0.856 / 7.637 | 0.815 / 8.835 | TBD* / TBD | 0.862 / 7.693 | 0.765 / 9.767 | 0.827 / 8.494 |
| JVET VT1 [6] | 5×15 | HEVC, VVenC, VTM | 3840×2160 | | 30, 60 | 0.830 / 1.319 | 0.808 / 1.369 | 0.928 / 0.881 | TBD* / TBD | 0.782 / 1.465 | 0.845 / 1.234 | 0.905 / 1.010 |
| CSIQ Vid. [4] | 12×9 | H264, snow, HEVC | 832×480 | | 24–60 | 0.754 / 12.69 | 0.871 / 9.210 | 0.864 / 9.317 | 0.892 / 8.764 | 0.863 / 9.679 | 0.874 / 9.403 | 0.880 / 9.210 |
| Mean | 104 | | | | | 0.727 | 0.809 | 0.816 | 0.810 | 0.826 | 0.800 | 0.849 |

Legend:

bpp: bits per pixel; fps: frames per second; PLCC: Pearson linear correlation coefficient; SROCC: Spearman rank-order correlation coefficient; second value of each entry (italic in the original layout): root mean squared error (RMSE);
*: values excluded from averaging the correlation values (the low correlation value for VIF on the IVP subset is still being investigated); TBD: to be determined.

Notes:

The SROCC data for the newly added CSIQ Video subset are: PSNR 0.7543, SSIM 0.8788, MS-SSIM 0.8711, VMAF 0.8633, WPSNR 0.8770, XPSNR 0.8775.
The PLCC value for XPSNR without in-place smoothing of the block weights w (see Sec. 6 in the paper) on the ECVQ dataset is 0.789 with an RMSE of 10.47.
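
The evaluation procedure behind the table (third-order polynomial mapping, then PLCC and RMSE, plus SROCC on the unmapped scores) can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the score lists are synthetic, the function names are hypothetical, the fit is an unconstrained least-squares cubic, and the RMSE simply divides by the number of samples. ITU-T Rec. P.1401 specifies further details that are not reproduced here, so the numbers in the table cannot be regenerated with this code alone.

```python
import numpy as np


def pearson(x, y):
    """Pearson linear correlation coefficient of two equally long sequences."""
    return float(np.corrcoef(x, y)[0, 1])


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def evaluate_metric(objective, mos):
    """Return PLCC and RMSE after cubic mapping, and SROCC of the raw scores."""
    objective = np.asarray(objective, dtype=float)
    mos = np.asarray(mos, dtype=float)
    # Assumption: plain least-squares cubic fit, no monotonicity constraint.
    mapping = np.poly1d(np.polyfit(objective, mos, deg=3))
    predicted = mapping(objective)
    return {
        "plcc": pearson(predicted, mos),
        # Assumption: RMSE normalized by the sample count N.
        "rmse": float(np.sqrt(np.mean((predicted - mos) ** 2))),
        # SROCC is the Pearson correlation of the rank-transformed data.
        "srocc": pearson(average_ranks(objective), average_ranks(mos)),
    }


# Synthetic example data (not taken from any of the datasets above).
objective_scores = [30.1, 32.4, 34.0, 36.2, 38.5, 40.3, 42.0, 44.1]
mos_scores = [1.2, 1.9, 2.8, 3.1, 3.9, 4.2, 4.6, 4.7]

result = evaluate_metric(objective_scores, mos_scores)
print(result)
```

Because SROCC depends only on rank order, it is unaffected by the polynomial mapping, which is why it is computed on the raw objective scores here.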

Subjective demonstrations of the sensitivity weighting in the perceptually weighted PSNR measure

The following illustrations demonstrate the block-wise or sample-wise visual sensitivity weighting in the WPSNR metric. These weights can be used to control a perceptually optimized quantizer in codecs like High Efficiency Video Coding (HEVC) [1] or Versatile Video Coding (VVC) [2, 3]. A dark (blue) color indicates high visual activity and a low weight w < 1, while a bright (orange) color indicates low visual activity and a high weight w > 1.

Lena image, top: original, left: block-wise bWPSNR sensitivity weights, center: sample-wise sWPSNR sensitivity weights, right: color legend.

BQTerrace image, top: original, left: block-wise bWPSNR sensitivity weights, center: sample-wise sWPSNR sensitivity weights, right: legend.
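
The relation between visual activity, block weight, and the weighted PSNR shown in the illustrations above can be sketched in a few lines. The following is a simplified illustration, not the WPSNR or XPSNR reference implementation: the activity measure (mean absolute sample difference), the reference activity (geometric mean over all blocks), the exponent `beta = 0.5`, and all function names are assumptions chosen for this example. Only the qualitative behavior (high activity gives w < 1, low activity gives w > 1) is taken from the text.

```python
import numpy as np


def block_activity(block):
    """Hypothetical activity measure: mean absolute neighbor difference."""
    dx = np.abs(np.diff(block, axis=1)).mean()
    dy = np.abs(np.diff(block, axis=0)).mean()
    return max(dx + dy, 1.0)  # floor avoids division by zero in flat blocks


def block_weights(original, block_size, beta=0.5):
    """Weight per block: (reference activity / block activity) ** beta."""
    rows = original.shape[0] // block_size
    cols = original.shape[1] // block_size
    activity = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            block = original[r * block_size:(r + 1) * block_size,
                             c * block_size:(c + 1) * block_size]
            activity[r, c] = block_activity(block.astype(float))
    # Assumption: the geometric mean serves as the reference activity,
    # so that the weights are spread around 1.
    reference = np.exp(np.mean(np.log(activity)))
    return (reference / activity) ** beta


def wpsnr(original, decoded, weights, block_size, bit_depth=8):
    """Weighted PSNR: each block's squared error is scaled by its weight."""
    err2 = (original.astype(float) - decoded.astype(float)) ** 2
    rows, cols = weights.shape
    weighted_sse = 0.0
    for r in range(rows):
        for c in range(cols):
            block = err2[r * block_size:(r + 1) * block_size,
                         c * block_size:(c + 1) * block_size]
            weighted_sse += weights[r, c] * block.sum()
    if weighted_sse == 0.0:
        return float("inf")
    samples = rows * cols * block_size ** 2
    peak = (2 ** bit_depth - 1) ** 2
    return float(10.0 * np.log10(samples * peak / weighted_sse))


# Synthetic 8x16 picture: flat left block, checkerboard right block.
original = np.full((8, 16), 128.0)
yy, xx = np.indices((8, 8))
original[:, 8:] = 255.0 * ((yy + xx) % 2)
decoded = original + 2.0  # uniform error of 2 on every sample

weights = block_weights(original, block_size=8)
print(weights)
print(wpsnr(original, decoded, weights, block_size=8))
```

With all weights set to 1, the function reduces to the conventional PSNR, which makes the effect of the weighting easy to check in isolation.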

Please note that the layout or content of this web page may change. If it does, the date in the following line will be updated.

Christian R. Helmrich, October 9, 2020

References

[1] ITU-T, Recommendation H.265 and ISO/IEC, Int. Standard 23008-2, “High efficiency video coding,” Geneva, Switzerland, Feb. 2018. Online: http://www.itu.int/rec/T-REC-H.265

[2] B. Bross, J. Chen, S. Liu, “Versatile Video Coding (Draft 5),” MPEG/JVET output document N1001, Geneva, Switzerland, July 2019. Online: http://phenix.it-sudparis.eu/jvet/doc_end_user/current_document.php?id=6640

[3] JVET, “VVCSoftware_VTM: VVC VTM reference software,” May 2019. Online: https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM/tags

[4] P. V. Vu and D. M. Chandler, “ViS3: An Algorithm for Video Quality Assessment via Analysis of Spatial and Spatiotemporal Slices,” J. Electr. Imaging, vol. 23, no. 1, Jan./Feb. 2014. DOI: 10.1117/1.JEI.23.1.013016. Online: http://vision.eng.shizuoka.ac.jp/mod/page/view.php?id=24

[5] H. R. Sheikh and A. C. Bovik, “Image Information and Visual Quality,” 2006. Online: https://live.ece.utexas.edu/research/Quality/VIF.htm via https://github.com/andrewekhalel/sewar

[6] C. R. Helmrich, B. Bross, J. Pfaff, H. Schwarz, D. Marpe, and T. Wiegand, “Information on and analysis of the VVC encoders in the SDR UHD verification test,” JVET input document T0103, teleconference, Oct. 2020. Online: JVET-T0103 (including information from JVET-T0097)
