A Study of the Extended Perceptually Weighted Peak Signal-to-Noise Ratio (XPSNR) for Video Compression with Different Resolutions and Bit-Depths

Christian R. Helmrich, Senior Member, IEEE, Sebastian Bosse, Heiko Schwarz, Detlev Marpe, Fellow, IEEE, and Thomas Wiegand, Fellow, IEEE

The above paper has been accepted for publication in the ITU Journal: ICT Discoveries – Special Issue: The Future of Video and Immersive Media in March 2020.  This page provides supplementary information on the publication and a visual demo.

Note that there is also a preceding introductory paper, presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) in May 2020, to which this page is also applicable.  That paper is available on this external web page.

Contents:

  1. Corrections to the ITU Journal: ICT Discoveries published version of the paper as of April 18, 2020

  2. Visual demonstrations of the WPSNR and XPSNR perceptual sensitivity weighting

  1. Corrections to the ITU Journal: ICT Discoveries published version of the paper as of April 18, 2020

    Some content was corrected and/or clarified.  The submitted version of the paper is available here: Journal manuscript (linked document).
    The following is a corrected and expanded Table 1 with Pearson linear correlation coefficients (PLCC) and root mean squared errors (RMSE), after third-order polynomial mapping according to ITU-T Rec. P.1401.  The CSIQ and VIF values were added recently and are not included in the paper.
    The two-digit references in [brackets] are the same as those in the published version of the paper; the others are listed in the References below.

    Dataset Name    Subset Size  Distortions   Resolutions bpp    fps     | PSNR    SSIM    MS-SSIM  VIF [5]  VMAF    WPSNR   XPSNR
    --------------------------------------------------------------------------------------------------------------------------------
    Yonsei [33]     10×12        H264, VP9,    3840×2160   8      30      | 0.833   0.910   0.863    0.922    0.945   0.928   0.942
                                 HEVC [1]                                 | 1.377   0.908   1.128    0.964    0.823   0.926   0.825
    LIVE Vid. [34]  10×15        H264, wl,     768×432     8      25, 50  | 0.543   0.659   0.707    0.530    0.749   0.638   0.702
                                 MPEG-2, ip                               | 9.182   7.860   7.445    9.292    7.097   8.472   7.836
    IVP [35]        10×{10, 14}  dirac, H264,  1920×1088   8      25      | 0.638   0.597   0.557    0.233*   0.592   0.687   0.708
                                 MPEG-2, ip                               | 0.810   0.813   0.864    0.997    0.853   0.770   0.749
    ECVQ [36]       8×{11, 12}   H264,         352×288     8      25      | 0.735   0.895   0.887    0.883    0.831   0.875   0.864
                                 MPEG-4                                   | 11.60   7.534   7.621    7.991    9.554   8.047   8.417
    EVVQ [36]       8×{11, 12}   H264,         640×480     8      25      | 0.727   0.896   0.897    0.864    0.941   0.898   0.912
                                 MPEG-4                                   | 10.46   6.670   6.570    7.499    5.137   6.571   6.161
    SJTU [37]       10×6         HEVC          3840×2160   8      30      | 0.754   0.788   0.813    0.771    0.828   0.791   0.882
                                                                          | 0.328   0.312   0.304    0.325    0.293   0.316   0.232
    JVET CfP [20]   10×8         HEVC,         1920×1080,  10     30–60   | 0.724   0.809   0.832    TBD*     0.863   0.699   0.866
                                 HHI [39]      3840×2160                  | 1.662   1.402   1.208    TBD      1.225   1.722   1.213
    JVET rep. [10]  14×10        HEVC,         1920×1080,  8, 10  30–60   | 0.732   0.856   0.815    TBD*     0.862   0.765   0.827
                                 VTM-5 [3]     3840×2160                  | 10.27   7.637   8.835    TBD      7.693   9.767   8.494
    CSIQ Vid. [4]   12×9         H264, snow,   832×480     8      24–60   | 0.754   0.871   0.864    0.892    0.863   0.874   0.880
                                 HEVC                                     | 12.69   9.210   9.317    8.764    9.679   9.403   9.210
    --------------------------------------------------------------------------------------------------------------------------------
    Mean            107.3                                                 | 0.716   0.809   0.804    0.810    0.830   0.795   0.843

    Legend:

    bpp: bits per pixel, fps: frames per second, PLCC: Pearson linear correlation coefficient, SROCC: Spearman rank-order correlation coefficient.
    For each dataset and metric, the upper value is the PLCC and the lower value is the root mean squared error (RMSE); the Mean row lists the subset size and PLCC only.
    *: values excluded from averaging the correlation values (the low correlation value for VIF on the IVP subset is still being investigated), TBD: to be determined.
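As a cross-check of the Mean row, the following sketch recomputes the per-metric mean PLCC from the table values, leaving out the starred VIF value and the TBD cells as described in the legend. The dictionary layout and the use of None for excluded entries are choices made for this illustration only.

```python
from statistics import mean

# PLCC per dataset, in table order: Yonsei, LIVE, IVP, ECVQ, EVVQ, SJTU,
# JVET CfP, JVET rep., CSIQ. None marks an entry excluded from averaging
# (the starred IVP value for VIF and the TBD cells).
plcc = {
    "PSNR":    [0.833, 0.543, 0.638, 0.735, 0.727, 0.754, 0.724, 0.732, 0.754],
    "SSIM":    [0.910, 0.659, 0.597, 0.895, 0.896, 0.788, 0.809, 0.856, 0.871],
    "MS-SSIM": [0.863, 0.707, 0.557, 0.887, 0.897, 0.813, 0.832, 0.815, 0.864],
    "VIF":     [0.922, 0.530, None, 0.883, 0.864, 0.771, None, None, 0.892],
    "VMAF":    [0.945, 0.749, 0.592, 0.831, 0.941, 0.828, 0.863, 0.862, 0.863],
    "WPSNR":   [0.928, 0.638, 0.687, 0.875, 0.898, 0.791, 0.699, 0.765, 0.874],
    "XPSNR":   [0.942, 0.702, 0.708, 0.864, 0.912, 0.882, 0.866, 0.827, 0.880],
}


def mean_plcc(values):
    """Average the PLCC values that are present, rounded to three decimals."""
    kept = [v for v in values if v is not None]
    return round(mean(kept), 3)


means = {metric: mean_plcc(values) for metric, values in plcc.items()}
for metric, value in means.items():
    print(f"{metric:8s} {value:.3f}")
```

With the exclusions applied, the VIF mean is taken over six datasets, while the other metrics are averaged over all nine.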

    Notes:

    The SROCC data for the newly added CSIQ Video subset are: PSNR 0.7543, SSIM 0.8788, MS-SSIM 0.8711, VMAF 0.8633, WPSNR 0.8770, XPSNR 0.8775.
    The PLCC value for XPSNR without in-place smoothing of the block weights w (see Sec. 6 in the paper) on the ECVQ dataset is 0.789 with an RMSE of 10.47.
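For readers who want to reproduce this kind of evaluation, here is a minimal sketch of computing a PLCC and RMSE after a third-order polynomial mapping of objective scores onto subjective scores. It uses a plain unconstrained least-squares fit; ITU-T Rec. P.1401 specifies the exact procedure, which may differ in details (for example constraints on the mapping or the normalization of the RMSE) that are not reproduced here. All function names and the example data are hypothetical.

```python
import math


def fit_cubic(x, y):
    """Least-squares cubic y ~ c0 + c1*x + c2*x^2 + c3*x^3 via normal equations."""
    n = 4
    a = [[sum(v ** (i + j) for v in x) for j in range(n)] for i in range(n)]
    b = [sum(t * v ** i for v, t in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting, then back substitution.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def pearson(u, v):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    var_u = sum((p - mu) ** 2 for p in u)
    var_v = sum((q - mv) ** 2 for q in v)
    return cov / math.sqrt(var_u * var_v)


def plcc_rmse_after_mapping(scores, mos):
    """Map scores onto MOS with a fitted cubic, then return (PLCC, RMSE)."""
    # Normalise the scores to [-0.5, 0.5] so the fit is well conditioned.
    lo, hi = min(scores), max(scores)
    mid, span = (lo + hi) / 2, hi - lo
    z = [(s - mid) / span for s in scores]
    c = fit_cubic(z, mos)
    mapped = [c[0] + c[1] * t + c[2] * t ** 2 + c[3] * t ** 3 for t in z]
    rmse = math.sqrt(sum((m - q) ** 2 for m, q in zip(mapped, mos)) / len(mos))
    return pearson(mapped, mos), rmse


# Made-up example data (not taken from any dataset in the table).
scores = [28.1, 30.4, 31.9, 33.0, 34.8, 36.2, 37.5, 39.9, 41.3, 43.0]
mos = [1.4, 1.9, 2.6, 2.8, 3.5, 3.9, 4.1, 4.5, 4.6, 4.8]
plcc, rmse = plcc_rmse_after_mapping(scores, mos)
print(f"PLCC = {plcc:.3f}, RMSE = {rmse:.3f}")
```

Because the RMSE is expressed on the scale of the subjective scores, values from datasets with different rating scales are not directly comparable, which is consistent with the widely varying RMSE magnitudes in the table.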


  2. Subjective demonstrations of the sensitivity weighting in the perceptually weighted PSNR measure

    The following illustrations demonstrate the block-wise or sample-wise visual sensitivity weighting in the WPSNR metric.  These weights can be used to control a perceptually optimized quantizer in codecs like High-Efficiency Video Coding (HEVC) [1] or Versatile Video Coding (VVC) [2, 3].  A dark (blue) color indicates high visual activity and a low weight w < 1, while a bright (orange) color indicates low activity and a high weight w > 1.

    Figure: Lena image, luminance channel.  Top: original, left: block-wise bWPSNR sensitivity weights, center: sample-wise sWPSNR sensitivity weights, right: color legend.

    Figure: BQTerrace image, luminance channel.  Top: original, left: block-wise bWPSNR sensitivity weights, center: sample-wise sWPSNR sensitivity weights, right: color legend.
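The relationship between visual activity and weight shown in the illustrations can be sketched in a few lines. This is only an illustration under stated assumptions: the 4-neighbour Laplacian high-pass filter, the power-law form of the weight, the exponent beta = 0.5, the floor a_min, and the reference activity a_ref are all placeholders chosen here, not values taken from this page; the paper defines the actual weighting.

```python
def block_activity(block, a_min=1.0):
    """Squared mean absolute high-pass response of a block, floored at a_min^2."""
    h, w = len(block), len(block[0])
    total = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # 4-neighbour Laplacian as a stand-in high-pass filter (assumption).
            hp = (4 * block[y][x] - block[y - 1][x] - block[y + 1][x]
                  - block[y][x - 1] - block[y][x + 1])
            total += abs(hp)
    mean_abs = total / ((h - 2) * (w - 2))
    return max(a_min ** 2, mean_abs ** 2)


def block_weight(block, a_ref, beta=0.5, a_min=1.0):
    """Weight > 1 for activity below a_ref, weight < 1 for activity above it."""
    return (a_ref / block_activity(block, a_min)) ** beta


# Two synthetic 8x8 luminance blocks: a flat one and a checkerboard texture.
flat = [[128] * 8 for _ in range(8)]
checker = [[128 + (16 if (x + y) % 2 else -16) for x in range(8)]
           for y in range(8)]

a_ref = 64.0  # hypothetical reference activity separating "low" from "high"
w_flat = block_weight(flat, a_ref)
w_checker = block_weight(checker, a_ref)
print(f"flat block weight: {w_flat}, textured block weight: {w_checker}")
```

The flat block would appear bright (w > 1) in such a weight map and the checkerboard block dark (w < 1), matching the color convention described above.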

Please note that the layout or content of this web page may change.  If it does, the date in the following line will be updated.

Christian R. Helmrich, April 18, 2020



References

[1]  ITU-T, Recommendation H.265 and ISO/IEC, Int. Standard 23008-2, “High efficiency video coding,” Geneva, Switzerland, Feb. 2018. Online:
       http://www.itu.int/rec/T-REC-H.265

[2]  B. Bross, J. Chen, S. Liu, “Versatile Video Coding (Draft 5),” MPEG/JVET output document N1001, Geneva, Switzerland, July 2019. Online:
       http://phenix.it-sudparis.eu/jvet/doc_end_user/current_document.php?id=6640

[3]  JVET, “VVCSoftware_VTM: VVC VTM reference software,” May 2019. Online: https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM/tags

[4]  P. V. Vu and D. M. Chandler, “ViS3: An Algorithm for Video Quality Assessment via Analysis of Spatial and Spatiotemporal Slices,” J. Electr. Imaging, vol. 23, no. 1, Jan./Feb. 2014. DOI: 10.1117/1.JEI.23.1.013016. Online:
       http://vision.eng.shizuoka.ac.jp/mod/page/view.php?id=24

[5]  H. R. Sheikh and A. C. Bovik, “Image Information and Visual Quality,” 2006. Online: https://live.ece.utexas.edu/research/Quality/VIF.htm via
       https://github.com/andrewekhalel/sewar.