A Study of the Extended Perceptually Weighted Peak Signal-to-Noise Ratio (XPSNR) for Video Compression with Different Resolutions and Bit Depths
Christian R. Helmrich, Senior Member, IEEE, Sebastian Bosse, Heiko Schwarz, Detlev Marpe, Fellow, IEEE, and Thomas Wiegand, Fellow, IEEE
The above paper has been accepted for publication in the ITU Journal: ICT Discoveries – Special Issue: The Future of Video and Immersive Media in March 2020. This page provides supplementary information on the publication and a visual demo.
Note that there is also a preceding introductory paper, presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) in May 2020, to which this page is also applicable. That paper is available on this page.
An open-source implementation of the XPSNR algorithm is freely available on GitHub: XPSNR filter plug-in for FFmpeg
Contents:
- Corrections to the ITU Journal: ICT Discoveries published version of the paper as of April 2020
- Visual demonstrations of the WPSNR and XPSNR perceptual sensitivity weighting
Corrections to the ITU Journal: ICT Discoveries published version of the paper as of April 18, 2020
Some content was corrected and/or clarified. The submitted version of the paper is available here: Journal manuscript
The following is a corrected and expanded Table 1 with Pearson linear correlation coefficients (PLCC), computed after third-order polynomial mapping according to ITU-T Rec. P.1401. The CSIQ and VIF values were added recently and are not included in the paper. The two-digit references in [brackets] are the same as those in the published version of the paper; the others are listed below.
| Dataset Name | Subset Size | Distortions | Resolutions | bpp | fps | PSNR | SSIM | MS-SSIM | VIF [5] | VMAF | WPSNR | XPSNR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Yonsei [33] | 10×12 | H264, VP9, HEVC [1] | 3840×2160 | 8 | 30 | 0.833 1.377 | 0.910 0.908 | 0.863 1.128 | 0.922 0.964 | 0.945 0.823 | 0.928 0.926 | 0.942 0.825 |
| LIVE Vid. [34] | 10×15 | H264, wl, MPEG-2, ip | 768×432 | 8 | 25, 50 | 0.543 9.182 | 0.659 7.860 | 0.707 7.445 | 0.530 9.292 | 0.749 7.097 | 0.638 8.472 | 0.702 7.836 |
| IVP [35] | 10×{10, 14} | dirac, H264, MPEG-2, ip | 1920×1088 | 8 | 25 | 0.638 0.810 | 0.597 0.813 | 0.557 0.864 | 0.233* 0.997 | 0.592 0.853 | 0.687 0.770 | 0.708 0.749 |
| ECVQ [36] | 8×{11, 12} | H264, MPEG-4 | 352×288 | 8 | 25 | 0.735 11.60 | 0.895 7.534 | 0.887 7.621 | 0.883 7.991 | 0.831 9.554 | 0.875 8.047 | 0.864 8.417 |
| EVVQ [36] | 8×{11, 12} | H264, MPEG-4 | 640×480 | 8 | 25 | 0.727 10.46 | 0.896 6.670 | 0.897 6.570 | 0.864 7.499 | 0.941 5.137 | 0.898 6.571 | 0.912 6.161 |
| SJTU [37] | 10×6 | HEVC | 3840×2160 | 8 | 30 | 0.754 0.328 | 0.788 0.312 | 0.813 0.304 | 0.771 0.325 | 0.828 0.293 | 0.791 0.316 | 0.882 0.232 |
| JVET CfP [20] | 10×8 | HEVC, HHI [39] | 1920×1080, 3840×2160 | 10 | 30–60 | 0.724 1.662 | 0.809 1.402 | 0.832 1.208 | TBD* TBD | 0.863 1.225 | 0.699 1.722 | 0.866 1.213 |
| JVET rep. [10] | 14×10 | HEVC, VTM-5 [3] | 1920×1080, 3840×2160 | 8, 10 | 30–60 | 0.732 10.27 | 0.856 7.637 | 0.815 8.835 | TBD* TBD | 0.862 7.693 | 0.765 9.767 | 0.827 8.494 |
| JVET VT1 [6] | 5×15 | HEVC, VVenC, VTM | 3840×2160 | 10 | 30, 60 | 0.830 1.319 | 0.808 1.369 | 0.928 0.881 | TBD* TBD | 0.782 1.465 | 0.845 1.234 | 0.905 1.010 |
| CSIQ Vid. [4] | 12×9 | H264, snow, HEVC | 832×480 | 8 | 24–60 | 0.754 12.69 | 0.871 9.210 | 0.864 9.317 | 0.892 8.764 | 0.863 9.679 | 0.874 9.403 | 0.880 9.210 |
| Mean | 104 | | | | | 0.727 | 0.809 | 0.816 | 0.810 | 0.826 | 0.800 | 0.849 |
Legend:
bpp: bits per pixel, fps: frames per second, PLCC: Pearson linear correlation coefficient, SROCC: Spearman rank-order correlation coefficient. In each metric cell, the first value is the PLCC and the second value (set in italics in the original table) is the root mean squared error (RMSE).
*: values excluded from averaging the correlation values (the low correlation value for VIF on the IVP subset is still being investigated), TBD: to be determined.
Notes:
The SROCC data for the newly added CSIQ Video subset are: PSNR 0.7543, SSIM 0.8788, MS-SSIM 0.8711, VMAF 0.8633, WPSNR 0.8770, XPSNR 0.8775.
The PLCC value for XPSNR without in-place smoothing of the block weights w (see Sec. 6 in the paper) on the ECVQ dataset is 0.789 with an RMSE of 10.47.
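As a rough illustration of how PLCC and RMSE figures of this kind can be obtained, the following Python sketch fits a third-order polynomial that maps objective scores to subjective scores and then evaluates the mapped scores. It is a minimal sketch under stated assumptions, not the evaluation code behind the table: the function name `plcc_rmse_after_cubic_mapping` and the sample data are hypothetical, the fit is a plain least-squares fit without the monotonicity constraint that ITU-T Rec. P.1401 discusses, and the RMSE is the plain root of the mean squared residual.

```python
import numpy as np


def plcc_rmse_after_cubic_mapping(objective, subjective, degree=3):
    """Map objective scores to the subjective scale, then return (PLCC, RMSE).

    Assumption: an unconstrained least-squares polynomial fit is used here;
    a monotonic mapping may require additional constraints.
    """
    objective = np.asarray(objective, dtype=float)
    subjective = np.asarray(subjective, dtype=float)

    # Least-squares polynomial mapping from metric scores to subjective scores.
    coeffs = np.polyfit(objective, subjective, degree)
    mapped = np.polyval(coeffs, objective)

    # Pearson linear correlation between mapped and subjective scores.
    plcc = np.corrcoef(mapped, subjective)[0, 1]
    # Plain RMSE of the mapped scores (no degrees-of-freedom correction).
    rmse = np.sqrt(np.mean((mapped - subjective) ** 2))
    return plcc, rmse


if __name__ == "__main__":
    # Hypothetical metric values (dB) and mean opinion scores, for illustration only.
    metric_scores = [30.1, 32.5, 34.0, 36.2, 38.7, 41.0]
    mos = [1.8, 2.6, 3.1, 3.9, 4.3, 4.6]
    plcc, rmse = plcc_rmse_after_cubic_mapping(metric_scores, mos)
    print(f"PLCC = {plcc:.3f}, RMSE = {rmse:.3f}")
```

Because the RMSE is expressed on the subjective scale of each dataset, RMSE values are only comparable within one table row, not across datasets.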
Subjective demonstrations of the sensitivity weighting in the perceptually weighted PSNR measure
The following illustrations demonstrate the block-wise or sample-wise visual sensitivity weighting in the WPSNR metric. These weights can be used to control a perceptually optimized quantizer in codecs like High-Efficiency Video Coding (HEVC) [1] or Versatile Video Coding (VVC) [2, 3]. A dark (blue) color indicates high visual activity and a low weight w < 1, while a bright (orange) color indicates low activity and a high weight w > 1.
Lena image, top: original, left: block-wise bWPSNR sensitivity weights, center: sample-wise sWPSNR sensitivity weights, right: color legend.
BQTerrace image, top: original, left: block-wise bWPSNR sensitivity weights, center: sample-wise sWPSNR sensitivity weights, right: legend.
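The relationship between visual activity and weight described above can be sketched in Python as follows. This is an illustrative approximation only, not the definition used in the paper or in the FFmpeg plug-in: the activity measure (mean absolute response of a simple 4-neighbour high-pass filter), the normalization by the picture-average activity, the exponent `beta`, the floor `a_min`, and all function names are assumptions made for this example. It also assumes a single-channel picture whose dimensions are multiples of the block size.

```python
import numpy as np


def block_activity(block):
    """Hypothetical activity: mean absolute 4-neighbour high-pass response."""
    b = block.astype(float)
    hp = (4.0 * b[1:-1, 1:-1] - b[:-2, 1:-1] - b[2:, 1:-1]
          - b[1:-1, :-2] - b[1:-1, 2:])
    return float(np.mean(np.abs(hp)))


def block_weights(ref, block=8, beta=0.5, a_min=1.0):
    """Weights above 1 for low-activity blocks, below 1 for high-activity blocks."""
    h, w = ref.shape
    acts = np.array([[max(a_min, block_activity(ref[y:y + block, x:x + block]))
                      for x in range(0, w, block)]
                     for y in range(0, h, block)])
    a_ref = acts.mean()  # assumption: normalize by the picture-average activity
    return (a_ref / acts) ** beta


def weighted_psnr(ref, test, bit_depth=8, block=8):
    """PSNR in which each block's squared error is scaled by its weight."""
    weights = block_weights(ref, block)
    h, w = ref.shape
    err = (ref.astype(float) - test.astype(float)) ** 2
    wsse = 0.0
    for by in range(weights.shape[0]):
        for bx in range(weights.shape[1]):
            tile = err[by * block:(by + 1) * block, bx * block:(bx + 1) * block]
            wsse += weights[by, bx] * tile.sum()
    if wsse == 0.0:
        return float("inf")  # identical pictures
    peak_sq = (2 ** bit_depth - 1) ** 2
    return float(10.0 * np.log10(h * w * peak_sq / wsse))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = np.full((16, 16), 128, dtype=np.int32)
    ref[:, 8:] = rng.integers(0, 256, size=(16, 8))  # textured right half
    print(block_weights(ref))
```

With such weights, an error of a given size lowers the weighted PSNR more when it falls in a smooth block than when it falls in a textured block, which mirrors the color coding of the illustrations.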
Please note that the layout or content of this web page may change. If it does, the date in the following line will be updated.
Christian R. Helmrich, October 9, 2020
References
[1] ITU-T, Recommendation H.265 and ISO/IEC, Int. Standard 23008-2, “High efficiency video coding,” Geneva, Switzerland, Feb. 2018. Online: http://www.itu.int/rec/T-REC-H.265
[4] P. V. Vu and D. M. Chandler, “ViS3: An Algorithm for Video Quality Assessment via Analysis of Spatial and Spatiotemporal Slices,” J. Electr. Imaging, vol. 23, no. 1, Jan./Feb. 2014. DOI: 10.1117/1.JEI.23.1.013016. Online: http://vision.eng.shizuoka.ac.jp/mod/page/view.php?id=24
[6] C. R. Helmrich, B. Bross, J. Pfaff, H. Schwarz, D. Marpe, and T. Wiegand, “Information on and analysis of the VVC encoders in the SDR UHD verification test,” JVET input document T0103, teleconference, Oct. 2020. Online: JVET-T0103 (including information from JVET-T0097)