Efficiency is the ability to avoid wasting materials, energy, efforts, money, and time in doing something or in producing a desired result. [...] In more mathematical or scientific terms, it is a measure of the extent to which input is well used for an intended task or function (output). It often specifically comprises the capability of a specific application of effort to produce a specific outcome with a minimum amount or quantity of waste, expense, or unnecessary effort.
Wikipedia page on efficiency, 2017
The audio Compact Disc (CD) and Super Audio CD (SACD) can store a digital audio signal without audible quality loss, especially when proper analog-to-digital conversion, sampling rate conversion, and noise-shaped bit-depth conversion [Helmrich, 2007] are used. However, the uncompressed pulse-code and pulse-density modulation formats they employ (PCM on the CD, DSD on the SACD) are inefficient, since both redundant and perceptually irrelevant information remains in the coded data. In other words, storage capacity is wasted on unimportant signal parts.
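To put the storage cost of these uncompressed formats into numbers, the short Python sketch below computes the raw bit-rates of CD audio (16-bit PCM at 44.1 kHz, stereo) and of a raw stereo SACD stream (1-bit DSD at 2.8224 MHz), and how many times smaller than CD PCM a coded stream at a few example bit-rates would be. The helper name `raw_bitrate_kbps` and the chosen example rates are illustrative assumptions, not figures from the tests discussed on this page.

```python
def raw_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Raw bit-rate of an uncompressed pulse-modulated stream, in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


# CD: 16-bit PCM at 44.1 kHz, two channels
cd_kbps = raw_bitrate_kbps(44_100, 16, 2)        # 1411.2 kbit/s
# SACD: 1-bit DSD at 64 x 44.1 kHz = 2.8224 MHz, two channels
sacd_kbps = raw_bitrate_kbps(2_822_400, 1, 2)    # 5644.8 kbit/s

if __name__ == "__main__":
    print(f"CD: {cd_kbps} kbit/s, stereo SACD: {sacd_kbps} kbit/s")
    for coded_kbps in (48, 64, 96, 128):
        # How many times smaller than CD PCM a stream at this rate is
        print(f"{coded_kbps:>4} kbit/s -> {cd_kbps / coded_kbps:.1f}:1 relative to CD")
```

At 64 kbit/s stereo, for example, a coded stream is roughly 22 times smaller than the CD PCM stream.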
On this page, some more recent and much more efficient audio compression formats are presented. I had the privilege to participate in the development of some of them.
MPEG-4 High-Efficiency Advanced Audio Coding (HE-AAC) in Winamp 2008–2013
Nullsoft's Winamp was one of the first – and, in my humble opinion, still one of the best – freely available software media players. Around the turn of the century I started using it extensively and even created my own skin and file format plug-in. Needless to say, I was very excited when, while working for Fraunhofer, I was given the opportunity to optimize Fraunhofer's HE-AAC encoder for integration into Winamp 5.6 in 2011. Specifically, I increased the audio coding quality on some input signals and in VBR mode.
MPEG-4 HE-AAC, published in ISO/IEC 14496-3, remains one of the most widely used coding specifications. Developed a few years after MP3, it outperforms the latter in audio reconstruction quality at a specified average bit-rate and complexity. Fraunhofer's HE-AAC encoder, bundled with Winamp versions 5.62 (June 2011) through 5.666 (December 2013), fully exploits the coding capabilities offered by the ISO/IEC standard at a very high encoding speed. In independent blind listening tests conducted between July 2011 and November 2013, it ranked first in some tests (at 64 and 96 kbit/s VBR stereo) and second in another (at 96 kbit/s VBR stereo) among the best HE-AAC encoders available today.
Unfortunately, AOL, which had acquired Nullsoft in 1999, abandoned Winamp in 2014, and new versions (none of which had been released as of 2019) will no longer include Fraunhofer's HE-AAC encoder [source]. Update July 2019: It seems that all corporate development of Winamp has stopped; only an independent group of enthusiasts is still trying to maintain Winamp updates and fixes through WACUP (WinAmp Community Update Project). However, the last corporate version 5.666 is still publicly available for download, so you can keep using the integrated HE-AAC encoder free of charge:
To convert your audio files into the HE-AAC format, select the desired tracks in Winamp by left-clicking on them (holding Ctrl or Shift if necessary), then right-click in the playlist and select «Send to:», «Format Converter». In the Encoding Format drop-down list of the dialog box that appears, select «MPEG-4 AAC Encoder v1.08» and configure it as desired.
Figure 1. How to convert audio files into the HE-AAC format using the Format Converter dialog in Winamp.
Note that more recent Winamp versions from www.winamp.com may look different and/or the illustrated dialogs may no longer exist.
MPEG-D Unified Speech and Audio Coding (USAC) and MPEG-H Audio 2010–2016
During and after my work on Fraunhofer's HE-AAC encoder, I joined the development of two other codec standards: MPEG-D Extended HE-AAC, published under the name unified speech and audio coding (USAC) in ISO/IEC 23003-3, and MPEG-H Audio, specified in a second edition in ISO/IEC 23008-3.
MPEG-H Audio is based on MPEG-D USAC but allows for higher multichannel coding efficiency and, in its Low Complexity configuration, requires fewer decoding operations. A detailed comparison of the two codecs, including results of formal subjective tests, is provided in my dissertation [Helmrich, 2017].
In my dissertation I demonstrate that, at bit-rates as low as 48 kbit/s stereo, MPEG-H Audio provides at least the same audio coding quality as USAC (sometimes even better quality) and does so at approximately one third lower decoding complexity. Since USAC outperformed all competing codecs in a subjective verification test in 2011 [report] and I think that bit-rates lower than 24 kbit/s per channel are unnecessary nowadays, I consider MPEG-H Audio the most efficient lossy audio codec developed as of 2020.
Recent versions of the OPUS codec (see below) or Dolby's AC-4 may match MPEG-H in audio quality at higher bit-rates, but I don't expect them to outperform it at any bit-rate (an assumption which, of course, needs to be verified via formal subjective evaluation).
In June 2018, Fraunhofer's xHE-AAC decoder implementation (xHE-AAC being the combination of MPEG-D USAC and MPEG-D dynamic range control; see this page) found its way into the Android operating system [source], and in late 2019, xHE-AAC decoding functionality was also added to Apple and Amazon products [source]. In July 2020 and January 2021, Microsoft and Netflix, respectively, joined the list of early xHE-AAC licensees and adopters [source, source], followed by LG and Facebook in mid-2021 [source]. As a result, xHE-AAC playback is supported on a billion consumer devices. Unfortunately, no free-of-charge MPEG-H Audio encoder is available as of late 2021 for testing and content generation, but as the following articles from 2017 show, at least some MPEG-H Audio decoders have already found their way into new-generation hardware devices:
So it seems that we simply have to wait for more MPEG-H Audio ready hardware and software to arrive before we can draw final conclusions on its coding performance. The results of the USAC and MPEG-H Audio «Baseline» verification tests are as follows:
Figure 2. Results of the formal USAC verification test (higher-rate stereo) conducted by MPEG. (a) Speech input, (b) mixed speech-and-music input, (c) music input, (d) all, averaged across (a)–(c).
See section 2.6 of my dissertation for details.
Figure 3. Results of the 'Baseline' MPEG-H Audio verification test. (a) 2.0 stereo, (b) 5.1, (c) 5.1 plus 2 additional height channels, (d) all, averages of (a)–(c). Legend: 7.0-kHz anchor, 3.5-kHz anchor. Data taken from MPEG output document.
Update Jan. 2020: To offer previously unavailable open-access software for high-quality Extended HE-AAC encoding, I recently created exhale, which is an acronym for «ecodis extended high-efficiency and low-complexity encoder». The latest stable, tested exhale release, version 1.2.1 with mono, stereo, and basic 5.1 USAC coding functionality and usage documentation, is freely accessible via exhale's Git repository. Discussions of the encoder, with helpful explanations on how to use it in practice, are archived on the HydrogenAudio forum. exhale is being made available under an open-access license (with a «no patents granted» disclaimer) similar to the one for Fraunhofer's FDK AAC:
Note that exhale focuses on low-complexity medium and high bit-rate coding and, as such, does not utilize all coding technology offered by the ISO/IEC 23003-3 standard, preventing it from encoding at rates below roughly 20 kbit/s mono and 40 kbit/s stereo. For bit-rates at or above 24 kbit/s mono and 48 kbit/s stereo, however, I achieved my initial goal of high software and audio quality, according to some independent tests (see below). I'll keep the source code text short and simple for educational purposes. A high encoding speed is not a primary target but may also be addressed eventually. The exhale license text and release notes are also hosted here and here, respectively.
Update Sep. 2020: Meanwhile, two independent personal blind listening tests, carefully conducted using a large number of CD audio items, have been published: one at 64 kbit/s stereo, the other at 96 and 128 kbit/s stereo. With pleasure I can report that the results of these comparative tests confirm exhale's high audio and software quality:
exhale achieves «excellent» overall audio quality (MOS above 3.9) at all bit-rates,
it outperforms all other tested similar-bit-rate audio encoders at 64 and 96 kbit/s,
with only one exception, its worst-case per-item quality remains above average,
no critical software issues were reported recently, so the implementation is stable.
A third test at 192 kbit/s stereo confirms exhale's performance at very high bit-rates. In other words, my mid-year goal has been fully achieved, and I can concentrate on the low-bit-rate operating points, 24 kbit/s mono and 48 kbit/s stereo, in the future. The following figure summarizes, and links to, the reports of the blind listening tests on the HydrogenAudio forum. You can click on each graphic to access the respective web page.
Figure 4. Results of the first two personal comparative blind listening tests including exhale, reported on the HydrogenAudio forum in summer of 2020.
Left: low, center: medium, right: high coding bit-rate. The low-bit-rate scores are per-codec averages of two subtests (classical & pop).
For those using the audio playback software foobar2000: since version 1.5.3 of March 2020, foobar2000 can play back xHE-AAC audio in MPEG-4 files, including those generated by exhale, gaplessly by means of Christopher Snowhill's FDK-AAC packet decoder. (Update June 2023: That decoder is now being maintained by «Case»; see this page.) Many thanks to Christopher, Peter, and Case for their work on foobar2000 and, particularly, for releasing a working version of this decoder component so quickly!
3GPP Enhanced Voice Services (EVS) for Voice-over-IP Communication 2010–2014
The EVS codec, specified in 3GPP TS 26.441, is for low-delay communication what MPEG-H Audio is for streaming or broadcasting: the most efficient speech and audio codec money can buy. In fact, it outperforms all prior ETSI/ITU-T/3GPP codecs for the same application and, at rates below 32 kbit/s, also the OPUS codec (see below):
Figure 5. Results of the EVS verification tests (all mono input signals) conducted by Nokia in 2014.
See section 3 of Nokia's IEEE paper for details.
(Fig. copyright A. Rämö. Axis label: bit-rate in kbit/s.)
At Fraunhofer, I participated in the development of the EVS transform coding design, and I used the experience which I gained from optimizing Fraunhofer's HE-AAC encoder (see above) to fine-tune the audio reconstruction quality of the EVS codec, especially at the 16.4 and 24.4 kbit/s operating points. As Fig. 5 illustrates, the perceptual quality achieved by EVS in its superwideband (SWB, up to 16 kHz bandwidth) and fullband (FB, up to 20 kHz bandwidth) modes at such low bit-rates ended up being quite remarkable.
By the end of 2017, several smartphone vendors had added support for EVS encoding and decoding to their devices (sometimes labeled HD Voice Plus™), including Apple's iPhone 8/X, LG's G5, Samsung's Galaxy S7, Sharp's Aquos Zeta, Sony's Xperia X/XZ, and Xiaomi's MI5. Using these and later smartphones, high-quality EVS-powered voice calls can be made with service providers in Germany [source], Spain [source], the United Kingdom [source], the United States [source], and Japan [source].
The latest version of the EVS software encoder and decoder is available via this link:
A stereo, surround, and VR extension for EVS is currently being developed under 3GPP work item 770024. Completion of the resulting Immersive Voice and Audio Services (IVAS) codec, to be standardized in 3GPP specification 26.253, is planned for 2021. Update 2024: That plan was postponed to 3GPP Release 18 in 2024, and the final software release following completion of the standard is now available here.
FSLAC: A FLAC Backward-Compatible Free Semi-Lossless Audio Coder Nov–Dec 2016
Having finished the first edition of my dissertation [Helmrich, 2017], I finally had the time for an after-work project which had long been on my to-do list: a constrained VBR (CVBR) version of the publicly available open-source lossless audio coder FLAC.
FLAC, being a mathematically lossless audio codec, inevitably creates VBR streams as compressed files. Depending on the «difficulty» of coding each segment of the audio signal, the instantaneous coding bit-rate can be quite high. However, one can observe that, during passages of high FLAC bit-rate, the coded audio also exhibits the greatest capacity for psychoacoustic masking. FSLAC exploits this property to limit the maximum instantaneous bit-rate of the compressed file. It does so by detecting the difficult audio blocks (by measuring their predictability via linear-prediction error energy calculations) and requantizing each detected block to a lower bit-depth, thereby reducing the bit-rate needed for lossless coding of that block. To prevent the resulting quantization error from becoming audible (or visible in a spectrogram), simple adaptive noise shaping is used.
This approach is similar to the one used by LossyWAV but differs in two important aspects. First, FSLAC is not a stand-alone pre-processor but is instead coupled with a FLAC encoder and, hence, directly creates FLAC-compatible compressed files. Second, FSLAC only alters the high-bit-rate audio segments rather than (almost) all parts of the audio input, as LossyWAV does. The coded audio, therefore, remains perceptually lossless. In addition, due to its simplicity, FSLAC encoding is very fast. All of these features make FSLAC attractive for audio production and archival applications.
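The two FSLAC steps described above, flagging hard-to-predict blocks by their linear-prediction error energy and requantizing only those blocks to a lower bit-depth with noise shaping, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the FSLAC source code: the least-squares predictor of order 8, the rough bits-per-sample estimate from the residual RMS, the cap `max_bits`, and the fixed first-order (rather than adaptive) error-feedback noise shaper are all hypothetical choices, as are the function names. Requantizing to multiples of 2^shift lowers the lossless bit-rate because FLAC can signal such «wasted» low-order bits per subframe and skip coding them. For scale, 1000 kbit/s at 44.1 kHz stereo corresponds to about 11.3 bits per sample and channel.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def lp_residual(block: np.ndarray, order: int = 8) -> np.ndarray:
    """Residual of a least-squares (covariance-method) linear predictor."""
    x = block.astype(np.float64)
    past = sliding_window_view(x[:-1], order)   # row i = x[i : i + order]
    target = x[order:]                           # sample predicted from each row
    coeffs, *_ = np.linalg.lstsq(past, target, rcond=None)
    return target - past @ coeffs


def estimate_bits_per_sample(residual: np.ndarray) -> float:
    """Hypothetical, rough cost of entropy-coding the residual, per sample."""
    rms = np.sqrt(np.mean(residual ** 2))
    return float(np.log2(rms + 1.0) + 1.0)


def requantize_noise_shaped(block: np.ndarray, shift: int) -> np.ndarray:
    """Round to multiples of 2**shift with first-order error feedback, which
    shapes the requantization noise as e[n] - e[n-1] (toward high frequencies)."""
    step = 1 << shift
    out = np.empty_like(block)
    err = 0.0
    for n, sample in enumerate(block):
        target = sample - err                    # subtract the previous error
        q = int(np.round(target / step)) * step
        q = min(max(q, -32768), 32768 - step)    # stay inside the 16-bit range
        err = q - target
        out[n] = q
    return out


def cap_block(block: np.ndarray, max_bits: float = 10.0, order: int = 8):
    """Leave predictable blocks untouched; requantize the expensive ones.
    Returns (possibly requantized block, number of dropped LSBs)."""
    est = estimate_bits_per_sample(lp_residual(block, order))
    if est <= max_bits:
        return block.copy(), 0                   # easy block: stays lossless
    shift = int(np.ceil(est - max_bits))
    return requantize_noise_shaped(block, shift), shift
```

Only the blocks whose estimated cost exceeds the cap are altered, mirroring the distinction from LossyWAV drawn above; a larger cap (analogous to a higher compression level in FSLAC) leaves more blocks untouched.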
You can download an executable of a FLAC encoder with added FSLAC functionality below (Win32 release, compiled with Visual Studio 2012). FSLAC.EXE can be used on the Windows command line or, after renaming it to FLAC.EXE, in foobar2000. Note, however, the list of known issues with this executable. If you are interested in the specifics of the FSLAC algorithm or would like to compile your own binary (kindly let me know if you did!), you can also download the source code file that I had to modify. All FSLAC-related code is encapsulated by the preprocessor define ECODIS_CVBR_MODE (line 72).
An untested version of the modified source code file for FLAC 1.3.3 is available here. To convert your lossless files using FSLAC in foobar2000, select the tracks to convert in the playlist, then right-click on a marked track and select «Convert», «Quick Convert». In the dialog box that appears, select «FLAC <N/A>», edit the compression level if desired (the lower the level, the lower the maximum FSLAC bit-rate; level 8 limits it to around 1000 kbit/s) and, after clicking on «Convert», locate the downloaded F(S)LAC.EXE if required.
Figure 6. Influence of the FSLAC compression level on the output bit-rate when coding the SQAM inputs with the highest lossless bit-rate.
Lossless VBR input: FLAC -8; transcoded CVBR output: FSLAC -6, FSLAC -4, and FSLAC -3. Note how the high-rate signals 2, 3, 10, 13, and 16 are affected most by the bit-rate limiting.
OPUS: A General-Purpose Speech and Audio Codec for Streaming 2009–2018
The OPUS speech and audio codec, as specified in 2012 by the Internet Engineering Task Force in RFC 6716, is an open-source codec with surprisingly good subjective performance which, like FLAC (see above), claims to be free software. Until 2019, the encoder was regularly improved for better quality, as could be observed here.
Since version 1.1 from 2013, OPUS yields encodings of very high quality, except at very low bit-rates (see Fig. 5 above). The latest version is 1.5.2 from 2024, featuring a good speech/music classifier and slightly better coding quality, especially at low rates. I beta-tested version 1.3 in June 2018 and reported a coding bandwidth detection problem and a speech/music classification problem, both of which were corrected in the final release. Being a low-delay codec with a speech core (SILK), OPUS also competes with EVS.
For all historians: some early documentation of OPUS's transform coding core, called Constrained Energy Lapped Transform (CELT), is archived here (papers from 2009). A further subjective comparison of OPUS and EVS on music content is available here.
Summary: Audio Coding has Matured, Future: More Post-Processing May 2018
My experience in audio coding research and development leads me to conclude that the performance of the latest-generation audio codecs, as indicated above and in my dissertation [Helmrich, 2017], has matured to the point where the small perceptual benefits of more elaborate signal processing are not worth the increase in algorithmic complexity. Moreover, I believe this is the case for both the decoder specifications and the encoder implementations. At least for mono, stereo, and 5.1 surround, the inaccuracies of human hearing seem largely exploited (by irrelevancy removal and parametric coding), and the remaining statistical redundancies within the coded data have been reduced to a minimum (except maybe in some signaled parameters; see sec. 3.6.2 of my doctoral thesis). The only thing left to do, in my view, is to make greater use of post-processors similar to the in-loop filters in video coding. Such techniques can improve the reconstruction quality of very tonal or transient signals, as has been demonstrated by the LTP post-filter in EVS and MPEG-H Audio as well as by an applause post-filter.
Update Nov. 2018: Neural-network-based post-filtering for blind bandwidth extension of speech at low bit-rates also seems a promising topic for further research; see here.
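To illustrate what a decoder-side post-filter can do for very tonal signals, here is a minimal Python sketch of a generic pitch-based comb post-filter: it estimates the period of the decoded signal from the normalized autocorrelation and then forms a weighted average of each sample and the sample one period earlier. Periodic (harmonic) content passes unchanged, while uncorrelated coding noise between the harmonics is attenuated. This is a textbook illustration under stated assumptions, not the LTP post-filter of EVS or MPEG-H Audio, whose exact designs are defined in the respective specifications; the function names, the lag search range, and the gain of 0.5 are hypothetical.

```python
import numpy as np


def estimate_pitch_lag(x: np.ndarray, min_lag: int = 20, max_lag: int = 400) -> int:
    """Lag in [min_lag, max_lag] with the highest normalized autocorrelation."""
    best_lag, best_corr = min_lag, -np.inf
    for lag in range(min_lag, max_lag + 1):
        a, b = x[lag:], x[:-lag]
        corr = np.dot(a, b) / (np.sqrt(np.dot(a, a) * np.dot(b, b)) + 1e-12)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag


def comb_postfilter(x: np.ndarray, lag: int, gain: float = 0.5) -> np.ndarray:
    """FIR comb y[n] = (x[n] + gain * x[n - lag]) / (1 + gain).

    Components repeating every `lag` samples pass unchanged; the power of
    uncorrelated noise is scaled by (1 + gain**2) / (1 + gain)**2,
    i.e. to about 0.56 for gain = 0.5.
    """
    y = x.astype(np.float64)  # astype returns a copy, x stays untouched
    y[lag:] = (x[lag:] + gain * x[:-lag]) / (1.0 + gain)
    return y
```

A fixed FIR comb like this one only helps on strongly periodic passages; a practical post-filter would at least adapt the gain per frame to the measured periodicity so that non-tonal segments are not smeared.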
Further, Lesser Known Audio Codecs and Links to Additional Resources 2019–2020
AVS3, China's latest-generation Audio Video System standard. Technically, its audio coding part (also known as China 3D Audio) seems to be similar to some parts of the MPEG-H Audio codec and thus probably offers similar compression performance. See this press release.
The Bluetooth™ LC3 and ETSI LC3plus speech and audio codecs are very-low-complexity, low-delay equivalents of the 3GPP EVS codec for Bluetooth™ or DECT enabled low-energy wireless devices like earphones. See also this page. An interactive demo of the LC3 codec at five bit-rates is published on this page.
page last modified in Aug. 2024, updated IVAS release text