efficient coding of digital signals


 by C. Helmrich

Audio coding projects

    ecodis :: Efficient Audio Codecs

Audio codecs which I worked on:

HE-AAC Encoder in Winamp 5.6
A free high-quality MPEG-4 encoder

MPEG-D USAC, MPEG-H Audio
Royalty-based broadcasting codecs

3GPP Enhanced Voice Services
Royalty-based voice-over-IP codec

Free semi-lossless audio coder
Constrained VBR coding with FLAC

Other state-of-the-art codecs:

OPUS codec (IETF RFC 6716)
Best royalty-free open-source codec

My comments on audio coding
Coding mature, more post-processing

Stereo and mono codec demos
G.721, MP3, (x)HE-AAC, EVS, OPUS

Efficiency is the ability to avoid wasting materials, energy, efforts, money, and time in doing something or in producing a desired result. [...] In more mathematical or scientific terms, it is a measure of the extent to which input is well used for an intended task or function (output). It often specifically comprises the capability of a specific application of effort to produce a specific outcome with a minimum amount or quantity of waste, expense, or unnecessary effort.

Wikipedia page on efficiency, 2017

The audio Compact Disc (CD) and Super Audio CD (SACD) can store a digital audio signal without audible quality loss, especially when proper analog-to-digital conversion, sampling rate conversion, and noise-shaped bit-depth conversion [Helmrich, 2007] are used. However, the uncompressed pulse modulation formats which they employ are inefficient since both redundant and perceptually irrelevant information remains in the coded data. In other words, storage capacity is wasted on unimportant signal parts.
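To put a number on this inefficiency, the following minimal Python sketch computes the bit-rate of an uncompressed fixed-width sample stream and compares it with the 32 kbit/s used for the stereo demos further down this page. The helper name `raw_bitrate_kbps` is hypothetical and only serves this illustration; the CD parameters (44.1 kHz, 16 bit, two channels) and the G.721 parameters (8 kHz, 4-bit codes, mono) are the standard ones.

```python
def raw_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bit-rate in kbit/s of a stream that spends a fixed number of bits per sample."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


# Audio CD: 16-bit PCM, stereo, 44.1 kHz sampling rate.
cd_kbps = raw_bitrate_kbps(44_100, 16, 2)

# G.721: 4-bit ADPCM codes, mono, 8 kHz sampling rate.
adpcm_kbps = raw_bitrate_kbps(8_000, 4, 1)

print(f"CD PCM: {cd_kbps:.1f} kbit/s")
print(f"G.721 : {adpcm_kbps:.1f} kbit/s")
print(f"CD PCM relative to a 32 kbit/s stream: {cd_kbps / 32:.1f} : 1")
```

The ratio printed in the last line is the compression factor a lossy codec has to reach in order to fit CD-format stereo audio into a 32 kbit/s stream.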

   On this page, some more recent and much more efficient audio compression formats are presented. I had the privilege to participate in the development of some of them.

MPEG-4 High-Efficiency Advanced Audio Coding (HE-AAC) in Winamp

   Nullsoft's Winamp was one of the first – and, in my humble opinion, still one of the best – freely available software media players. Around the turn of the century I started using it extensively and even created my own skin and file format plug-in. Needless to say, I was very excited when, working for Fraunhofer, I was given the opportunity to optimize Fraunhofer's HE-AAC encoder for integration into Winamp 5.6 in 2011. Specifically, I increased the audio coding quality on some input signals and in VBR mode.

   MPEG-4 HE-AAC, published in ISO/IEC 14496-3, remains one of the most widely used coding specifications. Developed a few years after MP3, it outperforms the latter in audio reconstruction quality at a specified average bit-rate and complexity. Fraunhofer's HE-AAC encoder bundled with Winamp versions 5.62 (June 2011) through 5.666 (December 2013) fully exploits the coding capabilities offered by the ISO/IEC standard at a very high encoding speed. In independent blind listening tests conducted between July 2011 and November 2013, it ranked first (at 64 and 96 kbit/s VBR stereo) and second (at 96 kbit/s VBR stereo) among the best HE-AAC encoders available today.

   Unfortunately, AOL, which had acquired Nullsoft in 1999, abandoned Winamp in 2014, and new versions (none of which have been released as of 2018) will no longer include Fraunhofer's HE-AAC encoder [source]. However, version 5.666 is still publicly available for download, so you can keep using the integrated HE-AAC encoder free of charge:

Download Winamp 5.666 (Build 3516)
with Fraunhofer's HE-AAC Encoder v03.02.16

To convert your audio files into the HE-AAC format, mark all tracks in Winamp by left-clicking on them (holding Ctrl or Shift if necessary), then right-click in the playlist and select «Send to:», «Format Converter». In the Encoding Format drop-down list of the dialog box that appears, select «MPEG-4 AAC Encoder v1.08» and configure it as desired.

Figure 1. How to convert audio files into the HE-AAC format using the Format Converter dialog in Winamp. (Screenshots: the «Send to:», «Format Converter» menu entry and the Format Converter dialog.)

MPEG-D Unified Speech and Audio Coding (USAC) and MPEG-H Audio

   During and after my work on Fraunhofer's HE-AAC encoder, I joined the development of two other coding standards: MPEG-D Extended HE-AAC (xHE-AAC), published under the name unified speech and audio coding in ISO/IEC 23003-3, and MPEG-H Audio, specified in a second edition in ISO/IEC 23008-3. MPEG-H Audio is based on MPEG-D xHE-AAC but allows for higher multichannel coding efficiency and, in its Low Complexity configuration, requires fewer decoding operations. A detailed comparison of the codecs, including subjective test results, is provided in my dissertation [Helmrich, 2017].

   In my dissertation I demonstrate that, at bit-rates as low as 48 kbit/s stereo, MPEG-H Audio yields at least the same audio coding quality as xHE-AAC (sometimes even better quality), and it does so at roughly one third lower decoding complexity. Since xHE-AAC outperformed all competing codecs in a subjective verification test in 2011 [report] and I think that bit-rates lower than 24 kbit/s per channel are unnecessary nowadays, I consider MPEG-H Audio the most efficient lossy codec developed as of 2018. Recent versions of the OPUS codec (see below) or Dolby's AC-4 may match MPEG-H in audio quality at higher bit-rates, but I don't expect them to outperform it at any bit-rate (an assumption which, of course, needs to be verified via formal subjective evaluation).

   In June 2018, Fraunhofer's xHE-AAC decoder implementation found its way into the Android operating system [source], making support for the codec publicly available. Unfortunately, no free-of-charge MPEG-H Audio encoder is available as of mid-2018 for testing and content generation, but as the following articles from 2017 indicate, at least some decoders have already found their way into new-generation hardware devices:

LG Licenses MPEG-H Software from Fraunhofer IIS
(article on Hugh's News)

New Products Supporting MPEG-H Audio Hitting the Market
(blog article)

So it seems that we simply have to wait for more MPEG-H Audio ready hardware and software to arrive before we can draw ultimate conclusions on its coding performance. In the meantime, I can only present the results of the xHE-AAC stereo verification test:

Figure 2. Results of the formal USAC verification test (higher-rate stereo) conducted by MPEG. (a) Speech input, (b) mixed speech-and-music input, (c) music input, (d) all, averaged across (a)–(c). See section 2.6 of my dissertation for details.

3GPP Enhanced Voice Services (EVS) for Voice-over-IP Communication

   The EVS codec, specified in 3GPP TS 26.441, is for low-delay communication what MPEG-H Audio is for streaming or broadcasting: the most efficient speech and audio codec money can buy. In fact, it outperforms all prior ETSI/ITU-T/3GPP codecs for the same application and, at rates below 32 kbit/s, also the OPUS codec (see below):

Figure 3. Results of the EVS verification tests (all mono input signals) conducted by Nokia in 2014 (axis: bit-rate in kbit/s). See section 3 of Nokia's IEEE paper for details. (Fig. copyright A. Rämö)

   At Fraunhofer, I participated in the development of the EVS transform coding design, and I used the experience which I gained from optimizing Fraunhofer's HE-AAC encoder (see above) to fine-tune the audio reconstruction quality of the EVS codec, especially at the 16.4 and 24.4 kbit/s operating points. As Fig. 3 illustrates, the perceptual quality achieved by EVS in its superwideband (SWB, up to 16 kHz bandwidth) and fullband (FB, up to 20 kHz bandwidth) modes at such low bit-rates ended up being quite remarkable.

   By the end of 2017, several smartphone vendors had added support for EVS coding and decoding to their devices (sometimes labeled HD Voice Plus™), including Apple's iPhone 8/X, LG's G5, Samsung's Galaxy S7, Sharp's Aquos Zeta, Sony's Xperia X/XZ, and Xiaomi's MI5. Using these and later smartphones, high-quality EVS-powered voice calls can be made with service providers in Germany [source], Spain [source], the United Kingdom [source], the United States [source], and Japan [source].

   The latest version of the EVS software encoder and decoder is available via this link:

Download EVS Software version 14.2.0
floating-point software edition from 3GPP site

   A stereo, surround, and VR extension for EVS is currently being developed under the 3GPP work item 770024. Completion of the resulting Immersive Voice and Audio Services (IVAS) codec, to be standardized in 3GPP specification 26.253, is planned for December 2019. I will update this page once the IVAS software is publicly available.

FSLAC: A FLAC Backward-Compatible Free Semi-Lossless Audio Coder
Nov–Dec 2016

   Having finished the first edition of my dissertation [Helmrich, 2017], I finally had the time for an after-work project which had long been on my to-do list: a constrained VBR (CVBR) version of the publicly available open-source lossless audio coder FLAC.

   FLAC, being a mathematically lossless audio codec, inevitably creates VBR streams as compressed files. Depending on the «difficulty» of coding each segment of the audio signal, the instantaneous coding bit-rate can be quite high. However, one can observe that, during passages of high FLAC bit-rate, the coded audio also exhibits the greatest ability of psychoacoustic masking. FSLAC exploits this property to limit the maximum instantaneous bit-rate of the compressed file. It does so by detecting the difficult audio blocks (by measuring their predictability via linear-prediction error energy calculations) and requantizing each of the detected blocks to a lower bit-depth, thereby reducing the bit-rate needed for lossless coding of that block. To prevent the quantization error from becoming audible (or visible in a spectrogram), simple adaptive noise shaping is used.
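The block-wise limiting just described can be sketched roughly as follows. This is an illustrative Python sketch under simplifying assumptions, not the actual FSLAC code: the predictor is FLAC's fixed second-order predictor rather than an adaptive one, the bits-per-sample estimate is a crude proxy derived from the prediction error energy, the noise shaper is a fixed (not adaptive) first-order error-feedback filter, and all function names as well as the `max_bits` parameter are hypothetical. Requantization zeroes the lowest bits of each sample, which a FLAC encoder can signal as «wasted bits» and skip, presumably the mechanism by which the lower bit-depth translates into a lower bit-rate.

```python
import math
import random


def residual_energy(block):
    """Mean squared error of the fixed 2nd-order predictor 2*x[n-1] - x[n-2]."""
    errs = [block[n] - 2 * block[n - 1] + block[n - 2] for n in range(2, len(block))]
    return sum(e * e for e in errs) / max(1, len(errs))


def estimated_bits(block):
    """Crude bits-per-sample proxy from the residual energy (an assumption)."""
    return 0.5 * math.log2(1.0 + residual_energy(block)) + 1.0


def requantize(block, shift):
    """Zero the `shift` lowest bits using first-order error-feedback noise shaping."""
    step = 1 << shift
    out, err = [], 0
    for x in block:
        v = x - err                               # feed back the previous error
        q = ((v + step // 2) >> shift) << shift   # round to a multiple of step
        err = q - v                               # quantization error of this sample
        out.append(q)                             # note: no overflow guard in this sketch
    return out


def limit_block(block, max_bits):
    """Return (samples, dropped_bits); only blocks that look too costly are altered."""
    excess = estimated_bits(block) - max_bits
    if excess <= 0:
        return list(block), 0                     # easy block: stays bit-exact
    shift = math.ceil(excess)
    return requantize(block, shift), shift


# Synthetic test signals: a well-predictable sine and an unpredictable noise block.
rng = random.Random(0)
easy = [round(1000 * math.sin(2 * math.pi * n / 64)) for n in range(4096)]
hard = [rng.randint(-20000, 20000) for _ in range(4096)]

for name, block in (("easy", easy), ("hard", hard)):
    out, shift = limit_block(block, max_bits=10)
    print(name, "estimated bits:", round(estimated_bits(block), 1), "dropped LSBs:", shift)
```

In this sketch the predictable sine block passes through unchanged, while the noise-like block loses some of its least significant bits. The error feedback pushes the resulting quantization noise toward high frequencies; a real encoder would additionally have to keep the requantized samples within the valid sample range.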

   This approach is similar to the one used by LossyWAV, but differs in two important aspects. First, FSLAC is not a stand-alone pre-processor but instead is coupled with a FLAC encoder and, hence, directly creates FLAC-compatible compressed files. Second, FSLAC only alters the high-bit-rate audio segments, not (almost) all parts of the audio input as LossyWAV does. The coded audio, therefore, remains perceptually lossless. In addition, it is worth noting that, due to its simplicity, FSLAC encoding is very fast. All of these features make FSLAC attractive for audio production and archival applications.

   You can download an executable of a FLAC encoder with added FSLAC functionality below (Win32 release, compiled with Visual Studio 2008). The FSLAC.EXE can be used on the Windows command-line or, when renamed to FLAC.EXE, in foobar2000. Note, however, the list of known issues with this executable. If you are interested in the specifics of the FSLAC algorithm or would like to compile your own binary (kindly let me know if you do!), you can also download the source code file that I had to modify. All FSLAC-related code is encapsulated in the define ECODIS_CVBR_MODE (line 72).

Download FSLAC 1.3.2 (Build 12-2016)
Windows (32-bit), compiled with Visual Studio

Download FSLAC 1.3.2 source code file
replaces the stream_encoder.c in FLAC source

To convert your lossless files using FSLAC in foobar2000, select the tracks to convert in the playlist, then right-click on a marked track and select «Convert», «Quick Convert». In the dialog box that appears, select «FLAC <N/A>», edit the compression level if desired (the lower the level, the lower the maximum FSLAC bit-rate; level 8 limits it to around 1000 kbit/s) and, after clicking on «Convert», locate the downloaded F(S)LAC.EXE if required.

Figure 4. Influence of the FSLAC compression level on the output bit-rate when coding the SQAM inputs with the highest lossless bit-rate. Lossless VBR input: FLAC -8; transcoded CVBR output: FSLAC -6, FSLAC -4, and FSLAC -3. Note how the high-rate signals 2, 3, 10, 13, and 16 are affected most by the bit-rate limiting.

OPUS: The Best Royalty-Free Open-Source Speech and Audio Codec

   All lossy perceptual audio codecs discussed above are royalty-bearing, meaning that, to use these codecs, you (or your software provider, e.g., Nullsoft/AOL) need to pay the companies owning the patents on the applied coding techniques. The OPUS speech and audio codec, as specified in 2012 by the Internet Engineering Task Force in RFC 6716, is an open-source codec with surprisingly good subjective performance which, like FLAC (see above), does not make use of any patented third-party technology. Thus, OPUS can be downloaded, compiled, integrated, and used free of charge. The encoder is still being improved for better quality and stability, as can be seen here.

   Since version 1.1 from 2013, OPUS yields encodings of very high quality, except at very low bit-rates (see Fig. 3 above). The latest version is 1.3 from 2018, featuring a robust speech/music classifier and slightly better coding quality, especially at low rates. I beta-tested version 1.3 in June 2018 and reported a coding bandwidth detection problem and a speech/music classification problem, both of which were corrected in the final release. Being a low-delay codec with a speech core (SILK), OPUS also competes with EVS.

OPUS download page
Mozilla's builds
offering libraries, executables, and source code

   Fig. 5 illustrates the growing popularity of the OPUS codec among experienced users (not audiophiles!), in this case the HydrogenAudio community. Based on this analysis, one can expect OPUS to «catch up» with HE-AAC in market share around the year 2020. Note, however, that by then, the essential HE-AAC patents will have expired (the PNS, SBR, and PS patents are the last ones running). This will allow implementing a fully licensing-free AAC Low Complexity codec, as is already the case with MP3.

Figure 5. Percentages of the most commonly used lossy codecs as a function of time among all HydrogenAudio forum members voting in the annual poll. The OPUS codec gains usage share, while the HE-AAC standard stagnates at a quarter of the share and older codecs like Vorbis and MP3 are slowly being replaced. The results for 2018–2019 are extrapolated from the previous 7 years.

   For all historians: some early documentation of OPUS's transform coding core, called Constrained Energy Lapped Transform (CELT), is archived here (papers from 2009). A further subjective comparison of OPUS and EVS on music content is available here.

Summary: Audio Coding has Matured, Future: More Post-Processing
May 2018

   My experience in audio coding research and development leads me to conclude that the performance of the latest-generation audio codecs, as indicated above and in my dissertation [Helmrich, 2017], has matured to the point where the small perceptual benefits of more elaborate signal processing are not worth the increase in algorithmic complexity. Moreover, I believe this is the case for both the decoder specifications and encoder implementations. At least for mono, stereo, and 5.1 surround, the inaccuracies of human hearing seem largely exploited (by irrelevancy removal and parametric coding), and the remaining statistical redundancies within the coded data have been reduced to a minimum (except maybe in some signaled parameters, see sec. 3.6.2 of my doctoral thesis). The only thing left to do, in my view, is to make greater use of post-processors similar to the in-loop filters in video coding. Such techniques can improve the reconstruction quality of very tonal or transient signals, as has been demonstrated by the LTP post-filter in EVS and MPEG-H Audio as well as an applause post-filter.

   Update (Nov. 2018): Neural-network-based post-filtering for blind bandwidth extension of speech at low bit-rates also seems a promising topic for further research, see here.

Audio Coding Demos at 32 kbit/s Stereo for Comparative Listening
June 2018

   The following offers a demonstration of the low-bit-rate performance of the presented audio codecs when applied to a 16-second 2-channel music signal losslessly extracted from an audio CD *. Click the play/pause buttons below to start and stop the playback of each demo. For each coding format, I tried to select from several encoders and configurations the one yielding the best audio quality at the target bit-rate of 32 kbit/s. You can download the encoded files (bit-streams), along with a brief description of the encoder settings used and alternative encoders not considered for this demo, here.

Lossless signal sampled at 44.1 kHz (862 kbit/s)
Pet Shop Boys - Left To My Own Devices, CD rip *

G.721 coded mono downmix at 8 kHz (32 kbit/s)
4bit ADPCM, FFmpeg, ITU-T, standardized in 1984

MP3 coded signal sampled at 16 kHz (32 kbit/s)
Audition 3.0 encoder, MPEG, standardized in 1993

Ogg-Vorbis coded signal at 24 kHz (32 kbit/s)
aoTuV encoder, format specified in 2000

HE-AAC/MP4 coded signal at 44.1 kHz (32 kbit/s)
Winamp 5.6 encoder, MPEG, standardized in 2003

OPUS coded signal sampled at 48 kHz (32 kbit/s)
release 1.3, IETF, standardized in 2012

xHE-AAC/MP4 coded signal at 44 kHz (32 kbit/s)
modified JAME coder, MPEG, standardized in 2012

* digitally extracted excerpt of track 11 of the «Discography – The Complete Singles Collection» audio CD by the Pet Shop Boys from 1991, used under the U.S. Fair Use doctrine for nonprofit educational purposes. The decoded waveforms were transcoded to high-bit-rate MP3 for best playback compatibility.

   Notice how at this bit-rate, the audio coding quality consistently improved, starting at the first codec from 1984, which could only encode a narrowband mono downmix at 32 kbit/s, over MP3 from 1993, enabling a wideband stereo encoding, up to HE-AAC from 2003, providing superwideband stereo sound for the first time at such low rates. Notice also how, with the codecs developed after HE-AAC, the reconstruction quality increased only marginally and only in the second half of the signal containing the Disco beat. In fact, most of the efficiency gains of the recent codecs are reached on speech signals:

Speech Coding Demos at 16 kbit/s Mono
Oct 2018

   The following is a demonstration of the most widely used speech coding standards on 16 seconds of one-channel speech signals losslessly extracted from the SQAM CD.

Lossless signal sampled at 44.1 kHz (313 kbit/s)
German speech, die Natur hat (deutsch), CD rip *

G.721 coded signal sampled at 8 kHz (32 kbit/s)
4bit ADPCM, FFmpeg, ITU-T, standardized in 1984

MP3 coded signal sampled at 16 kHz (16 kbit/s)
Audition 3.0 encoder, MPEG, standardized in 1993

AMR-WB IO coded signal at 16 kHz (15.85 kbit/s)
EVS encoder, ITU-T G.722.2, standardized in 2002

HE-AAC/MP4 coded signal at 32 kHz (16.4 kbit/s)
Winamp 5.6 encoder, MPEG, standardized in 2003

OPUS coded signal sampled at 48 kHz (16 kbit/s)
release 1.3, IETF, standardized in 2012

EVS coded signal sampled at 32 kHz (16.4 kbit/s)
release 14.2.0 (float), 3GPP, standardized in 2014

* digitally extracted excerpts of tracks 53/54 of the EBU «Sound Quality Assessment Material» audio CD from 1988, utilized under the U.S. Fair Use doctrine for nonprofit educational purposes. For playback, the decoded waveforms were transcoded to high-bit-rate MP3 to maximize browser compatibility.

   Notice how the speech clarity increased and the coding noise decreased over time. By the way, if you are planning to perform a formal blind listening test, I suggest you take a look at this listener guideline which I wrote for the HydrogenAudio community. For all German-speaking readers of this web site: eine deutsche Version gibt's hier.

page last modified in November 2018, last changes: finished OPUS 1.3 description, added speech demo

Copyright © 2008–2018 Christian Helmrich :: Privacy