ALAC (and FLAC) v. AAC

Part I: From AAC to ALAC

I am what you might call an early adopter of taking AAC seriously. My wife bought me an iPod in late 2007, and I have been ripping our CDs ever since. While I could hear a difference between either [a] 256 Kbps AAC (the so-called iTunes+ format) or [b] 320 Kbps AAC (the highest available in iTunes 10) and the original LPCM representation on a CD, two realities influenced my adoption:

[1] We owned a lot of recordings that were insufficiently well recorded for the deterioration in quality to make much or any difference: The 1924 recording of Rachmaninov playing his second concerto is interesting, but the sound is bad.

[2] I do a lot of listening in sub-optimal situations, such as in rooms with a lot of background noise, softly in the background, while I am working around the house, in the car, and so on. If I want a purely high fidelity experience, I have a room for that purpose.

The Apple TV as a streaming device made it even more tempting to move to computer storage as completely as possible. When I started moving to networked music, a 120GB drive in a computer was standard, and the disk size provided encouragement to keep things small. But the most recently purchased drive was a 2TB external drive that cost about $250. A well filled CD is 0.0007 TB, so the idea of saving space had less to offer. Nonetheless, the other way to look at 2TB is that it will hold only about 2850 uncompressed CDs.

And so I began to look at ALAC. The promised savings of 50% on storage space while preserving the integrity of the original recording meant that more than 5000 CDs would fit into 2TB.
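The capacity arithmetic above is easy to check. This is only a rough sketch: the function name is my own, the 0.0007 TB and 2TB figures come from the text, and the 50% ratio is the promised saving rather than a measured one.

```python
# Rough capacity arithmetic for the figures quoted above.
CD_TB = 0.0007      # a well filled CD, as stated in the text
DRIVE_TB = 2.0      # the external drive

def cds_that_fit(drive_tb, cd_tb, ratio=1.0):
    """Whole CDs that fit when each is stored at `ratio` of its original size."""
    return int(drive_tb / (cd_tb * ratio))

uncompressed = cds_that_fit(DRIVE_TB, CD_TB)
lossless_half = cds_that_fit(DRIVE_TB, CD_TB, ratio=0.5)  # assumes the promised 50%

print("Uncompressed CDs:", uncompressed)
print("CDs at 50% size: ", lossless_half)
```

The uncompressed count lands a little above 2850 and the 50% count a little above 5700, which is where "more than 5000" comes from.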

Now we're talking.

Part II: Audibility of artifacts in AAC

My wife has a grand piano, which means that we know what a real piano sounds like and, as you might correctly guess, that we have a lot of recordings of piano music. I also like piano with a human voice, maybe two voices, so we have quite a few recordings of songs for voice and piano.

Either "because of" or "in spite of" the relative simplicity of the sound of such music, AAC artifacts can be heard if one listens directly to the music without distraction. The artifacts are audible even on a modest hi-fi -- this is not a strange, golden-ears-only phenomenon.

Using 320 Kbps AAC, piano transients [example: hammer hits a string, and the key is released silencing the note] are sometimes followed by a faint buzzing sound, a bit like a loose or improperly seated string, and voices, primarily female voices, take on a slightly nasal or slightly cupped-hands-in-front-of-the-mouth sound. With voices, it helps to know what the singer's voice sounds like on the original recording because it is more of a coloration of the sound rather than a non-musical rattling as it is with the piano.

On much other music, such as symphonies, jazz ensembles, rock and roll, string quartets and choral music, I cannot hear any repeatable way in which 320 Kbps AAC obviously degrades the original. It is almost as though the more complex the sound is, the better the AAC algorithm can recognize the less audible parts, discard them, and leave the listener with a satisfactory rendering of the original. Indeed, I have purchased a good bit of hard to (otherwise) find music directly from iTunes, and I have yet to hear anything that made me want to replace it with the original CD.

With ALAC, I wouldn't be degrading the sound at all -- simply storing it in less space. This was comforting.

Part III: Early steps in the ALAC direction

I began purchasing FLAC downloads from Hyperion Records about a year ago. The first purchase I made was Angela Hewitt's new recording of the Bach WTC on the Fazioli piano. I was surprised at how much smaller the FLAC was than the LPCM original. I took this recording, inflated it to its original state, and then ALAC-ed it, getting very nearly the same size files as a result.

The piano literature was the first to be re-ripped. Just to be clear, when I refer to "piano music," I mean music for a solo acoustic piano, recorded naturally either in a studio or a standard performing venue. Our piano music spans compositions written in the years 1710 to 2010, and the results for all of it were much the same.

Part IV: Results

I have expressed the results in terms of file size rather than bit rate. The bit rate for LPCM data stored on CD (a.k.a. RBCD or Red Book CD) is 1411 Kbps (44.1 kHz sampling rate * 16 bits per sample * 2 channels), but looking at numbers like 765 forces one to think ... OK, that's a little bit more than 50% of the original file size.

To put percent of file size into perspective, iTunes+ is a constant 18.14% of the original file size, and the best AAC quality level available with iTunes compatibility is 22.68%.
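The percentages above follow directly from the Red Book parameters. Here is a small sketch of the arithmetic; the function and variable names are mine, and the 765 Kbps entry is just the illustrative figure mentioned earlier, not a measurement.

```python
# Red Book CD audio parameters, as given in the text.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

# LPCM bit rate in Kbps (thousands of bits per second).
lpcm_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS / 1000

def percent_of_lpcm(kbps):
    """A bit rate expressed as a percentage of the CD's LPCM bit rate."""
    return 100 * kbps / lpcm_kbps

for label, kbps in [("iTunes+ AAC (256 Kbps)", 256),
                    ("Best iTunes AAC (320 Kbps)", 320),
                    ("Illustrative lossless rate (765 Kbps)", 765)]:
    print(f"{label}: {percent_of_lpcm(kbps):.2f}% of original")
```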

Let's consider two different types of recordings: [a] classical piano recorded in a studio or quiet and empty hall, [b] jazz trios recorded live.

For this study, I chose modern recordings that most people would consider to be well done, with a mix of fast and slow pieces, venues and performers. In the case of the live jazz, short tracks featuring announcers or conversation were not included in the statistical calculations.

Recording source    Mean size   Median size   Std Dev (*)   Std Dev (**)
Acoustic piano      38.5%       39.1%         5.41%         14.1%
Jazz trios, live    61.8%       62.1%         5.33%         8.62%

(*) Standard deviation the way we usually mean it, in percentage points of the original file size.

(**) Standard deviation expressed as a percentage of the mean (the coefficient of variation).
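For anyone who wants to reproduce the four columns from their own rips, here is a minimal sketch. The per-track numbers are made up for illustration, the function name is mine, and I am assuming the sample (n - 1) standard deviation; the population version would give slightly smaller values.

```python
import statistics

def summarize(ratios_pct):
    """Summarize per-track sizes, each given as a percentage of the LPCM original."""
    mean = statistics.mean(ratios_pct)
    sd = statistics.stdev(ratios_pct)  # sample std dev (n - 1); an assumption
    return {
        "mean": mean,
        "median": statistics.median(ratios_pct),
        "std_dev": sd,                           # the (*) column
        "std_dev_pct_of_mean": 100 * sd / mean,  # the (**) column
    }

# Made-up per-track percentages, for illustration only.
example_tracks = [33.0, 36.5, 39.1, 41.2, 44.0]
for name, value in summarize(example_tracks).items():
    print(f"{name}: {value:.2f}")

# The (**) column can be re-derived from the mean and (*) columns of the table:
print("piano:", round(100 * 5.41 / 38.5, 1))
print("jazz: ", round(100 * 5.33 / 61.8, 2))
```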

These two types of music are presented first because they are at the ends of the range of file sizes resulting from ALAC: solo piano reduces in size the most, and live jazz packs down the least.

The following table presents some other types of music and the results.

Still working on this one....

Put table here.

Part V: Testing the extremes

ALAC is a linear predictive method, which is to say that it predicts the next sample based on the values of the past few samples. The analysis is conducted in the time domain, directly on the sampled signal, rather than in the frequency domain as with AAC, MP3, ATRAC, and many others.

Given what we know about how ALAC works, we would guess that noise would compress very poorly because in perfect noise the value of the next sample cannot be usefully predicted from any information about the samples already seen: in other words, noise is random, and random sound is noise.
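The idea can be illustrated with a toy predictor. This is not ALAC's actual algorithm (which, as far as I understand it, adapts its prediction coefficients as it goes); it is a fixed second-order predictor of my own choosing that guesses each sample by extending the straight line through the previous two. What an encoder has to store is the residual, the difference between the guess and the real sample, so smaller residuals mean smaller files.

```python
import math
import random

def residuals(samples):
    """Errors of a fixed second-order predictor: guess = 2*x[n-1] - x[n-2]."""
    return [samples[n] - (2 * samples[n - 1] - samples[n - 2])
            for n in range(2, len(samples))]

def mean_abs(values):
    """Average magnitude, a crude stand-in for 'how many bits this needs'."""
    return sum(abs(v) for v in values) / len(values)

N = 4096
AMPLITUDE = 3000  # arbitrary level, well inside the 16-bit range

# A 440 Hz tone sampled at 44.1 kHz: smooth, so easy to predict.
tone = [round(AMPLITUDE * math.sin(2 * math.pi * 440 * n / 44_100))
        for n in range(N)]

# Uniform white noise at the same peak level: nothing to predict.
rng = random.Random(0)  # fixed seed so the run is repeatable
noise = [rng.randint(-AMPLITUDE, AMPLITUDE) for _ in range(N)]

for name, signal in [("tone", tone), ("noise", noise)]:
    print(f"{name}: signal {mean_abs(signal):.1f}, "
          f"residual {mean_abs(residuals(signal)):.1f}")
```

For the tone, the residual is a tiny fraction of the signal. For the noise, the residual comes out larger than the signal itself, so this predictor would do more harm than good, and a sensible encoder would be left storing the noise more or less as is.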

I tested -20 dB white and pink noise signals, both correlated and uncorrelated between the channels. In these tests ALAC behaved as though it were unaware of the correlation between channels (i.e., "mono" was treated as two channels of sound, with no sign that the encoder noticed the two channels were the same). A -20 dB noise signal does shrink to 82% of the original size, which corresponds fairly well with the fact that a -20 dB signal uses a little more than 3 bits fewer than a full-scale signal.
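The "3+ bits" figure comes from the rule of thumb that each bit of resolution is worth about 6.02 dB. A back-of-the-envelope sketch follows; the names are mine, and it ignores the exact statistics of the noise and any container overhead, so treat the result as a rough floor rather than a prediction.

```python
import math

BIT_DEPTH = 16
DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB per bit

def bits_unused(attenuation_db):
    """Bits of resolution left unused by a signal this far below full scale."""
    return attenuation_db / DB_PER_BIT

unused = bits_unused(20)
floor_pct = 100 * (BIT_DEPTH - unused) / BIT_DEPTH

print(f"Bits unused at -20 dB: {unused:.2f}")
print(f"Rough size floor for incompressible noise: {floor_pct:.1f}% of original")
```

The estimate comes out a few points below the 82% I measured, which seems close enough for arithmetic this crude.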

With this result in hand, I wondered if recordings made at too low a level would pack down rather more than ones that are closer to full signal. I found two examples: Robin Holloway's Gilded Goldbergs, Op 86 on Hyperion, and Yuri Temirkanov's traversal of Tchaikovsky's symphonies on RCA. Both are simply recorded at too low a level -- it is not a question of accommodating dynamic range, but a case of having the mastering level set too low.

Recording source    Mean size   Median size   Std Dev (*)   Std Dev (**)
Robin Holloway      31.4%       31.5%         2.40%         7.6%