Sep 6

Understanding High Resolution Audio

I read a lot about music and music reproduction online, so I know this is something people are often interested in knowing more about: what is high resolution audio, and what am I supposed to hear?

Example: I came across a thread where someone stated that CD quality is 16-bit, 44.1kHz, and that this is enough because human hearing caps at around 20kHz. (They assumed that 44.1kHz was the frequency limit of CDs!) Of all the replies, no one corrected this misunderstanding.

I can hopefully demystify this topic and share some of my own personal experiences. I think CD quality is pretty good, and it would be my suggestion to most readers to find satisfaction with this much bandwidth when listening to digital audio.

Let’s Start with Records

I’d ask everyone to think about a record. A needle follows a path on a record, representing an impression of what a microphone had picked up. This vibration is then amplified, and we can get to hear back what was originally put into the grooves. I know this is a simplistic view, but it’s important to understand first how sound is captured.

Much debate around digital often compares digital sources to analog sources. I'm not here to debate records over CDs. I live in a completely digital audio world and am happy to do so. But some of the thinking behind digital, I think, is founded in how analog reproduction could offer a sonic benefit over digital.

A microphone captures sound vibrations which are then translated into electrical energy. These are analog processes, unless those signals are then converted at some point to digital representations. I grew up with my parents owning one of the Edison cylinders. His early recordings using tin foil became a method to capture the vibrations of his voice into a medium that could then be played back.

The perceived benefit to this method of capture is that it is continuous. Why? Because we perceive the world in a continuous (linear) fashion, one that is not somehow converted to numbers.

But if you went to a movie theater where you watched a flick presented on film, you weren’t seeing the film continuously. It was broken up; a series of still images played back so fast that your brain accepts that as continuous. Most video today is presented at 30 frames per second. This is a rate; your phone today can likely capture up to 120 frames per second (fps) for later presenting the video as slow motion. Some like the flicker of video presented at 24 fps, to emulate film.

This is how PCM digital audio works. Audio is captured at a specific rate. In CD recordings, this rate describes how many “frames per second” are used. “CD quality” captures audio at 44,100 samples per second, or 44.1kHz.

Digital PCM

So what is “16 bit?” Since digital audio is about numbers, this is the amount of variation that can be captured in each sample. When we say 16-bit, it means the system can store 65,536 possible values for each sample of the audio wave. That sounds like a staircase, but here’s the key:

Dynamic range: 16-bit allows about 96 dB of range between the quietest and loudest sounds it can represent. That’s enough to capture everything from a whisper in a quiet room to the roar of an orchestra without clipping or dropping into total silence.
Noise floor: Instead of hearing “steps,” what actually happens is that the little gaps between those values show up as a tiny amount of background hiss (quantization noise). But even with 16-bit, that noise is far quieter than what you’d notice in most listening rooms.
Analogy: Imagine a dimmer switch with 65,536 smooth positions between completely off and full brightness. Your eye won’t see the steps — it looks like a smooth fade.

So, 16-bit doesn’t mean you hear “chunks” of volume. It means the recording has a maximum range of about 96 dB to work with, which is already more than most real-world recordings use. (The CD spec, if I remember, was almost capped at 14-bit, but 16 won out.)
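To put a number on that background hiss, here is a minimal sketch that rounds a full-scale sine wave to 8-bit and 16-bit steps and measures how far the rounding error sits below the signal. The 997Hz test tone and the function names are my own choices for illustration, not part of any standard tool.

```python
import math

def quantize(x, bits):
    """Round a sample in [-1.0, 1.0] to the nearest step available at this bit depth."""
    scale = 2 ** (bits - 1) - 1      # e.g. 32767 for 16-bit
    return round(x * scale) / scale

def snr_db(bits, freq=997.0, rate=44100, n=44100):
    """Signal-to-noise ratio of a full-scale sine after rounding to `bits`."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * freq * i / rate)
        err = quantize(x, bits) - x          # the quantization error for this sample
        signal_power += x * x
        noise_power += err * err
    return 10 * math.log10(signal_power / noise_power)

for bits in (8, 16):
    print(f"{bits}-bit: about {snr_db(bits):.1f} dB between the tone and its quantization noise")
```

Each extra bit buys roughly 6 dB, which is where the familiar "about 96 dB" figure for 16-bit comes from.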

What’s more useful to us, perhaps, is understanding what the 44,100 samples give us at that 16-bit sample size. The sample rate equates to reproducing a range of frequencies that extend to 22.05kHz, at least theoretically (a very high frequency, think of something high beyond birdsong and beyond the range of a violin on the E string, that if present in your recording, is likely upper harmonics of pitches far lower that we can easily hear). The bit depth equates to 96 dB of dynamic range: the amount of levels between dead quiet and full volume.
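The two numbers in that paragraph come from simple arithmetic: the highest frequency a sample rate can represent is half that rate, and the theoretical dynamic range is 20 times the base-10 logarithm of the number of levels. A small sketch, with function names of my own choosing:

```python
import math

def nyquist_khz(sample_rate_hz):
    """Highest representable frequency, in kHz: half the sample rate."""
    return sample_rate_hz / 2 / 1000

def dynamic_range_db(bits):
    """Theoretical range between the smallest step and full scale."""
    return 20 * math.log10(2 ** bits)

for rate, bits in [(44100, 16), (96000, 24), (192000, 24)]:
    print(f"{rate / 1000:g}kHz / {bits}-bit: "
          f"frequencies up to {nyquist_khz(rate):g}kHz, "
          f"about {dynamic_range_db(bits):.0f} dB of range")
```

These are ceilings set by the format, not a promise of what any recording or playback chain actually delivers.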

Theoretically, these standards are supposed to represent all the frequencies the human ear can hear. And 96dB of range is not far from the theoretical limit of 120dB that we can detect. Philips and Sony came together to figure they could fit that much data onto the size of a CD, encoding enough music up to 74 minutes in length (later changed to 80 minutes), enough, supposedly, to present Beethoven’s 9th symphony.

So what’s high-resolution?

We'll define high resolution as any digital file that exceeds the CD standard. It may be helpful to think here of the resolution of a JPEG in pixels per inch. We know from practice that higher ppi images (usually expressed in megapixels) are a good thing. (To be exacting, maybe we should equate CD audio to a TIFF file, rather than the JPEG, which is compressed in a way not dissimilar to MP3 files.)

Before these files became available to us, there were efforts to improve the sound of CDs by using higher resolution equipment. You may have CDs that report using a 20-bit recording technology, or then 24-bit. These higher resolution recordings didn’t change what your CD player could do; they got “cropped” before being re-interpreted as 16-bit recordings. These could sound better, depending on what was thrown away, preserving more detail than what you got before.

Then like everything else, technology improved and studios upgraded their equipment. 24-bit, 96kHz was popular, it seemed, and some went up to 192kHz sample rates. Just like increasing frames per second in video, this increase means that there are fewer gaps in the sampling of the audio. That said, reconstruction filters used in DACs don’t leave holes (silence gaps) in the music. What comes out of the analog outputs of your streamer or CD player is continuous too. But the promise of more samples would seem to indicate something superior.

That said, I cannot say with any certainty that you can hear the difference between 192,000 samples and 44,100 samples. But psychologically, you want more, don't you? Why not? More is better. We have come to understand that about the world. I came across a page from a music dealer recently that said, without question, that these higher resolution files provide more detail, offering us the ability to hear things in the music that regular CD resolution cannot. Be careful about believing such things.

As an example, Apple increased the screen refresh rate on their iPad Pro product several years ago, and people reported that “scrolling felt so much smoother.” This was easily perceptible by many users. It offered a benefit, although it's one you might not have wanted to pay for in adopting a "pro" iPad. And if they increase the rate next year to something like 250 fps? Will you be able to detect that?

I make this comparison because I feel the refresh rate of an iPad Pro and CD audio are something we can compare as "good enough." High-resolution digital audio seems to be in that next level, at the extreme.

“Super” Audio

The first commercial availability of high-resolution for music came via the SuperAudioCD. It's a format that never really took off, although some would say that the format offers a superior advantage in terms of sound quality, because of how the encoded format works.

Sony and Philips explored this technology as an alternative to PCM. Today we call this DSD, and the original standard, DSD64, uses a higher sample rate (i.e. more data) at 2.8MHz, but instead of using a big word at 16 bits, it uses a 1-bit word, only capturing the change from one sample to the next. This page describes the differences between the two technologies better than I can. The encoding/decoding format of DSD more closely resembles the way analog data is captured and reproduced, and for some, that similarity means superiority. We're back to comparing vinyl with CDs.
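For a sense of scale, the raw data rates can be worked out directly from the sample rate, word size, and channel count. This is a small sketch assuming two-channel stereo throughout; the function name is mine.

```python
def bits_per_second(sample_rate_hz, bits_per_sample, channels=2):
    """Raw, uncompressed data rate of a digital audio stream."""
    return sample_rate_hz * bits_per_sample * channels

CD_RATE = 44_100
DSD64_RATE = 64 * CD_RATE        # 2,822,400 one-bit samples per second

cd = bits_per_second(CD_RATE, 16)
hires_pcm = bits_per_second(96_000, 24)
dsd64 = bits_per_second(DSD64_RATE, 1)

print(f"CD (16-bit, 44.1kHz):  {cd / 1000:,.1f} kbps")
print(f"PCM (24-bit, 96kHz):   {hires_pcm / 1000:,.1f} kbps")
print(f"DSD64 (1-bit, 2.8MHz): {dsd64 / 1000:,.1f} kbps, {dsd64 / cd:g}x the CD rate")
```

So despite the one-bit word, a DSD64 stream carries more raw data than a CD and sits in the same neighborhood as 24-bit, 96kHz PCM.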

The article linked above concludes like this, leaving us in a weird place: “Ultimately, the choice between DSD and PCM comes down to personal preference and use case. If you are an audiophile with high-end equipment and prioritize sound quality above all else, DSD may offer the most satisfying listening experience.”

It's important to note that DSD files (DSF) and PCM files (WAV or AIFF) are like graphics files. You can open and save these (with conversion) into one format or the other. Which means that if one format offers a superior experience over the other, you might not want to convert them.

Many digital recordings are made in PCM. You'll know this because of the rate used, if it's documented in the booklet (i.e. 96kHz, 24-bit recording). PCM is far easier to edit and master.

However, some recording engineers will record direct to DSD as their master. They can then release this in DSD, or release it in PCM. Some will convert the DSD recording to PCM for editing, then output it again into DSD. I can't speak to the sonic benefits or losses doing this conversion, but if I wanted to try and experience a difference, I'd seek a recording that only lived in the DSD domain.

You may also come across vintage recordings made into DSD or high-res PCM. This is where they digitize the original analog recording, attempting to capture the full bandwidth of the master tape. Does the master tape capture more dynamic range or more frequencies? Probably not. Therefore the benefits are likely not perceivable.

We also need to be aware of folks who are just up-sampling CD quality audio to high resolution formats. In Photoshop, I can change the resolution of an image, but the visual benefit is moot. Unless I'm using a plugin that attempts to use machine learning (AI) to fill in details that are missing, when I zoom in, I'll see the same jaggies. The result is a bigger file that offers no visual benefit.
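The same thing can be shown with a handful of numbers. This sketch doubles the sample count by inserting the average of each pair of neighbors, which is a deliberately crude stand-in for a real resampler (real ones use better filters, but the new samples are still computed from the old ones). The sample values are made up for illustration.

```python
def upsample_2x_linear(samples):
    """Double the sample count by inserting the midpoint between neighbors."""
    out = []
    for a, b in zip(samples, samples[1:]):
        out.append(a)
        out.append((a + b) / 2)   # the "new" sample is just an average of two old ones
    out.append(samples[-1])
    return out

original = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
upsampled = upsample_2x_linear(original)
recovered = upsampled[::2]        # keep every other sample again

print(f"{len(original)} samples became {len(upsampled)}")
print("Original fully recoverable from the bigger file:", recovered == original)
```

The file grows, but everything in it was derived from what the smaller file already contained.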

But the other bit about the quote above? “The choice comes down to personal preference.” But you’re likely reading this because you are not yet sure what your preference is. Right?

Audio Compression

A discussion about digital music reproduction can’t be complete without talking about the rise of the MP3 player and Napster. People moved to digital music, on their computer, and they wanted to download and carry this music with them. Forget the Discman! So methods were designed to compress those CD-quality files by using psycho-acoustic models to throw away data that we really couldn’t hear (hence the term “lossy”).

The catch was size: ripping a full CD results in roughly 650MB or more of uncompressed data. And at the time, with modems used to access the Internet, we wanted smaller files. The more you compressed your audio, the more songs you could fit onto your MP3 player.

MP3 (and later MP4 variations) use a number to describe how much compression is applied. A 320kbps MP3 file is the highest quality, throwing less of the original data away. A lot of music people wanted for portable players was compressed at 128kbps, which was “good enough” for portable headphones, but it can sound “grainy” for playback on higher-end systems, as this rate more aggressively discards data. 256kbps was chosen as a compromise, for a higher-resolution reproduction, again, with bigger files, but it was “good enough” for most people’s hearing. Learn more about MP3 files here.
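To see what those bitrates mean in practice, here is a rough sketch comparing file sizes against uncompressed CD audio, which runs at 1,411.2kbps (44,100 samples × 16 bits × 2 channels). The 4-minute song length is an assumption for illustration, and real files carry a little extra for headers and tags.

```python
CD_BITRATE_KBPS = 44_100 * 16 * 2 / 1000    # 1411.2 kbps, uncompressed stereo

def size_mb(bitrate_kbps, minutes):
    """Approximate file size in megabytes (millions of bytes)."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return bytes_per_second * minutes * 60 / 1_000_000

def compression_ratio(bitrate_kbps):
    """How many times smaller than CD audio a given bitrate is."""
    return CD_BITRATE_KBPS / bitrate_kbps

for kbps in (CD_BITRATE_KBPS, 320, 256, 128):
    print(f"{kbps:g} kbps: {size_mb(kbps, 4):.1f} MB for a 4-minute song, "
          f"{compression_ratio(kbps):.1f}x smaller than CD")

print(f"A full 74-minute disc of raw audio: {size_mb(CD_BITRATE_KBPS, 74):.0f} MB")
```

By this arithmetic a completely full 74-minute disc holds a bit more raw audio than the commonly quoted 650MB, which is the disc's capacity when used for computer data.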

There are people who claim they can’t hear a difference between compressed digital audio (such as an MP3) and a full-resolution CD file. (I will admit, it can be difficult, especially so with 256kbps and higher files.)

There are people who claim they cannot hear the difference between CD files and high-resolution files. Or PCM and DSD files.

So, therein lies the paradox: We have the technology today to take 100 megapixel images. And if we're making a large banner, we can see the benefit of this technology. But in audio, it's less clear if we are getting sonic benefits from using high resolution audio, let alone hearing the benefits from files that are technically below CD-quality (i.e. 256kbps MP3s).

Dynamic Range

At some point, discussion of these file types becomes moot when we realize that not all audio equipment (and the rooms they are in) can technically let us hear differences. For instance, an inexpensive speaker (or headphone) might not be able to reproduce a full range of dynamics well. High resolution files give us theoretical capacity to go beyond the CD 96dB limit. The human capacity to detect differences in dynamics is around 120-140 decibels.

What is this? Picture yourself in the auditorium of a concert hall. The orchestra is all playing, full-tilt, and the bass drum is getting hammered. It’s loud! Then everything dies down, and you can just detect a whisper. This is a huge range in dynamics (let’s call it volume).

You can imagine a high-fidelity system should match the dynamic range of human hearing. A 120dB capable system would sound more natural than a compressed system, only able to deliver maybe 90dB of range. 90 isn’t awful, but it is artificial. (In actual practice, home listening situations are littered with up to 30dB of background noise which interferes with these theoretical extremes.)
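A back-of-the-envelope sketch shows why that background noise matters. It treats the quietest usable sound as whichever floor is higher, the room's or the format's, which is a simplification (we can pick out some sounds below a noise floor). The 100 dB peak level is my own assumption; the 30 dB room noise comes from the paragraph above.

```python
def usable_range_db(peak_spl_db, room_noise_db, format_range_db):
    """Range between the loudest peak and the higher of the two floors."""
    format_floor = peak_spl_db - format_range_db   # where the format's own noise would sit
    floor = max(room_noise_db, format_floor)
    return peak_spl_db - floor

PEAK = 100    # assumed loudest playback peak, in dB SPL
ROOM = 30     # background noise in a home listening room, in dB SPL

for name, format_range in (("16-bit", 96), ("24-bit", 144)):
    print(f"{name}: about {usable_range_db(PEAK, ROOM, format_range)} dB usable in this room")
```

Under these assumptions the room, not the file format, sets the limit, and both bit depths end up with the same usable range.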

And this is where some of the appeal for high resolution files, when used on highly-revealing systems, becomes less about what we can certify we hear and what we cannot. Philosophically, we may want to benefit from more data and more dynamic range. But can our equipment match what's capable through the encoding with these files?

Acoustic Energy

One argument is that frequencies present in making music—which can be captured in excess of what our ears can hear—somehow affect the frequencies we can hear. This is likely the effect of the upper harmonics of music that we know color the timbre of the sounds we hear instruments and voices making.

Of course, the hearing range of some animals exceeds our own. But there is some science to suggest that frequencies beyond what we can detect in terms of pitch can affect us.

Of course, if you want to experiment with this, you need a digital file that includes frequencies beyond the 20Hz-20kHz range, and you need a system, including speakers, that can also reproduce those frequencies. Many components filter off ranges outside the limits of our hearing, so even if the digital file contains ultrasonic data, it’s likely getting stripped before it hits our loudspeakers.

Many systems don’t go as far down as 20Hz. Some people can’t hear that low (although they can feel it). As for my own hearing, I can’t hear beyond 17kHz. (I need to test myself again, it’s been some time, and even though I thought I could hear 17kHz, it was very rolled-off, meaning, soft.)
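If you want to check your own upper limit, a few test tones are easy to generate. This is a minimal sketch using only Python's standard library; the file names, the list of frequencies, and the quarter-scale level are my own choices, and whether you hear a given tone also depends on what your speakers or headphones can reproduce.

```python
import math
import struct
import wave

def write_tone(path, freq_hz, seconds=3.0, rate=44100, level=0.25):
    """Write a mono 16-bit WAV file containing a sine tone with short fades."""
    if freq_hz >= rate / 2:
        raise ValueError("frequency must be below half the sample rate")
    n = int(seconds * rate)
    fade = int(0.05 * rate)                   # 50 ms fade in and out to avoid clicks
    frames = bytearray()
    for i in range(n):
        gain = min(1.0, i / fade, (n - 1 - i) / fade)
        sample = level * gain * math.sin(2 * math.pi * freq_hz * i / rate)
        frames += struct.pack("<h", int(sample * 32767))
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)                     # 2 bytes per sample = 16-bit
        w.setframerate(rate)
        w.writeframes(bytes(frames))

def write_test_set(freqs=(8000, 12000, 14000, 16000, 17000, 18000)):
    """Write one file per frequency, e.g. tone_16000Hz.wav."""
    for f in freqs:
        write_tone(f"tone_{f}Hz.wav", f)
```

Call `write_test_set()` and play the files in order at a modest volume. Resist turning the volume up when a tone seems silent, since high-frequency energy you can't hear can still be hard on tweeters and ears.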

I have no authority to say if high-resolution files, matched with a high-fidelity system, can deliver more enjoyment, with a wider frequency range.

I can say, however, that with high-fidelity equipment, the higher dynamic range is palpable and enjoyable. I know this after swapping out one piece of equipment with a limited dynamic range for one that offers higher dynamic range. And I preferred the wider range.

What I Can Hear

In my system, I want to discuss what I can hear and what I cannot hear. Because we are not machines, making these determinations with any kind of certainty is a complex process, best done with multiple tests and under different conditions. A “double blind” testing situation is what you may ultimately want to conduct to be certain.

A double-blind test is a study where neither the participants nor the researchers know who is receiving the actual treatment or a placebo. For audio, that means neither the listener nor the person running the test knows which file is playing. You can imagine, you need a technically-minded accomplice to run these tests at home.
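One way to keep such a test honest is to let a script pick the order and then check how likely your score would be from guessing alone. This is a minimal sketch of that idea, not a full testing tool; the function names and the choice of 16 trials are assumptions of mine.

```python
import math
import random

def make_trials(n, seed=None):
    """Return a random sequence of 'A'/'B' telling the helper which file to play each time."""
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(n)]

def p_value(correct, total):
    """Chance of getting at least `correct` right out of `total` by pure guessing."""
    ways = sum(math.comb(total, k) for k in range(correct, total + 1))
    return ways / 2 ** total

trials = make_trials(16)          # the listener never sees this list
for correct in (8, 12, 14, 16):
    print(f"{correct}/16 correct: {p_value(correct, 16):.4f} chance of doing that by guessing")
```

With 16 trials, 12 correct answers would happen by luck a little under 4% of the time, which is a common informal threshold for saying the listener probably heard something.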

High-bit vs. Low-bit MP3

I think I am like most people who can very easily hear the difference between music reduced to, say, 8-bit from 16-bit; or an MP3 at 96kbps versus 256kbps. I’d describe the lower-resolution file as sounding grainy with less dynamic range.

You can do this with audio software on your computer such as Adobe Audition or the free Audacity, making different files and playing them back.

Using Audacity

Before you start: make sure your source is lossless (WAV/AIFF/FLAC/ALAC). Don’t transcode MP3 → MP3.

Install and open Audacity (free).
Drag your lossless track(s) into Audacity.
Go to File → Export → Export as MP3.
Click Options… and set:
- Bit Rate Mode: Constant (CBR)
- Quality (kbps): pick a low rate to reveal artifacts (e.g., 96 kbps). (You can also try 128 kbps and 192 kbps for comparison.)
- Channel Mode: Joint Stereo
Choose a filename (e.g., Track_96kbps.mp3) and click Save → OK.
Repeat with other bitrates (e.g., 128, 192, 320 kbps) so you can compare.

Bonus (Audacity “null” test to hear what was removed)

Import both the original lossless file and your new MP3 into the same project.
Align them so they start together (they usually do).
Select the original track, then Effect → Invert.
Press Play. What you hear is (mostly) the difference introduced by MP3 compression (discarded information + encoder noise).
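The null test is nothing more than subtraction, which a few lines of code can show. This sketch uses a generated tone and a coarsely rounded copy as a stand-in for the lossy version (it is not a real MP3, which Python's standard library cannot decode); with real files, the two tracks must line up sample for sample, and a lossy decode can introduce a small offset worth checking.

```python
import math

def null_residual(original, processed):
    """Invert `processed` and add it to `original`, sample by sample."""
    return [a + (-b) for a, b in zip(original, processed)]

def rms_db(samples):
    """RMS level in dB relative to full scale (1.0); -inf for pure silence."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")

rate = 44100
original = [0.5 * math.sin(2 * math.pi * 440 * i / rate) for i in range(rate)]
# Hypothetical stand-in for a lossy copy: the same tone rounded to 8-bit steps.
degraded = [round(s * 127) / 127 for s in original]

print("Identical tracks:", rms_db(null_residual(original, original)), "dB")
print(f"Original vs. degraded copy: {rms_db(null_residual(original, degraded)):.1f} dB")
```

Identical tracks cancel to silence, and anything left over is what the processing changed.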

This is a great place to start because you will most likely hear a difference, showcasing the effect of compression algorithms on source material.

Cassette Tape vs. CD

I grew up in the era of records, 8-tracks, cassettes, CDs, and now MP3s. Cassette tape technology is an analog medium wherein audio is encoded magnetically on tape. The dynamic range of cassette tech is around 50-75dB, depending upon the quality of the deck and tape used. (Reel to reel tape can take the dynamic range up to 80+dB on the best equipment.)

It’s been some years, but yes, I could hear the difference between cassettes and CDs. CD audio was cleaner, with more range (bass and treble). Again, an easy comparison for many.

DSD vs. PCM

This one is difficult to even set up properly, given that not all DACs play back DSD. The easiest method might be to take a CD player and a SuperAudio CD player and connect them to the same system. Of course, the DAC chips would be different, so that’s a dicey proposition. If you had a player that could be switched to read the CD layer of the disc, then the DSD layer, you might go about it that way.

Let's get technical:

PCM (Pulse Code Modulation): Stores audio as a series of samples with a certain bit depth (number of levels, e.g. 16-bit, 24-bit) and sample rate (how many times per second, e.g. 44.1 kHz, 96 kHz). Think of it as “lots of finely measured snapshots of the wave.”
DSD (Direct Stream Digital): Stores audio as a very high-speed stream of 1-bit values (on/off). Instead of measuring exact levels, it records whether the signal is going up or down compared to the last moment. The standard DSD64 runs at 2.8224 MHz (64 × the CD rate).
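To make the 1-bit idea concrete, here is a toy first-order delta-sigma modulator, the simplest member of the family of techniques behind DSD. Real DSD encoders use higher-order modulators with noise shaping, so treat this strictly as an illustration; the 1kHz tone and the 64-sample averaging window are my own choices.

```python
import math

def delta_sigma_1bit(samples):
    """First-order delta-sigma modulator: turns samples in [-1, 1] into a stream of +1/-1."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback           # accumulate the error between input and output
        feedback = 1.0 if integrator >= 0 else -1.0
        bits.append(feedback)
    return bits

def moving_average(bits, window):
    """Crude low-pass filter: average each block of `window` one-bit values."""
    return [sum(bits[i:i + window]) / window
            for i in range(0, len(bits) - window + 1, window)]

RATE = 64 * 44100                            # the DSD64 sample rate
tone = [0.5 * math.sin(2 * math.pi * 1000 * i / RATE) for i in range(RATE // 100)]
bits = delta_sigma_1bit(tone)
decoded = moving_average(bits, 64)

print("First 16 one-bit values:", [int(b) for b in bits[:16]])
print("Decoded peak level:", round(max(decoded), 2))
```

The stream itself is only ever +1 or -1; the waveform lives in the density of each value over time, and simple averaging is enough to bring an approximation of the tone back.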

I have a small collection of DSF files which I extracted from SuperAudioCDs with my Oppo Blu-ray player. These are DSD64 files that I then stream to my DAC as a DSD stream, the same as if a SuperAudioCD player were to stream the data to my DAC. This isn’t possible, however, unless you have one that can do that (i.e., via I²S interface). Because of copy protection, you can’t just hook up a SuperAudioCD player (like my Oppo) into a DAC. The method I used is a work-around.

Of course, you can purchase DSD files, in resolutions in excess of DSD64. I’d caution you to only purchase DSD files that were recorded in that format to test for the sonic benefits. Note, too, that some recordings are recorded in DSD, edited in PCM, then converted back. I’m not an expert to speak to the potential drawbacks of this approach. (Nor can I certify that the recordings I have in DSF format all were recorded in DSD.)

But—let's assume we have a good DSD file, and a PCM file to compare it to. Are there benefits to the DSD format? I think so. In many of my DSD recordings, I experienced what I’d describe as a “smoother” experience, one where things sounded more relaxed. It's an odd descriptor, I know, but I'm not alone in using it.

Using my PS Audio Direct Stream DAC, this experience was the most profound. (It may have something to do with the fact that the DAC itself is a delta-sigma-based design, and it converts all PCM content to DSD, before conversion.) In this case, this DAC does not have to convert the DSD file (although it does also upsample everything). This might suggest that the conversion to PCM offers a different experience than when the original content is already DSD.

I have since upgraded my system with the Grimm Audio MU2, and the effect I experienced is less pronounced using DSD files. It’s another proprietary DAC design, based on the same Delta-Sigma architecture used in DSD. I’d say that I can't detect a difference with this player. And clearly, when I have both boxes in the same system, I need to conduct some blind testing to answer this for myself.

(I’m birdwalking here, but I’d say that the Grimm player presents both file formats extremely well to my taste, while the PS Audio gave DSD files special attention.)

The differences, however, are likely explained by more than just the digital conversion process used in each machine. This illustrates a maxim often cited by audiophiles: pay less attention to the chip or technical aspects of a DAC, and instead, use your ears.

Things to uncover that I don't know:

Does DSD-recorded audio offer an advantage?
Can I reliably identify the DSD version compared to the high-res PCM version?
Are there tell-tale signals beyond “smoothness” between DSD and PCM, which is an admittedly somewhat murky descriptor?
Are these differences only perceptible on one model of DAC?

(I'm eyeing Podger's Just Biber CD, recorded in DSD, to downgrade to CD quality, perhaps purchased from NativeDSD.)

So comparing DSD to CD audio is one way to contextualize the big question: can we hear differences between high-resolution files and those at CD quality? Let’s look next at what you’re probably wondering about: 44.1kHz vs. something else in PCM, maybe 192kHz files?

High-res files vs. CD-quality files

Thankfully with a streaming service like Qobuz, I have access to a lot of 88.2kHz and 192kHz digital files. I always check to see if the recording company tells me the resolution at which the recording was made (it’s important to not audition upsampled files for the purposes of hearing a difference, in my view).

The First Test

I can also easily switch between versions in many cases with Roon, between the high-resolution version and the CD-quality one. One such example I thought I’d try, which could be definitive as a start, is the Glossa release of Paolo Pandolfo’s recording of Telemann’s Fantasias for Viola da Gamba. I compared the version I purchased via iTunes (variable-bit-rate compressed MP4) against the 88.2kHz, 24-bit file from Qobuz.

My immediate reaction was that no, I cannot hear a difference between these two renditions. I also have to acknowledge that I’m in my room where I can hear the washer going in the floor below, and behind me, my wine fridge is whirring. (For critical listening, I turn the fridge off, and yeah, I also don’t do my laundry. I’ll also add, I’m one of those freaks who swears my system sounds better in the dead of night.)

After auditioning the third track over and over, I’d convinced myself that one big flourish the performer makes, exciting the bottom register of his instrument, has more presence with the high-resolution file. While not a blind test, this is enough for me to think that hearing this difference, although subtle, is enough to justify using higher-resolution files. Ultimately, this test is not conclusive, as my one source file is lossy-compressed.

I also think a comparison like this might be better controlled using headphones. My own headphones reportedly exceed the frequency range of my loudspeakers (3Hz to 112,000Hz vs. 25Hz to 25kHz). Based on Stereophile’s analysis of the Grimm MU2, it does output ultrasonic frequencies with high-res files, with rolloff, so if there’s a benefit, it should be something that I’m able to detect.

The Second Test

For my second test, I used a different recording, comparing CD-quality to high-resolution. It’s a 2-track release of Max Richter’s Late and Soon, offered via Qobuz in 44.1kHz and 96kHz versions.

I used the headphone output of the Grimm MU2 and played the first minute of the second track, multiple times switching between formats. I did not have the benefit for now of a blind test. And knowing which format is which can play tricks upon us, with expectation bias.

I did think I heard more low-frequency presence with the 96kHz file. The bass energy felt more present. The difference, if it existed, was subtle.

I honestly don’t think my short experiment would result in a crowning victory of one format over the other, but I hope I've at least explained how you might try the same kind of experiment yourself.

Important variables:

Have an easy way to switch between two formats matched at the same level so that a friend can easily switch between tracks for you.
Be listening both for changes in dynamic range and sonic character. You likely won’t hear a difference in frequency extension.
See what your equipment can reproduce and output. Your setup may equalize the differences between comparable digital audio files without you knowing.
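Level matching matters because the louder of two otherwise identical tracks tends to be heard as the better one. Here is a small sketch of how you might measure and correct a level difference between two sets of samples; the two tones are made-up stand-ins for real tracks, and the function names are mine.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def level_difference_db(a, b):
    """How many dB louder `a` is than `b` (negative if quieter)."""
    return 20 * math.log10(rms(a) / rms(b))

def match_level(a, b):
    """Return a copy of `b` scaled so its RMS level equals that of `a`."""
    gain = rms(a) / rms(b)
    return [s * gain for s in b]

rate = 44100
track_a = [0.5 * math.sin(2 * math.pi * 440 * i / rate) for i in range(rate)]
track_b = [0.4 * math.sin(2 * math.pi * 440 * i / rate) for i in range(rate)]

print(f"Before matching: {level_difference_db(track_a, track_b):.2f} dB apart")
matched_b = match_level(track_a, track_b)
print(f"After matching:  {abs(level_difference_db(track_a, matched_b)):.2f} dB apart")
```

In this example the gap before matching is just under 2 dB, easily enough to tilt a casual comparison.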

Conclusions

Not all digital audio is the same. The use of psychoacoustic-informed compression that is the basis for “portable” audio, such as MP3, can be heard at a certain point of compression by a lot of people. Lightly-compressed audio is harder to distinguish from the uncompressed original. This was not fully explored here, but has been explored recently by Marques Brownlee.

The CD standard of 16-bit, 44.1kHz PCM is a high quality format that in theory matches the limits of human hearing. Two aspects for consideration are the dynamic range of the recording (above or below 100dB) and the reproduction of frequencies.

High dynamic range is something that may be perceived as more natural to a listener, providing a wide experience between the softest and loudest sounds in a recording, at a given volume level.

Dynamic range in real-world situations is not as wide as measurement numbers might indicate, given environmental conditions.

While the limits of human hearing would seem to rule out any benefit from ultrasonic frequencies in high-resolution formats, some evidence suggests that an extension into the higher frequencies can perhaps influence our perception.

Even though high resolution files may contain these ultrasonic frequencies, not all equipment is configured to reproduce them. Limitations at each stage of one’s system could limit reproduction of these frequencies.

Furthermore, digital audio can be captured and played back in more than one way. Delta-sigma conversion turns an analog signal into (or from) a very high-speed stream of low-bit values (often 1-bit), then uses filtering and noise-shaping to push errors out of the audible band so the result is a smooth, accurate waveform. This format provides challenges to recording engineers and sound editors, and requires playback equipment that can natively convert these files. Some listeners, including myself, can perceive a different quality to these files, at least on some equipment over others.

Further exploration for me to best understand this will be:

to compare higher-rate, native DSD files (i.e. DSD128 or DSD256) to CD-quality recordings
to compare higher-res PCM files (i.e. 192kHz) to CD-quality recordings
to compare low-compressed MP3 or MP4 files against CD-quality files

In all cases, I need to employ a double-blind method to come to conclusive results. My current experience would have me suggest that in most cases, for systems that don’t use premium equipment, an investment in higher-resolution formats is not worth the extra expense.

The most apt comparison here may be viewing a 24 megapixel image on a 4K display versus a 100 megapixel image. Yes, the 100 megapixel image has far more data in it, but given the limitation of the 4K monitor, other factors can make the 24 megapixel image equally good or better.

I love music.