3D Adventures in Binaural Audio: Sheron’s “The Late Great Bloomer”
Most albums don’t make demands. They politely request that they be played, and they very much hope you enjoy what you hear.
The Late Great Bloomer, released by the artist Sheron earlier this month, is an exception. For his adventurous album recorded in binaural 3-D audio, the Brooklyn-based multi-instrumentalist demands – well, at least strongly suggests – that you listen on headphones or “capable speakers.”
In other words, please make earbuds or laptop speakers an absolute last resort, and if you do go that route, prepare not to get it.
From the Lab to Reality
A former employee of Princeton University’s 3D Audio and Applied Acoustics (3D3A) laboratory, Sheron – Andrew Sheron, to use his civilian name – took things a step beyond binaural audio with The Late Great Bloomer.
This is a full-length binaural concept LP, mixed and mastered using novel technology from 3D3A. That technology helped Sheron craft not just an immersive audio experience, but a transporting one. Beyond hooking into the songs themselves – an atmospheric, American crossroads of modern and vintage influences – it’s Sheron’s hope that the listener will feel some real magic, ideally by hearing the record uninterrupted from end to end.
Recording with a Neumann KU100 dummy head microphone (a.k.a. “Fritz”) was just the start of Sheron’s investigation into 3D audio technology. He began experimenting with different recording approaches across the country on a wide variety of instruments, bringing binaural microphones to churches, concert halls, cabins, barns and bedrooms.
Notably mixed in multitrack, the binaural Bloomer offers up a different kind of audio immersion, allowing Sheron to tell the album’s story organically. “I want to place the listener into the world of the songs,” he explains. “At times you might hear something small whizz past your head, a distant horn section, a whisper in your ear, or feel surrounded by a choir.”
It’s an album that stands up to repeated listenings, revealing new sonic details to the listener each time. But as you’re about to learn in this SonicScoop interview with Sheron, taking on new recording techniques and then pushing them to outer limits is very hard work – a task packed with pitfalls but plenty of rewards.
(Listen on Spotify’s highest-quality streaming setting for best results. High-quality downloads are available at http://www.hellosheron.com.)
Andrew, in your own experience, how would you characterize the shortcomings of listening to music recorded using conventional methods? Why was it so important for you to transcend these limitations for the production and eventual playback of your album?
Great question. I don’t think of conventional recording methods as limited or having a lot of shortcomings; some of my favorite records are mono, and sound incredible out of a three-inch cardboard speaker.
What I was reaching for when working on The Late Great Bloomer was realism – an imitation of the way you’d hear the songs if they were being performed live, just for you, in a high-ceilinged room with dozens of musicians and singers surrounding you – and telling a more compelling story through an experience more immersive than what you would hear through traditional loudspeakers.
You talk about going down the “rabbit hole” of 3D audio technology. Can you be more specific about what that rabbit hole entails: What kind of journey does a person go on when they commit to recording, mixing, and mastering in unconventional audio techniques?
I worked in traditional stereo for quite a few years before even hearing about binaural, so when I came across a Neumann KU100 for the first time at my producer Tommy Jordan’s house it was a real ear-opener.
I started searching for any and all binaural albums I could find – only to feel a little disappointed because it seemed like most audio engineers either thought binaural microphones were a gimmick, or came from the minimalist, purist side of recording practices and used one dummy head to capture a one-take live performance in a room — Chesky Records have made a lot of beautiful recordings like this.
Listening to examples of the latter was enjoyable and interesting to me, because mixing happens in a similar manner to some of my favorite early recordings or one-mic bluegrass albums: The musicians mix themselves acoustically or are moved around in relation to the dummy head. What you hear is what you get.
That method was, and is, really compelling to me, but I couldn’t help but wonder what a binaural record could be like if it were approached with the same sort of in-studio creativity that so many of my favorite albums were born out of.
So I set out to make a multi-tracked binaural record. But I soon learned that as soon as you start mixing binaural signals together you start having problems.
We’ll get to those difficulties in a minute – first, there’s another important technical detail to go over. You mentioned “crosstalk-canceled loudspeakers” more than once to me. Can you please explain what those are, and why it was important for you to master their use?
At this point it might be helpful to define what binaural audio is for those that might be unfamiliar. The word “binaural” literally means “using both ears,” so the simplest definition of binaural recording is that it tries to capture the complex way in which your ears hear sound in the real world. We can boil down that complexity to three sets of “cues”, or bits of information:
Interaural time differences (ITD) – supposing a sound source is located on your left side, it will reach your left ear before your right.
Interaural level differences (ILD) – that same sound coming from the left will also be louder in your left ear than in your right.
Spectral cues – The shape of your ears, where they are placed on your head, and other characteristics that distinguish people from each other make sound bounce off of and wrap around everyone differently. Your brain has gotten used to how your physiology sounds, and these are what we call “spectral cues.”
All of this information put together is called an HRTF (head-related transfer function) and is analogous to your fingerprints. Your HRTF is unique to you.
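To make the first two cues concrete, here is a rough back-of-the-envelope sketch using Woodworth’s spherical-head approximation of ITD – a textbook simplification, not anything from Sheron’s or 3D3A’s actual pipeline, and the head radius used is just a common average:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at room temperature
HEAD_RADIUS = 0.0875    # m, a commonly used average adult head radius (assumption)

def interaural_time_difference(azimuth_deg):
    """Woodworth's spherical-head approximation of ITD, in seconds.

    azimuth_deg: source angle, 0 = straight ahead, 90 = directly to one side.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

# A source dead ahead produces no time difference; a source hard to one
# side arrives at the near ear roughly 0.66 ms before the far ear --
# about the largest ITD a listener ever experiences.
itd_ms = interaural_time_difference(90) * 1000.0
print(f"ITD at 90 degrees: {itd_ms:.2f} ms")
```

Sub-millisecond differences like this, together with level and spectral cues, are what the brain fuses into a sense of direction.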
There are a lot of ways to end up with binaural audio. The most straight-forward is using a dummy head like the Neumann KU100 – a stereo microphone with one capsule in each ear. When you play back a recording made with this microphone over headphones, you hear what the dummy head’s ears heard and it blows stereo out of the water when it comes to realism. At times, you can really feel as though someone were whispering in your ear, or a fly is buzzing around your head, or the distance between you and a far-off trumpet.
But I should mention that there is a problem here – the dummy head’s HRTF is not the same as your HRTF. This mismatch is the main reason that some people have a more realistic experience than others when listening to recordings made with that particular microphone. The more similar your head and ears are to the dummy’s, the more accurate the binaural cues will be — more on that later.
So now that we have some binaural audio, let’s try to play it through some traditional stereo loudspeakers – one on the left and one on the right. Suddenly the sound isn’t as 3D anymore. That sensation of closeness when you heard the whisper in your ear has jumped back into the speaker boxes and away from your head.
What has happened is the result of what we call “crosstalk.” Ideally, we want to play back the dummy head’s left channel (left speaker) directly into your left ear (and likewise the right channel into your right ear), but instead the left speaker is pumping out sound in all directions, and those sound waves are making their way to your right ear as well as your left.
Crosstalk-cancelled loudspeakers create an invisible wall between the two channels by using interference patterns, so that the left speaker is heard significantly stronger in the left ear than the right and the right speaker is heard significantly stronger in the right ear than the left. Now the binaural cues are preserved just as in headphones, and you can hear the sound in 3D, but through loudspeakers and in the air, which is a markedly different experience to the ear and brain.
[Editor’s note: See 3D3A’s in-depth explanation of crosstalk cancellation here.]
Let’s focus on the recording phase of the album: Were there special requirements for where you had to track the performances?
Once I decided I was going to make a multi-tracked binaural album, I was free to place the dummy head microphone in any number of creative ways when recording the basics without being tied to the one-mic, one-take philosophy.
So I took the Neumann KU100 (nicknamed “Fritz”) to record with me all around the country: organ in a church in Connecticut, in a concert hall in California, a couple cathedrals in Manhattan, and I put him under a piano, hanging over a drum set, upside down in a kick drum, beneath some bedsheets…I focused on getting really interesting source material at separate times, which is impossible if you’re tracking all the musicians live in a room to one stereo track.
You discovered there were many challenges of mixing in binaural: What was unique about the way that your album was recorded that introduced these challenges, since many binaural albums are “mixed” live via the placement of the instruments relative to the microphone? And then in the actual mix phase itself, what were some of those challenges that you had to overcome?
Once I had this wealth of source material I now had to find a way to mix it all together into what felt like a cohesive, one-take live performance much like those Chesky records I mentioned before.
The main issue I came across is that your brain has lots of trouble understanding all of the binaural cues when you stack them on top of each other. Here’s an example: for one song I recorded a female choir in a church in Manhattan – a huge space with a long reverb tail, and the lead vocal for the song was recorded while I was mixing the record in the live room of Art Farm studios in upstate NY (not a very long reverb tail).
These two tracks sound beautiful separately, but when I mixed the two together they sounded significantly worse in terms of realism. It still sounded like a good stereo recording, but much of that sense of space is lost in translation. The feel was totally different.
It was really frustrating, and I couldn’t figure out how to fix it until I met Dr. Edgar Choueiri, a Princeton professor specializing in electric propulsion for spacecraft, with the heart of an audiophile, who developed a new way to tackle the crosstalk cancellation problem in loudspeakers without coloring the sound.
I worked at his laboratory for a while, and he loaned me some equipment that I took to the mix session in order to create a crosstalk-cancellation filter for my mixing setup in the barn at Art Farm. This setup allowed me to play back some of the non-traditional binaural source material I had collected over the crosstalk-cancelled loudspeakers and re-record it with the dummy head in this new space.
This way, the original character of the recording comes through, but it sounds like it exists in the same room as everything else, and then mixing everything together becomes much easier.
This is just one example – there was also the problem of proximity, the wet/dry ratio with distance, re-amping the rare mono tracks that I had recorded and couldn’t part with, and other choices all aiming to make the final mix sound like all of the tracks were playing back simultaneously all around the room.
Did the fact that you were solving many of these problems as you moved along require you to ask performers to come back to the studio to record sessions? What special preparation is required for musicians who are performing on a record like this — do they need to be coached in any way to do things differently?
I played most of the instruments myself, so if there was something I wanted to re-record while up in the barn, I just did it, and any mono recordings I had done of the other musicians (like Alec Spiegelman’s woodwinds) I re-amped in the room and played something similar like trumpet alongside them live to try to mask the fact that speakers were being used.
Unlike with the horn or string sections, sometimes I didn’t want the signals to be fixed in space, so I would dynamically move the speakers around the room, which added another element of performance to the album.
Was there a learning curve for mastering the album? Who mastered it, and what was unique about the mastering process?
Not really – Joe LaPorta over at Sterling Sound mastered the record and I attended the session in order to create a crosstalk-cancellation filter for his mastering setup. It was really rewarding to see him start and look over his shoulder when the intro of the record made him think something was flitting past him.
Once the album actually gets to the listener, what is the preferred method of playback for them?
Crosstalk-cancelled loudspeakers. But since those are still rare at the moment, the next best option would be some open-back headphones in a quiet environment.
What will their experience be if they listen in lower-grade playback systems, such as everyday earbuds?
Anything will work. The quieter the environment the better because if your brain is hearing the world around you and the world I’m trying to take you to at the same time, it won’t be as effective.
And remember, this was all recorded with a dummy head microphone with a mismatched HRTF, and although it’s a big step towards realism away from flat stereo, it’s far from a perfect technique. It’s likely that you’ll hear something in front of you that was recorded behind you, or not really hear the difference in the elevation of sources, or externalize some sounds better than others.
I think of it like looking at a completely new world – colorful, energetic, teeming with life – through a dusty window. So use your imagination! Listen closely and try to point out where everything is in the room and search for the many little details all around you.
Now that you’ve completed The Late Great Bloomer, where do you see binaural and VR audio headed?
We’re right on the cusp of a revolution in audio realism, but there’s a big problem with VR developers cutting corners on audio in favor of better visuals. And that makes sense – people see the difference between 4K and 240p video far more consistently than they hear the difference between a 96kHz WAV and a 160kbps OGG (Spotify’s default).
But this isn’t a question of kbps – most people don’t even realize that 3D audio exists. And after all, you can take something that looks terrible, like The Blair Witch Project, and add great sounding audio, and end up with a pretty compelling movie, so I think the immersive qualities of VR experiences could be massively improved – practically overnight – with just a little attention to the audio and providing a little education to the VR consumer.
Ambisonics recordings are one really exciting domain – instead of recording through a dummy head, we can use an array of capsules to measure the air pressure in the entire field of sound in the space surrounding the array. Once we have a mathematical representation of the sound field, we can erase the array of microphones from the image and then superimpose a different object – say a model of your head with your HRTF.
Then, we can generate stereo binaural audio from that model, and if you wear headphones with a tracker on them, we can detect which way you’re facing and rotate the sound field to match your view, so you can choose to look in any direction after the recording has already been made. This makes a big difference for headphone externalization – really hearing the sound in the space outside of the headphones – which is something a static binaural track struggles to achieve. A great example would be a VR experience in which you’re sitting in a chair at a concert, and when the horns in the eaves come in, you can turn around and look up at them, and the playback will reflect your rotation.
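The head-tracked rotation step can be illustrated with a toy first-order ambisonics example. This assumes traditional B-format channel conventions (W omnidirectional, X front, Y left, Z up) and is only a conceptual sketch of the rotation math, not any particular VR renderer:

```python
import math

def rotate_bformat_yaw(w, x, y, z, yaw_rad):
    """Rotate a first-order B-format frame (W, X, Y, Z) about the vertical axis.

    When a head tracker reports the listener turning by yaw_rad, the sound
    field is counter-rotated so sources stay fixed in the room rather than
    turning with the head. W (omni) and Z (vertical) are unaffected by yaw.
    """
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return (w, c * x + s * y, -s * x + c * y, z)

# A source dead ahead encodes as X = 1, Y = 0 (Y points left in B-format).
# If the listener turns 90 degrees to the left, the source should now sit
# to their right, i.e. X' = 0 and Y' = -1.
w, x, y, z = rotate_bformat_yaw(1.0, 1.0, 0.0, 0.0, math.radians(90))
print(round(x, 3), round(y, 3))
```

A renderer would apply this rotation per audio block before decoding to binaural, which is what lets the horns stay “in the eaves” no matter which way you turn.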
A friend of mine, Joseph Tylka, one of the graduate student researchers at the 3D3A laboratory, is working on sound field navigation, which would have a big impact on VR audio. Imagine sitting in that same chair at the concert, and now being able to not only rotate and tilt your head, but, very naturally, lean towards the person sitting next to you, who is whispering something quietly, and really feel them get closer to your ears, not just louder, and not just more to one ear, but interacting with your HRTF in a very specific way and fooling your brain into thinking they’re right next to your head. Applications for horror games, stealth games, and live event broadcast all come to mind right away.
Finally, what would your advice be to someone who wanted to embark on a recording project that was similarly ambitious, either in the field of 3-D audio or another technique?
It’s been said hundreds of times in better ways than I can say it, but don’t be deterred by people telling you your idea is too ambitious or impractical or impossible: Just keep doing what interests and inspires you.
— David Weiss