Thomas Dolby: Challenging Audio to Take The Next Step at AES 2018
How do you escape your past? Keep on reinventing the future.
Professor Thomas Dolby knows this as well as anyone. His credentials today reveal a mind that’s been laser-focused on — probably even obsessed with — what’s next for audio. Throughout his multi-decade career, Dolby has emerged as a pioneer in all things digital, sonic and visual, pushing boundaries from Laserdiscs to video games, interactive TV, virtual reality installations and location-based entertainment.
Want metrics? Dolby’s startup Beatnik Inc. is renowned for co-creating the code that enabled Java’s interactive audio, and when polyphonic ringtones showed up in mobile phones, Dolby’s BAE (Beatnik Audio Engine) technology was a certified hit: Beatnik licensed the technology big time, getting it embedded in more than two billion cellular phones and devices.
Today, his station in life is more academic but no less exciting: he has held the post of Homewood Professor of the Arts at Johns Hopkins University in Baltimore, MD since 2014. As you’ll see below, not all of his students know that his first groundbreaking musical act was as the creator of “She Blinded Me With Science,” a top 5 Billboard hit that was absolutely pivotal in breaking ’80s music wide open.
Today, Dolby — who remains active as a musician, film score composer, and producer — maintains an ambivalent relationship with that smash, describing himself as “a very introverted hermit with a very thin sliver of exhibitionism.” That thin sliver will be fully in the spotlight quite soon, however, when he delivers the Keynote Speech for the AES New York 2018 International Convention, coming up October 17-20 in New York City. In a talk entitled “The Conscious Sound Byte,” happening during the Opening Ceremonies on Wednesday, October 17th at 12:15 PM, Dolby will focus on next-generation sound technologies, emphasizing adaptive/non-linear music and audio for games, VR/AR, “hearables” and other new media platforms. (Get a free Exhibits-Plus badge to hear the keynote and explore the floor at AES by visiting http://www.aesshow.com/aes18scoop and entering the code “AES18SCOOP”.)
In this conversation with SonicScoop, Dolby explains exactly what he means by “dumb” sound files and why his beef with them runs so deep, how being state-of-the-art can actually leave you looking (and sounding) like yesterday’s news, and much more. Here’s an audiohead you’ll want to get into, now…and again at AES 2018.
People who are plugged in know you as many things: hitmaker, musician, producer, entrepreneur, technologist. How do you introduce yourself to someone you just met at a party?
Well, typically I’ll say “music teacher at Johns Hopkins,” because that’s what I’m doing today. If my past history doesn’t immediately come up in conversation, then that’s sort of fine with me; I’d rather be judged for what I’m doing now than for any of the glories and shame of my past.
But I think, as you say, people know me for different things. If I’m back in Silicon Valley among the geeks, they think of me as the guy that started Beatnik and Headspace, who was responsible for the ringtone synthesizer in 2 billion cell phones and co-wrote the audio layer of Java, and things like that, and they’re less interested in my music past.
And now I’ll meet ’80s music fanatics who are sort of in awe, or, increasingly, the kids that grew up with their parents’ music and are in awe. This is quite an interesting new phenomenon. But generally speaking, children of the ’90s and aughts don’t have a clue who I am, and that’s just fine. With my students at Johns Hopkins, maybe their parents got very excited when they found out who was going to teach them, but the kids don’t really give a toss, and that’s actually fine by me. I’d rather earn their praise and their respect than have them come in with a preconception.
At what point in your career did you sense a growing interest in the science of audio, within yourself? What made you want to be a driver of audio technology, not just a user?
I was just always fascinated by the possibilities of expressing myself with some piece of technology that probably wasn’t designed for that purpose at all; turning something into a musical instrument.
What is the appeal of that? What happens when you take something that wasn’t intended to be a musical instrument and then turn it to that purpose?
Well, I think you hit limitations, but also you discover possibilities that nobody has attempted before. So that sort of makes me feel like a pioneer for working with sounds or technologies that are unexplored. I’m basically a very introverted hermit with a very thin sliver of exhibitionism. But when I do get out in the spotlight I like people to come away absolutely bewildered and stunned and impressed. I’m a bit of a showoff at heart.
You’ve had different kinds of hits in your career. How would you compare and contrast the feeling of a radio hit with that of having a technology hit — such as Beatnik had with its BAE polyphonic ringtone technology?
Well, it’s very different. My name and my face were very strongly associated with my music career, whereas Beatnik was more of a corporation. It was more of a logo. It was more of a collective effort between 120 different people and our backers and our board and our partners and all the rest of it. So it was a lot less of an ego stroke in a way. But I did feel like a team player doing that, because it was a triumph for all of us.
When I first had a commercial radio hit [with “She Blinded Me With Science”], although I think it was an MTV hit first and foremost, then secondarily a club hit, and thirdly a radio hit, it sort of ambushed people. They couldn’t help knowing about me and my music. But at the same time it was uncomfortable for me socially, because I went straight from being somebody that didn’t really get noticed very much, a bit of a wallflower, to being the center of attention, which didn’t sit with me very well.
So, I think it didn’t take me long to figure out that producing, co-writing songs, being a session keyboard player, being more of a backroom person was something I needed to hang onto, because I only had a limited appetite for stepping into the spotlight and being the star. I would do my thing a little bit, but then I needed to withdraw into my hermit crab shell for a while.
Keynote Kickoff
Let’s shift gears to the upcoming AES convention. I was pleasantly surprised to see that you’ll be delivering the keynote address at AES New York 2018. How long have you been involved with the AES, and what led to you being the keynote speaker this year?
I’ve been a few times over the years, on and off. I was there last year with some students from Johns Hopkins, showing them around, and I attended a few panels. I was especially interested in the panels that dealt with spatialized audio and interactive, adaptive music for games and VR and things like that. That was really the beginning of my degree course, [Music for New Media], which has just kicked in this semester.
I think [the keynote speech] actually came about through Roland Corporation because I did a short tour in the summer, which Roland was a backer of. I worked very closely with them setting up my new degree course. They provided lots of MIDI equipment and they have allowed my students to access the Roland Cloud, which has a lot of virtual plugin versions of classic synths on it.
Roland have a good relationship with AES, and the idea came up that maybe I was somebody that would be appropriate for the keynote there. A Bob Ludwig or Bob Clearmountain or somebody would be a good, but sort of conventional, choice for a typical AES keynote, because they are the gods of the conventional professional audio world. I sort of have a foot in more than one camp, because I was also a Silicon Valley entrepreneur and because I’m now in the academic world. I think it made sense for them to invite me to go there and talk.
The title of your address is “The Conscious Sound Byte,” which is going to focus on next-generation technologies: in particular, adaptive and nonlinear music, audio for games, VR, AR, hearables, and other new media platforms. What has made those areas of particular interest for you personally?
I can talk in a limited way about the topic. I can’t give too much away, “A” because I don’t want to spoil it and “B” because I haven’t yet written my speech! But in general, the areas that we’re investigating in my lab at Johns Hopkins are more in the nonlinear domain. There is a recording arts department at Johns Hopkins where students can learn to be engineers and producers and recording artists and so on. My new lab and my new degree course are focused on nonlinear sound and music, such as 3D spatial sound and music that adapts to the users’ choices in things like games and virtual reality.
So that’s the focus of my talk. I’m talking to the status quo, the established professionals in the pro audio space, who I’m sure are aware of these topics and aware that there’s a lot of activity and some breakthroughs coming down the pike in that area. My goal really is to stimulate some thinking and discussion, to get them stepping outside the box in those areas, and maybe to spark debate and discourse among those kinds of companies about what they should be doing to adapt their products to the changes that are taking place in the entertainment and academic worlds.
Subverting Spatial
I know you’ve been a pioneer of VR sound — the first installation you created for that medium was in 1993 with “The Virtual String Quartet” at NYC’s Guggenheim Museum. I’m interested to know what you personally experience when you are exposed to a well-executed VR sound or spatial audio experience. What’s that like for you? It’s obviously important for you to get other people to be able to experience it too.
I’ve been skeptical at times, because I’ve been sort of unconvinced by many spatial audio demos and experiments that I’ve heard in the past. I’ve come to realize that part of that is actually my physiology: we’re not all created the same way. You know, our ear canals and the shapes of our ears and our skulls really have a big effect on the way that we perceive the world. It’s rather like eyesight: some people are able to adjust, and others are beyond adjustment and need to be looking through a lens in order to see things properly.
I think that it’s very much the same with hearing. If you try and synthesize the way that our brains spatialize audio in the real world, 95% of the time you’re going to actually miss the mark. I only realized this when I had my HRTF done. Are you familiar with that? The Head-Related Transfer Function is an expression of the way that spatial sound enters our heads, and it differs based on our physiology.
So you can actually measure on an individual the way that they perceive 3D sound, and you can come up with a little algorithm that basically attenuates the way sound will be delivered to them. The way you do that is by taking a measurement: You need to have a booth, then you need to place microphones in the ears of the individual and you need to measure the differences between the mics in terms of volume and delay and tonal variance, and so on, in order to come up with essentially the equivalent of an eye prescription for hearing.
Obviously, as with an eye prescription, somebody else’s specs are unlikely to work for me. The first time I ever had a custom HRTF done for myself and then listened to 3D spatial sound, it was like the “aha” moment of the first time you get prescription spectacles and put them on, and suddenly you can see the world as it’s supposed to be. One of the issues with 3D sound is that an HRTF is either not used at all, or a standard library HRTF is used that was measured 30 years ago. A lot of people think they’re hearing 3D sound, but actually they’re not.
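Dolby’s “eye prescription for hearing” maps fairly directly onto how a personal HRTF is used once it has been measured. The sketch below is a loose illustration only, not any particular vendor’s pipeline; the file names are hypothetical stand-ins for the measurement he describes, and the rendering itself reduces to one convolution per ear.

```python
# A loose sketch only: applying a listener's personal head-related impulse
# responses (HRIRs) to a mono source. The measurement step described above
# (mics in the ears, a calibrated booth) is assumed to have already produced
# the two impulse-response files; their names here are hypothetical.

import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono signal with per-ear impulse responses and return
    a stereo (N, 2) array, peak-normalized to avoid clipping."""
    left = fftconvolve(mono, hrir_left, mode="full")
    right = fftconvolve(mono, hrir_right, mode="full")
    stereo = np.stack([left, right], axis=-1)
    peak = np.max(np.abs(stereo))
    return stereo / peak if peak > 0 else stereo

if __name__ == "__main__":
    # Hypothetical inputs: a dry mono recording plus one listener's measured
    # HRIRs for a single source direction (say, 30 degrees to the left).
    sr, source = wavfile.read("dry_mono.wav")
    _, hrir_l = wavfile.read("my_hrir_left.wav")
    _, hrir_r = wavfile.read("my_hrir_right.wav")
    out = render_binaural(source.astype(float), hrir_l.astype(float),
                          hrir_r.astype(float))
    wavfile.write("binaural_out.wav", sr, (out * 32767).astype(np.int16))
```

Swap in a generic library HRIR instead of your own and, as Dolby notes, the math still runs but the illusion may collapse for your particular ears.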
One of the hurdles that we need to cross, for example, is getting to a point where people who are looking to experience proper 3D sound will go through some sort of test, which will attenuate or calibrate their system to their individual needs. There are various companies working on affordable, more intimate ways of doing that, ranging from a booth that you would step into like a photo booth, to using your mobile phone to take a 360-degree photograph of your head, through to hearables. All of these are ways that hearing can be better customized to the individual, and that’s an important step that needs to take place.
So this is part of what you’re trying to build awareness of at AES, coming up?
That’s one of the topics that I’ll cover. And that’s all before we even start to get creative. That’s before we talk about different types of 3D audio algorithms. Many of the companies involved in VR are working in this field: Google and Facebook and Sony and Dolby Labs and Bose and Harman, and so on. There’s a lot of rapid advancement taking place from large companies that have a very vested interest in getting VR over the hump.
Smart Sound Files Wanted
Another concept in your keynote that you’re trying to raise awareness about is music files being “dumb and rigid” and I thought that was a very interesting perspective. What do you mean by that and where are you hoping to steer things from there?
Yeah, well I mean if I ask audio professionals about a sound file they probably instantly think wav or aiff. They might think, “Okay, are we talking 44.1 or 48k? Are we talking 24-bit? Are we talking 96k or even more high resolution? Are we talking audiophile sound?”
What we’re talking about is a sound file that is designed to be loaded and played and paused and played back and maybe affected or manipulated in some way. But it’s basically a file that has a static recording of some sort, which we’re going to listen to and maybe fool around with. And if you don’t like that one, load up this one instead and then we’ll mess with that. And if you’re dealing with multitrack in a workstation, here’s the folder of these aiff files or wav files. We’re going to load them all on different tracks and we’re going to play them back in parallel and we’re going to maybe mix between them or maybe use one to affect another.
But the files themselves are “dumb” because, other than the sound information and maybe a couple of meta tags, they don’t share any of their own affordances. They don’t declare or broadcast any of their content. They just expect to be loaded and played. When you’re creating a real-time audio environment using multiple threads, this is very distinctly different from musicians playing together in real time, because if you put five musicians in a room, they make eye contact, they listen very hard to each other, and, if they’re good musicians, as they play they’re making minute, microscopic adjustments to the way that they play in response to the way the others are playing.
So with an orchestra, for example, you know that there is no fixed tuning for an orchestra, because everybody is juggling tuning on their fingertips in a matter of nanoseconds. If you’re part of a string section, you’re listening very carefully to those around you and you become part of the whole. If you’re singing in a choir, you create a resonance that exists in the whole choir, and you try and get in tune with that resonance. When a choir is doing well, or if they’re a great choir, they’re able to become a sort of collective sandbox with the waveforms coming out of each of their mouths. But if those individual components are dumb audio files, then you don’t have the ability for them to react to each other in that way.
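To make the “dumb file” complaint concrete, here is a purely hypothetical sketch, not an existing format or API, of what a sound asset might look like if it declared some of its own content, so that other layers in a real-time mix could query it and adapt rather than simply being loaded and played.

```python
# Purely hypothetical, for illustration: a sound asset that declares some of
# its own musical content alongside its samples, so other layers in a
# real-time mix could query it and adapt. Not an existing file format.

from dataclasses import dataclass, field
from typing import List, Optional
import numpy as np

@dataclass
class SmartSound:
    samples: np.ndarray                 # the raw audio, as in any WAV/AIFF
    sample_rate: int
    tempo_bpm: Optional[float] = None   # declared tempo, if the material has one
    root_note: Optional[str] = None     # e.g. "D3" for pitched material
    tags: List[str] = field(default_factory=list)  # "sustained", "percussive", ...

    def seconds_per_beat(self) -> Optional[float]:
        """Lets another layer lock its timing to this one."""
        return 60.0 / self.tempo_bpm if self.tempo_bpm else None

# Usage: a pad that knows its own key and tempo, so a companion layer could
# transpose or time-stretch itself to agree with it instead of being
# triggered blindly.
pad = SmartSound(samples=np.zeros(44100), sample_rate=44100,
                 tempo_bpm=90.0, root_note="D3", tags=["sustained"])
print(pad.seconds_per_beat())  # 0.666... seconds per beat
```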
If the programmer is very skilled and very astute, then they can simulate the kind of interaction and resonance that exists between multiple musicians or singers, and they can do it very well. In the hands of a master programmer, an East West string library can sound incredibly convincing, like a real orchestra. But a lot of that takes hours, days, weeks; it’s very time consuming, and it requires a highly skilled programmer or composer or arranger every time. And when they’re done with it, they render it out as a stereo file and create yet another dumb sound file.
So, one thing that occurs to me as I move more and more into an interactive, nonlinear environment is that there is no perfect mix; the sounds are being triggered by the choices that the user or the audience is making, the same way they are in real life. You walk into an antique shop, you knock over a coat stand, and then you run your hand along the chimes, so you’re creating sounds in the real world as you go. That’s essentially what’s happening in games and in virtual reality.
But sometimes it takes dozens of individual dumb soundbites being triggered asynchronously in order to recreate that world, and I think that’s basically wrong. For example, if you have a Grand Prix racing game and the user is in control of a Formula One car, then in a PlayStation or Xbox game very often there’ll be 32 or 64 individual streams of sound files all being triggered and mixed independently of each other based on acceleration, braking, cornering, skidding, proximity of other cars, the environment you’re driving through, all of these things. But you have to have this big, big library of files that are all being triggered separately.
Now in a real Formula One car, the sound is being generated by something completely different: the engine itself, the exhaust. In other words, what you really need is a single, mathematically modeled Grand Prix engine running within the game in real time in order to do that accurately, time after time. Then we wouldn’t necessarily have to go through all the rigmarole of fine-tuning the 64 separate files.
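As a rough illustration of the contrast Dolby is drawing, and not a description of how any actual racing title or audio middleware is built, the sketch below puts the conventional approach (crossfading pre-recorded engine loops by RPM) next to a toy parametric model that synthesizes the engine tone directly from an RPM curve. Every parameter name and the signal model itself are invented for the example.

```python
# A toy contrast with invented parameters and a deliberately crude signal
# model; real game audio middleware and real physical models are far more
# involved.

import numpy as np

SR = 48000  # sample rate

# (a) The conventional approach described above: pre-recorded engine loops
# captured at fixed RPMs, crossfaded according to the car's current RPM.
# Assumes the loop dictionary brackets the requested RPM, e.g.
# loops = {2000: loop_a, 5000: loop_b, 9000: loop_c}.
def crossfade_engine_loops(rpm, loops, n_samples):
    keys = sorted(loops)
    lo = max(k for k in keys if k <= rpm)
    hi = min(k for k in keys if k >= rpm)
    w = 0.0 if hi == lo else (rpm - lo) / (hi - lo)
    return (1.0 - w) * loops[lo][:n_samples] + w * loops[hi][:n_samples]

# (b) The parametric alternative: synthesize the tone directly from an RPM
# curve. A four-stroke engine fires at roughly rpm / 60 * (cylinders / 2)
# times per second; harmonics are stacked with decaying weights.
def engine_model(rpm_curve, cylinders=6, harmonics=8):
    f0 = rpm_curve / 60.0 * (cylinders / 2.0)      # per-sample fundamental (Hz)
    phase = 2.0 * np.pi * np.cumsum(f0) / SR       # integrate frequency -> phase
    out = sum(np.sin(k * phase) / k for k in range(1, harmonics + 1))
    return out / np.max(np.abs(out))

if __name__ == "__main__":
    # One second in which a hypothetical car revs from 2,000 to 9,000 RPM.
    rpm_curve = np.linspace(2000.0, 9000.0, SR)
    audio = engine_model(rpm_curve)
    print(audio.shape)  # (48000,)
```

The first function needs a library of recorded assets for every state the car can be in; the second needs only the parameters the game is already tracking, which is the essence of the argument being made here.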
So my basic point is that, as an industry, we tend very much to build these tools and libraries and platforms and plugins and effects expecting that we will open a folder of separate files and somehow get them to perform together and simulate the real world. But maybe those are the wrong standards to be writing to. Maybe we need to figure out a way that real-time sound is not just a collection of separate files with no knowledge of each other, but is in fact created on the fly by a number of parameters that are interacting.
Just remember that I’m not a scientist, I’m a musician. But I’m a musician who has to work within the confines of the technology that science provides me at any point in time. I’m just lodging a complaint. I’m lodging a complaint with science and saying, “Okay, there is lag going on here.” I walk around AES and I’m still seeing an industry that is dealing with 20th-century concepts. More effort and research and product experimentation and risk need to take place in the nonlinear area, or else it’s going to hold the industry back.
That said, I think it would be a great convention for students or for young people wanting to start out in the music industry to attend. I think there are some very exciting breakthroughs that we’re right on the edge of, and hopefully some of them will be displayed at AES. A lot of people are talking about the same areas that I’m talking about, and I think it’s a good time to go and figure out what career path you want to pursue.
Enduring Audio
Part of people’s motivation in making art is not just to be on the cutting edge, but also to create things that endure. Why do you feel that the music of the ’80s is proving to have this power, decades later?
That’s a very interesting question, because I’ve always felt that the more state-of-the-art you are, the more there’s a risk that what you do won’t stand the test of time.
I look back at two of my videos, “She Blinded Me With Science” and “Hyperactive.” One was done in a kind of silent-movie style from the 1920s, which was just a timeless story with interesting visuals and fun characters, and so on. The other, “Hyperactive,” was very state-of-the-art at the time; it used video effects that hadn’t been seen before. I don’t think it holds up as well, because I think since then a lot of that stuff has become quite cliche.
Five years ago I made a documentary film myself, The Invisible Lighthouse. I did it on my own with affordable video cameras, and I used a drone and a selfie stick. The footage at the time looked incredibly DIY in a good sort of guerilla way, and people wondered how the heck I had done it. And now, aerial drone shots and selfie stick shots and things have become such a norm that they’re very cliched.
I think there’s definitely a danger, if you do stuff that is state-of-the-art using all the latest tools, that you won’t in fact do something that stands the test of time. Amazingly, people seem to feel that the work that I did in the ’80s has held up pretty well. I think that’s a product of the fact that I don’t really care about what’s getting played on the radio or in the clubs. If it’s common enough that it has a sound that fits in with what’s going on today, then I’m in a way sort of repelled by it.
I’d rather steer clear of something that sounds “in the moment.” I’d rather do something retro or futuristic or otherworldly or dystopian, something that just doesn’t fit into the here and now of today. And I think probably that’s why my music has stood up.
But there was no formula when I was working in the early ’80s. Everybody was rewriting the rules all the time. It was rebellious, because rock and roll music had become very corporate and staid through the stranglehold of American radio, and in my opinion had become very formulaic. So those of us that were going against the grain were forced to be quite experimental and take risks and so on. There was no alternative, no underground alternative, and people were very keen. They were very hungry for new sounds, new styles, new hairdos and stage costumes, and so on.
It was an exciting period where anything went. I think we were very lucky to be alive during that time, and I try not to be judgmental about today’s music, but what I hear from teenagers today is that they’re actually jealous of what we were able to enjoy when we were their age.
Here’s my last question. In your art, your lyrics, song titles, album titles, the music itself that you’ve made, you exhibit a certain sense of fun throughout. How does one make room for that mindset, not just in music creation, but everywhere else in life, from audio engineering to entrepreneurship and beyond?
I think that’s a hard question, really. It’s a mistake to take yourself too seriously when you’re an artist or in the service industry or whatever. I like stuff that’s lighthearted and humorous because I think sometimes deeper meaning comes out of humor. People like to be entertained and cheered up and made to laugh, but often there’s something more poignant behind it.
- David Weiss