Audio in Babylon Native — Charting the way

Babylon.js
May 13, 2021

Bringing audio everywhere.

Across all the various permutations of multimedia experiences, it goes without saying that sound is the secret to immersion. Sound is essential for setting your audience’s expectations. Whether it’s passively establishing the tone or the events of your experience with background music and sound effects, or actively providing verbal and non-verbal feedback, sound is a core part of communicating ideas and concepts.

Especially in the horror genre! Many works use audio to great effect to trigger the audience’s fear response.

Where does that leave the humble Babylon Native team?

In a bit of a pickle, initially. The Babylon.js engine has the convenience of modern web browsers providing plenty of creature comforts; the WebAudio API is well implemented across all the browsers that Babylon.js supports. In addition, the Babylon.js sound components (at the time of writing) have a hard dependency on WebAudio classes, concepts, and capabilities. Some WebAudio classes are even exposed to developers through the Babylon.js API! And WebAudio’s current capabilities seem to be more than enough to power rich interactive experiences: existing implementations support spatial audio panning and even implement head-related transfer functions (HRTFs). This laundry list of features covers the essentials for immersive audio in an XR experience:

Special mention of Raanan Weber’s XR rhythm playground! A blueprint of what can be done with Babylon.js, and what should be doable with Babylon Native. playground.babylonjs.com/#MNC4HP

In addition, support for media streaming, along with a variety of audio filters and analysis tools, enables more utility-oriented experiences. One could imagine a Babylon.js-based Digital Audio Workstation similar to GarageBand or FL Studio. More realistically, it’s easy to envision a media player application that users of our engine may want to ship as a native experience as well:

Another amazing project, and a guiding beacon for the experiences and functionality we eventually want to support in Babylon Native. Special thanks to Patrick Ryan! playground.babylonjs.com/#MNC4HP

So how should we go about doing this?

There are a number of audio solutions we could leverage, and choosing the right one requires quantifying what we value in a solution. For starters, Babylon.js capabilities have always tracked the standards and subcomponents it relies on, and audio is no different in this regard. A guiding principle of Babylon Native is to provide the functionality of Babylon.js in a native context, so it follows that the goal of an audio solution should be to provide the functionality of the WebAudio API in a native context.

Admit it, this scene is noticeably less awesome when it lacks the audio. Source: https://uproxx.com/wp-content/uploads/2021/01/godzilla-laser.gif

Again, how should we go about doing this? A second goal is to minimize the maintenance cost of such a solution. In a perfect world, this component would be integrated without bringing an endless swarm of bugs with it. This isn’t impossible, but we should give careful consideration to any existing solution we are looking to integrate, and explore a wide array of them before building our own WebAudio implementation. Leveraging an existing codebase, and contributing to its success if we outgrow its capabilities, is in line with how we’ve handled other solutions we’d rather not implement from scratch, such as our platform-agnostic rendering pipeline and our XR capabilities. Why not continue to stand on the shoulders of giants, right?

At a high level, we can break WebAudio’s capabilities down into their component parts: an audio graph that controls dataflow and the ordering of effect chains; a library of filters and effects that can be applied to an audio stream; and, most importantly, a platform-abstraction layer that uses the appropriate platform APIs to play these audio streams on hardware. Not exactly a trivial set of capabilities…
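To make the first two components more concrete, here is a minimal sketch of a pull-based audio graph in C++, loosely modeled on WebAudio’s node concepts: an oscillator source feeding a gain effect feeding a destination. The class names (AudioNode, SineSource, GainEffect, Destination) are hypothetical and are not the API of Babylon Native, LabSound, or any browser. The sketch assumes mono float samples and ignores threading, channel mixing, and scheduling.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical base class, loosely modeled on WebAudio's AudioNode:
// each node fills a buffer of mono samples, pulling from its input if it has one.
class AudioNode {
public:
    virtual ~AudioNode() = default;
    virtual void Process(std::vector<float>& buffer) = 0;
};

// Source node: a sine oscillator, similar in spirit to OscillatorNode.
class SineSource : public AudioNode {
public:
    SineSource(float frequencyHz, float sampleRate)
        : frequencyHz_(frequencyHz), sampleRate_(sampleRate) {
        assert(sampleRate > 0.0f);
    }
    void Process(std::vector<float>& buffer) override {
        const double twoPi = 6.283185307179586;
        for (float& sample : buffer) {
            sample = static_cast<float>(std::sin(phase_));
            phase_ += twoPi * frequencyHz_ / sampleRate_;
            if (phase_ >= twoPi) phase_ -= twoPi;  // keep phase bounded
        }
    }
private:
    float frequencyHz_;
    float sampleRate_;
    double phase_ = 0.0;
};

// Effect node: pulls from its input, then scales every sample (like GainNode).
class GainEffect : public AudioNode {
public:
    GainEffect(std::shared_ptr<AudioNode> input, float gain)
        : input_(std::move(input)), gain_(gain) {}
    void Process(std::vector<float>& buffer) override {
        input_->Process(buffer);  // pull upstream data first
        for (float& sample : buffer) sample *= gain_;
    }
private:
    std::shared_ptr<AudioNode> input_;
    float gain_;
};

// End of the graph; a real destination would hand the buffer to a platform layer.
class Destination {
public:
    explicit Destination(std::shared_ptr<AudioNode> input) : input_(std::move(input)) {}
    std::vector<float> RenderQuantum(std::size_t frames) {
        std::vector<float> buffer(frames, 0.0f);
        input_->Process(buffer);
        return buffer;
    }
private:
    std::shared_ptr<AudioNode> input_;
};
```

Rendering one block is then a matter of constructing a Destination over the gain node and calling RenderQuantum(128), 128 frames being WebAudio’s render quantum size. Because each node pulls from its input, the graph’s topology alone determines the order in which effects run.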

A silver lining of this breakdown is that not every responsibility needs to be implemented by a single component or solution. A downside is that every inter-component interaction should go through some abstracting intermediary that we would need to define, in order to minimize dependencies between components. In plain English: wherever we choose to use a component, we need to make it easy to remove or replace without affecting the rest of the Babylon Native audio code. Combined with our earlier rule of keeping our codebase as maintainable as possible, I conclude that we should prioritize solutions with the most capability out of the box.
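One way to read the “abstracting intermediary” rule is as a narrow interface between the engine and whatever plays audio on the platform. The sketch below assumes a hypothetical IAudioOutput interface; NullAudioOutput is a stand-in backend that only counts samples, and AudioEngine depends on nothing but the interface, so swapping the backend (or the library behind it) would not touch engine code. None of these names come from Babylon Native.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical intermediary: the only thing the engine knows about the platform layer.
class IAudioOutput {
public:
    virtual ~IAudioOutput() = default;
    virtual bool Start(int sampleRate, int channels) = 0;
    virtual void Submit(const std::vector<float>& interleavedFrames) = 0;
    virtual void Stop() = 0;
};

// Stand-in backend that just counts submitted samples; a real one might wrap
// a platform API such as XAudio2, AAudio, or Core Audio behind the same interface.
class NullAudioOutput : public IAudioOutput {
public:
    bool Start(int sampleRate, int channels) override {
        running_ = sampleRate > 0 && channels > 0;
        return running_;
    }
    void Submit(const std::vector<float>& interleavedFrames) override {
        if (running_) samplesSubmitted_ += interleavedFrames.size();
    }
    void Stop() override { running_ = false; }
    std::size_t SamplesSubmitted() const { return samplesSubmitted_; }
private:
    bool running_ = false;
    std::size_t samplesSubmitted_ = 0;
};

// Engine code depends only on IAudioOutput, so the backend can be replaced
// without changing anything in this class.
class AudioEngine {
public:
    explicit AudioEngine(std::unique_ptr<IAudioOutput> output) : output_(std::move(output)) {
        assert(output_ != nullptr);
    }
    bool Start(int sampleRate, int channels) {
        bool ok = output_->Start(sampleRate, channels);
        channels_ = ok ? channels : 0;
        return ok;
    }
    // Submits one block of silence; a real engine would render its audio graph here.
    void RenderBlock(std::size_t frames) {
        if (channels_ <= 0) return;
        std::vector<float> block(frames * static_cast<std::size_t>(channels_), 0.0f);
        output_->Submit(block);
    }
    void Stop() { output_->Stop(); }
private:
    std::unique_ptr<IAudioOutput> output_;
    int channels_ = 0;
};
```

Replacing the backend is then a change at construction time only, for example AudioEngine engine(std::make_unique<NullAudioOutput>());. The cost is one virtual call per submitted block, which is small at block granularity.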

Ok, so we have some rules as to the capability we require, we have some guiding principles as to how we should categorize solution capability, and we have some rules as to how we should prioritize and value the solutions we choose to integrate. What now?

What now?

As part of our sound feature tracking, I’ve taken some time to evaluate the capabilities of several open-source projects that may fit our needs.

The evaluation metrics used when choosing a solution. Some investigation of the source code was needed to make sure we had a decent grasp of what each project claims to do…

Some projects, while capable of being an all-in-one solution, seem very difficult to maintain. For example, we considered reusing build artifacts from Chromium’s and Firefox’s WebAudio implementations; however, we found them to be tightly coupled to their graphics rendering components, so we would be introducing a much larger dependency than needed. That is not great for storage-limited platforms such as mobile devices and headsets. We also considered forking the repos to refactor out the portions we didn’t want; however, this would violate one of the goals of this project, since we would take on the cost of keeping our Chromium/Firefox fork current with upstream. Not a cost that a team of our size wants to flirt with…

Continuing down the list of possibilities, at the time of writing we are evaluating another open-source WebAudio implementation: LabSound. It seems like another promising avenue toward our dream of a WebAudio submodule that we can leverage in our source. The next step is functional evaluation: we need to make sure that this solution works well, is easy to integrate into our project, and can be projected into our JavaScript engine as WebAudio classes with little more than a simple WebAudio interface and supporting Napi-js glue code.

With these project principles clearly defined, it has become easier to look at a field of solutions and decide definitively where to invest our exploration time. Following the creed of the late, great Nipsey Hussle: “The Marathon Continues…”

-Nick Barlow

Drigax (@whoisdrigax) / Twitter
