Across all the various permutations of multimedia experiences, it goes without saying that sound is the secret to immersion. Sound is essential for setting the expectations of your audience. Whether it's passively establishing the tone or happenings of your experience with background music and sound effects, or actively providing verbal and non-verbal feedback, sound is a core part of communicating ideas and concepts.
Where does that leave the humble Babylon Native team?
In a bit of a pickle, initially. The Babylon.js engine has the convenience of modern web browsers to provide plenty of creature comforts; the WebAudio API is well implemented across all the browsers that Babylon.js supports. In addition, the Babylon.js sound components (at the time of writing) have a hard dependency on WebAudio classes, concepts, and capabilities; some WebAudio classes are even exposed to developers through the Babylon.js API! That said, the current capabilities of WebAudio seem to be more than enough to power rich interactive experiences: existing implementations support spatial audio panning and even implement head-related transfer functions, the essentials for providing immersive audio in an XR experience.
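To give a flavor of the processing behind that panning feature, WebAudio's StereoPannerNode specifies an equal-power gain curve for mono input. A minimal sketch of that formula (just the standard equal-power math, not any browser's or Babylon's actual implementation):

```typescript
// Equal-power stereo panning, as specified for WebAudio's StereoPannerNode
// with mono input. `pan` ranges from -1 (full left) to +1 (full right);
// the two gains always sum to unit power, keeping loudness constant.
function equalPowerGains(pan: number): [number, number] {
  const clamped = Math.max(-1, Math.min(1, pan));
  const x = (clamped + 1) / 2;      // map [-1, 1] to [0, 1]
  const angle = x * (Math.PI / 2);  // map to [0, π/2]
  return [Math.cos(angle), Math.sin(angle)];
}

// Applying the gains to a block of mono samples:
function panBuffer(mono: Float32Array, pan: number): [Float32Array, Float32Array] {
  const [gl, gr] = equalPowerGains(pan);
  return [mono.map((s) => s * gl), mono.map((s) => s * gr)];
}
```

At pan = 0 both channels receive a gain of √½, so a centered source is not 3 dB louder than a hard-panned one; that perceptual detail is exactly the kind of behavior a WebAudio-compatible native implementation has to reproduce.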
In addition, support for media streaming and the inclusion of various audio filters and analysis tools also allow for more utility-oriented experiences. One could imagine a Babylon.js-based Digital Audio Workstation similar to GarageBand or FL Studio; more realistically, one can easily envision users of our engine building a media-playing application that they may want to make into a native experience as well.
So how should we go about doing this?
There are a number of different audio solutions we can leverage, and choosing the right one requires quantifying what we value in a solution. For starters, Babylon.js capabilities have always correlated with the standards and sub-components it relies on, and audio is no different in this regard. A guiding principle of Babylon Native is to provide the functionality of Babylon.js in a native context, so it follows that the goal of any audio solution should be to provide the functionality of the WebAudio API in a native context.
Again, how should we go about doing this? A second goal is to minimize the maintenance cost of such a solution. In a perfect world, this component could be integrated without bringing an endless swarm of bugs with it. That isn't necessarily impossible, but careful consideration should be given to any existing solution we look to integrate, and we should explore a wide array of existing solutions before building our own WebAudio implementation. Leveraging an existing codebase, and contributing to its even greater success if we outgrow its capabilities, is well in line with our pattern of behavior for solutions we'd rather not implement from scratch, such as our platform-agnostic rendering pipeline and XR capabilities. Why not continue to build atop the shoulders of giants, right?
At a high level, we can evaluate the capabilities of WebAudio as its component parts: an audio graph that controls dataflow and the ordering of effect chains; a library of filters and effects that can be applied to an audio stream; and, most importantly, a platform-abstraction layer that uses the appropriate platform APIs to play these audio streams on hardware. Not necessarily a trivial set of capabilities…
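To make the first of those parts concrete, the audio graph is essentially a dataflow pipeline: each node transforms the samples it receives from upstream and hands the result downstream. Here is a heavily simplified, hypothetical sketch of that idea (real WebAudio graphs add fan-in/fan-out, AudioParam automation, block scheduling, and much more):

```typescript
// A hypothetical, minimal pull-based audio graph. Each node asks its input
// for a block of samples and applies its own transform; an effect chain is
// just nodes composed end to end. Names here are illustrative, not WebAudio's.
interface AudioGraphNode {
  render(frames: number): Float32Array;
}

// A source node that emits a constant sample value.
class ConstantSource implements AudioGraphNode {
  constructor(private value: number) {}
  render(frames: number): Float32Array {
    return new Float32Array(frames).fill(this.value);
  }
}

// An effect node that scales every sample by a fixed gain.
class GainEffect implements AudioGraphNode {
  constructor(private input: AudioGraphNode, private gain: number) {}
  render(frames: number): Float32Array {
    return this.input.render(frames).map((s) => s * this.gain);
  }
}

// Two gain stages chained after a source: 1.0 × 0.5 × 0.5 = 0.25 per sample.
const chain = new GainEffect(new GainEffect(new ConstantSource(1), 0.5), 0.5);
```

The ordering question the graph solves is visible even in this toy: reordering nodes reorders the transforms applied to every rendered block.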
A silver lining of this description is that these responsibilities don't all need to be implemented by a single component or solution. A downside is that any intra-component interaction should go through an abstracting intermediary that we would need to define, in order to minimize inter-component dependencies. In plain English: wherever we choose to use a component, we need to make it easy to remove or replace without affecting the rest of the Babylon Native audio code. Combined with our previous rule of keeping our codebase as maintainable as possible, this leads me to conclude that we should prioritize solutions with the most capability out of the box.
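As a sketch of what such an intermediary could look like (all names here are hypothetical, not Babylon Native API): the audio code depends only on a narrow interface, and any concrete backend, whether third-party or hand-rolled, lives behind it, so replacing a component never ripples outward.

```typescript
// Hypothetical abstraction boundary: higher-level audio code would depend
// only on this interface, never on a concrete third-party backend.
interface AudioDevice {
  readonly sampleRate: number;
  submit(samples: Float32Array): void;
}

// One possible backend; a platform-API-backed device would implement the
// same interface and be a drop-in replacement. This one discards audio
// and merely counts frames, which is handy for testing.
class NullAudioDevice implements AudioDevice {
  readonly sampleRate = 48000;
  framesSubmitted = 0;
  submit(samples: Float32Array): void {
    this.framesSubmitted += samples.length;
  }
}

// Code written against the interface works with any backend unchanged:
function playSilence(device: AudioDevice, seconds: number): void {
  device.submit(new Float32Array(device.sampleRate * seconds));
}
```

The point of the sketch is the dependency direction, not the specific shape of the interface: swapping `NullAudioDevice` for a real platform backend requires no changes to `playSilence` or anything above it.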
Ok, so we have some rules as to the capability we require, we have some guiding principles as to how we should categorize solution capability, and we have some rules as to how we should prioritize and value the solutions we choose to integrate. What now?
Some projects, while capable of being an all-in-one solution, seem to be very difficult to maintain. For example, we considered reusing build artifacts from Chromium's and Firefox's WebAudio implementations; however, they were found to be tightly coupled to their graphics rendering components, and we would be introducing a dependency much larger than it needed to be. That is not great for storage-limited platforms such as mobile devices and headsets. We also considered forking the repos to refactor out the portions we wanted removed, but this would violate one of the goals of this project, as we would then take on the cost of keeping our Chromium/Firefox fork current with upstream. Not a cost that a team of our size wants to flirt with…
With these project principles clearly defined, it becomes easier to survey a field of solutions and definitively decide where to invest time exploring. Following the creed of the late, great Nipsey Hussle: "The Marathon Continues…"