The Babylon technology family is growing!
Earlier this year, we started experimenting with ways to make open-source computer vision (CV) and augmented reality (AR) technologies easy to add to Babylon.js Web apps. Excited by the results and inspired by the enthusiasm of the community, we decided to give those experiments a permanent home.
Babylon AR is our new initiative to make computer vision on the Web easier and more accessible than ever. While the project is still in its very early stages, our hope is to blaze a trail for advanced image processing algorithms to be made available in Web apps — easily, openly, and without requiring domain knowledge.
A Word on Domain Knowledge
Domain knowledge: expertise (knowledge) that only applies to a very limited field of study or application (domain).
In short, domain knowledge is what distinguishes a generalist from a specialist. Applications that require domain knowledge can typically only be approached by teams that happen to have access to such a specialist.
Pretty much every kind of expertise begins as domain knowledge. 3D rendering, for example, was once a highly specialized field — and in some ways it still is — but tools like Babylon.js are constantly lowering the barrier to entry, empowering more and more developers to leverage capabilities that used to be available only to a specialized few. Similarly, AR (and, more broadly, CV) is notorious for requiring domain knowledge in order to do anything, so much so that even teams dedicated to AR applications will usually have a particular member who is “the CV dev.”
That’s what we want to change. There’s nothing wrong with having specialists, but requiring specialists drastically restricts the number and variety of teams who can create with a given technology. Babylon is about taking advanced, accelerated, specialized Web capabilities and opening them up to everyone, and Babylon AR is another step on that journey.
The Current State
As mentioned above, Babylon AR is still in its very earliest stages; alpha might be a good description, though we haven’t yet adopted a versioning system. The plan is to add self-contained, modular features to enable new capabilities. Right now, though, there’s only one (work-in-progress) flagship capability: the ArUcoMetaMarkerObjectTracker. However, that’s too long a name to keep typing, and AUMMOT is almost as bad; so as we talk about it in the following sections, I’m just going to call it Fred.
Fred is an object tracker, which means his job is to analyze a camera feed to identify known objects in the world and describe, in 3D, how those objects are positioned and oriented relative to the camera. Fred in particular is built on OpenCV’s ArUco Marker Board technology; the objects he’s able to track are specific configurations of QR-code-like markers. This technology has been available in OpenCV for years, so Fred’s not trying to introduce anything new; he just takes an existing technology, brings it to the Web, and makes it easier to use than ever before. (The meta-marker, or “board,” associated with that demo can be found here.)
Four lines of code. That’s all it takes to add Fred-based object tracking to your Babylon.js Web app. Please bear in mind that this feature is still in early alpha, and robustness across devices is one area where known work remains. However, the remaining work shouldn’t change the API substantially, and once the implementation is more robust, it should still only require four lines of code to bring Fred everywhere.
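To give a concrete sense of the shape of such an integration, here is a pseudocode sketch. Every name in it (createObjectTracker, setTargetMarkerBoard, and so on) is a hypothetical illustration, not Babylon AR’s actual API:

```
// Hypothetical pseudocode only -- illustrative names, not the real Babylon AR API.
const tracker = await createObjectTracker(scene, cameraFeed); // attach a tracker to the camera feed
tracker.setTargetMarkerBoard(markerBoardDefinition);          // describe the meta-marker to look for
tracker.onTrackingUpdate((pose) => anchorNode.setPose(pose)); // drive a scene node from the tracked pose
tracker.start();                                              // begin processing frames
```

The point of the sketch is the division of labor: the app supplies a camera feed and a marker description, and the tracker hands back poses; all of the CV machinery stays hidden behind those four calls.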
The Four Faces of Fred: a Tour of Babylon AR
As Babylon AR’s current flagship feature, Fred provides an effective case study showing how the repository is laid out and fits together. There are four places in the repo that are relevant to building and testing Fred; thus, for primarily alliterative reasons, I’ll refer to these locations as faces. (Note: the information in this section will mainly be of interest to those who are considering developing in the Babylon AR repository.)
As mentioned before, Fred’s core technology is an OpenCV marker tracker written in C++, so the first of the four faces can be found in Babylon AR’s src/cpp/Fred folder. In layout and content, this folder is closely based on the examples from the “webpiling” blog post from earlier this year, so I won’t dwell at length on implementation details here. It’s sufficient to say that the implementation in this folder encapsulates the tracking logic, powered by OpenCV technologies, that lies at the heart of Fred’s capabilities, and that a dedicated gulp command builds it to WebAssembly (WASM).
The second face of Fred can be found in src/ts/Fred. This contains the implementation for Fred’s TypeScript “surface,” which manages the relationship between the consuming code and the underlying WASM. This is the code that’s responsible for making Fred easy to use. Note that there is no direct relationship between the TypeScript portion and the C++ portion; in fact, no implementation folder in src contains dependencies on any outside folder except for its local shared folder (if one exists). This pattern, reinforced by the structure of the repository, is designed to help avoid circular dependencies and keep the Babylon AR repo clean, clear, and easy to work in.
Like the C++ portion, the TypeScript component of Fred can be built using a dedicated gulp command; however, it’s usually more convenient to just build it as a part of the complete Babylon AR offering.
When you’re ready to deploy, another dedicated gulp command moves the deployment files to the docs folder, the “production” face of Fred. When pushed to the main repo, the docs folder is actually the root of a GitHub Pages site for which the CDN link, https://ar.babylonjs.com, is an alias. Once deployed, you can incorporate Fred (or other forthcoming features of Babylon AR) into any Web app — and even Playgrounds — as easily as shown in the demo linked above.
This is only the beginning. Our immediate priority with Babylon AR is to drive up the quality of the ArUcoMetaMarkerObjectTracker (which by now you probably know better as Fred). Afterward, however… Did you know OpenCV contains an implementation of the GrabCut algorithm? Did you know it has two inpainting algorithms? A seamless clone? A face tracker?
And that’s just from OpenCV, one of the major open-source libraries. There are OCR libraries, photogrammetry libraries, photo- and video-processing libraries, and even SLAM implementations. And though Babylon AR will always remain focused on computer vision, there are opportunities in other fields as well. Only a few weeks ago, my friend and colleague Cedric encapsulated the Recast navigation library into a nav-mesh capability for Babylon.js. The open-source world is full of powerful algorithms hidden behind domain knowledge, just waiting to be opened up to a new audience of developers. There are libraries for audio, physics, computational geometry, text-to-speech, procedural generation…
Think of the possibilities!
Justin Murray — Babylon.js Team