Marker Tracking in Babylon.js

Bringing Native Marker Tracking to the Web with WebAssembly

Feb 21, 2019

Marker Tracking in Babylon.js with OpenCV

We’re experimenting with a way to easily add marker tracking to Babylon.js web apps, and even to Playgrounds!

Meet our newest experiment: marker tracking in Babylon.js! Encapsulating the power of OpenCV’s awesome ArUco module, we’ve created a simple and easy-to-add utility for tracking AR markers in any Babylon.js Web app.

We’ve got a basic Playground showing off this experiment in action; check it out here. But to try out marker tracking, you’ll need a marker (and a webcam), so…
We’ve made some markers available as Web pages here. Try loading one up on your phone and the Playground on your computer (or vice versa) and watch one screen track the marker on the other.
Have a question? Want to discuss things we can do with this? Think we should create an official Babylon.js extension? Leave a comment below, or join the conversation on the Babylon forum!

The Babylon logo… It’s out in the world! IT’S ESCAPED THE INTERNET!!!

Webpiling: Bringing Native Libraries to Web

It is surprisingly easy to take existing native (C and C++) libraries and make them available for use in Web apps. The open-source community has created an amazing collection of tools to enable this. In the remainder of this article, we’ll talk through the process.

If I do say so myself, there are a lot of things to like about our marker tracking experiment for Babylon, but there are two attributes that stand out most prominently in my mind.

You don’t need computer vision expertise to use it.
You don’t need computer vision expertise to make it.

I really want to emphasize this, so I’ll say it again: you don’t have to be an expert. This, in my opinion, is one of the most important parts of Web development culture, and it’s a big part of the reason Web technology is so wonderfully diverse and varied. Historically, Web development has often emphasized speed of development and a low barrier to entry. In the course of creating marker tracking for Babylon, I’ve worked out a process for bringing native libraries to that party, which for convenience I’m calling webpiling.

webpile (verb): a portmanteau of “Web” (of World-Wide fame) and “compile,” meaning to recompile and encapsulate existing native (C and C++) utilities in a form that is accessible and easy to incorporate into browser-based Web apps.

Webpiling is not a new idea — people have been working toward this capability for many years — but only recently has the technology reached a point where we can actually start, on an individual basis, to bring native library capabilities to Web platforms. This is the time, folks! We’ve got the tools; all we need is you. Webpiling is an awesome and incredibly rewarding endeavor, and in case you aren’t already convinced, here are three more reasons to learn about webpiling.

It’s easy. How easy? This easy. The repository at that link contains a bash script — a single script — which performs all the steps required to build, from OpenCV source, our webpiled marker tracking for Babylon project. We’ll discuss that script in more detail shortly as it functions as an overview of the webpiling process itself. For now, though, just think about that one short script, comments and whitespace and all, and realize that webpiling really is that easy.
It’s powerful. Would you like to make an app with computer vision? OpenCV has all kinds of amazing features. Need to extract text from a webcam photo? Tesseract can help you do that. Or perhaps your app requires advanced physics simulation? Or audio synthesis? Compute-intensive tasks like these are often solved first and best in native libraries; and to date, such capabilities have often been considered beyond what can be efficiently executed in a browser. But times are changing. Browser technologies are getting faster. And with tools and techniques for webpiling, there’s a whole category of natively-implemented technologies just waiting to be put on the Web.
It’s sustainable. This is not the first time someone has brought the power of a native library to the Web. This isn’t even the first time somebody’s brought OpenCV’s ArUco module to JavaScript. However, many of the existing efforts have focused on manual transpiling — looking at the native source code and reimplementing it, by hand, for a new platform. Manual transpiling is sometimes the way to go; but it’s work-intensive, time-intensive, and presents a lot of challenges surrounding maintenance: new features and bug fixes made to the original project won’t be added to a manually transpiled utility until somebody goes back through and does so by hand. But webpiling doesn’t have this problem because, to webpile, we build directly from the original source — no manual transpilation required. Thus, if a new feature or bug fix gets added to the original project, updating the webpiled version is as easy as rerunning a build script.

I firmly believe that the best way to promote innovation is to give powerful tools, and the ability to use them, to as wide a variety of creators as possible. Webpiling — encapsulating powerful native libraries for easy and accessible use on the Web — is an incredibly exciting opportunity to bring new capabilities to a vast number of developers. I hope you find this as exciting as I do. If so, let’s do it! Go forth and webpile something!

The Technical Part: Webpiling All the Things

The repository I linked to above contains a bash script called webpile_all_the_things.sh. (Don’t judge, naming files is hard.) This script exists for only one purpose: to illustrate what I mean when I say that webpiling is easy.

What’s happening here is not as complicated as it looks; but it sure looks cool!

Webpiling, on a technical level, is the concatenation of a breathtaking number of open-source tools. When I say that it’s easy, I mean that there’s very little we have to do in order to webpile something. There is, however, quite a lot to understand. The rest of this post gives a high-level overview of the relevant tools, techniques, and targets used in that script: in short, what it actually means to webpile all the things.

The Platform: WebAssembly

The technology that most directly underpins webpiling is WebAssembly. The details are best explained on the official site, but for our purposes you can think of WebAssembly as a kind of platform, like x86 or ARM, in that native libraries can be compiled to target it. The output of this compilation process is a .wasm file, which can then be loaded and called from JavaScript in order to run natively implemented functions in a browser. WebAssembly has a number of benefits, but the two that are most important for webpiling are (1) it’s a logical output format for native library implementations and (2) it is designed to run at close to native speed.
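To make the load-and-call step concrete, here is a minimal TypeScript sketch. It is not taken from the marker tracking project: the module is a hand-assembled binary of a few dozen bytes with a single hypothetical export named add, standing in for the much larger .wasm file a real build would produce. It uses the synchronous WebAssembly.Module and WebAssembly.Instance constructors to stay short; for a real library fetched over the network, an asynchronous call such as WebAssembly.instantiateStreaming would usually be the better fit.

```typescript
// A tiny hand-assembled WebAssembly module (an assumption for illustration;
// a real webpiled library would be loaded from its compiled .wasm file).
// It exports one function, "add", which returns the sum of two 32-bit integers.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // "\0asm" magic, version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type 0: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function 0 uses type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export function 0 as "add"
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code: one body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0, local.get 1, i32.add, end
]);

type AddFn = (a: number, b: number) => number;

// Compile and instantiate synchronously; this module needs no imports.
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule, {});

// Exports are loosely typed, so cast to the signature we know the module has.
const add = wasmInstance.exports.add as unknown as AddFn;

console.log(add(2, 3)); // 5
```

From the caller's point of view, add is an ordinary JavaScript function; the fact that it runs compiled code is invisible at the call site.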

The Compiler: Emscripten

The technical goal of webpiling, then, is to start with the source code of a native library and end with a .wasm binary that makes our desired features easy to use on the Web. Our fundamental tool to achieve this goal is Emscripten, a collection of open-source utilities for compiling native code (C and C++) to asm.js and WebAssembly. Emscripten is a phenomenal utility, and its high quality is a huge part of why webpiling is feasible and easy.

Emscripten first appears in the script on lines 25 through 28. These lines navigate into the Emscripten submodule and execute a few commands to ensure that the tools are properly set up and up-to-date. That done, our compiler is ready, and we have everything we need to move on to…

The Build System: CMake and Make

OpenCV has an extremely robust build system out-of-the-box, so really all we have to do is invoke it. This system is based on CMake, which is a well-established and very powerful build script generation tool: a build system for making build systems, in a way. We will use CMake to configure an OpenCV build, which will generate a Makefile that will do the actual compilation.

Of course, none of this build system was designed with webpiling in mind. Fortunately, Emscripten has tools designed to make it easy to integrate with CMake-based build systems. For our configuration command, Emscripten provides a CMake toolchain file that modifies CMake’s behavior to use Emscripten instead of the default native compilers. Then, for our compilation command, Emscripten provides a wrapper that similarly modifies the behavior of make: emmake.

Configuration and compilation are done on only two lines of the script: lines 42 and 43, respectively. The configuration line is by far the most complicated command in our webpilation process because it digs deeply into the build options of OpenCV. A meticulous breakdown of this command is beyond the scope of this document, but it’s exactly the sort of thing I’ll be happy to discuss in the comments or on the forum. Compilation, by comparison, is extremely simple. emmake wraps make and tells it how to use Emscripten, and opencv_aruco tells make which subset of OpenCV’s capabilities we actually want to build.

The Encapsulation: Bridging the Gap Between Native and Web

This last step is simple, but it’s the most important part of the process because it determines exactly how our newly-compiled native capabilities will be accessible from the Web. In my case, I wanted my webpiled utility to be as simple and accessible as possible, so I wrote a small native wrapper (based on an excellent video tutorial I followed) to hide the OpenCV API and expose only a simple, easy-to-use surface to the calling JavaScript. The details of why and how I encapsulated in this way are, sadly, also beyond the scope of this document; but like the CMake command, they’d make for a great comments or forum discussion. 😃
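One practical reason a narrow wrapper helps: functions exported from WebAssembly exchange plain numbers, so bulk data such as a camera frame generally has to be copied into the module's linear memory and referred to by address. The TypeScript sketch below shows that mechanism from the calling side and hides it behind a small class. It is not the author's wrapper (which is native code), and everything named here is hypothetical: the hand-assembled module, its mem and peek exports, and the FrameStore class are stand-ins for illustration only.

```typescript
// Hand-assembled stand-in for a compiled native library (an assumption, not
// the marker tracking module). It exports its linear memory as "mem" and a
// function "peek" that returns the unsigned byte stored at a given address.
const moduleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm" magic, version 1
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f, // type 0: (i32) -> i32
  0x03, 0x02, 0x01, 0x00,                         // function 0 uses type 0
  0x05, 0x03, 0x01, 0x00, 0x01,                   // one memory, minimum 1 page (64 KiB)
  0x07, 0x0e, 0x02,                               // two exports:
  0x03, 0x6d, 0x65, 0x6d, 0x02, 0x00,             //   "mem"  = memory 0
  0x04, 0x70, 0x65, 0x65, 0x6b, 0x00, 0x00,       //   "peek" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                   // code: one body, no locals
  0x20, 0x00, 0x2d, 0x00, 0x00, 0x0b,             //   local.get 0, i32.load8_u, end
]);

// Hypothetical facade: callers hand over bytes and never see addresses.
class FrameStore {
  private readonly memory: WebAssembly.Memory;
  private readonly peek: (address: number) => number;

  constructor() {
    const instance = new WebAssembly.Instance(new WebAssembly.Module(moduleBytes), {});
    this.memory = instance.exports.mem as unknown as WebAssembly.Memory;
    this.peek = instance.exports.peek as unknown as (address: number) => number;
  }

  // Copy the caller's bytes into linear memory, starting at address 0.
  load(frame: Uint8Array): void {
    new Uint8Array(this.memory.buffer).set(frame);
  }

  // Ask the compiled side to read one byte back out of its own memory.
  byteAt(index: number): number {
    return this.peek(index);
  }
}

const frameStore = new FrameStore();
frameStore.load(new Uint8Array([10, 200, 255]));
console.log(frameStore.byteAt(1)); // 200
```

In a real project the exported function would be a tracking routine rather than peek. The point is only that whoever writes the wrapper decides how little of this plumbing the caller has to see.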

So, after the OpenCV static libraries have been built, the rest of the script is simply maneuvering files around so that the encapsulating code can build the way it expects to. The script then calls another script to build the final output, so the build command for the encapsulating code is actually in build.sh. Similarly, webpile_all_the_things.sh then calls the encapsulating code repository’s run.sh script, which hosts your newly-webpiled .wasm file in an Emscripten debug server so that you can access it in a browser at http://localhost:8080.

And that’s it! Those are all the steps. The webpile_all_the_things.sh script itself contains a more incremental commentary on what it’s doing; but apart from what I described above, most of what it’s doing is just file logistics. Webpiling really can be that simple: there’s a lot you can know, but there isn’t that much that you have to do.

Conclusion

It may not surprise you to hear that this article, as originally written, was much longer. Entire sections were dedicated to the details of the OpenCV CMake command, the importance of using static libraries, the significance of Emscripten’s LLVM underpinnings, and the motivation for the encapsulating code. I get very excited about technology like this, and I wanted to share with you every single detail I thought might be interesting.

But the article quickly became so long that I worried I had buried my main message, which I will say again because I can’t say it often enough: you don’t have to be an expert to do any of this. The most important thing I wanted to communicate in this article is that webpiling is achievable; and if you’re interested — if there’s some native library you want to see on the Web — I want you to feel empowered to go make that happen.

But I still think the details are worth knowing — in fact, I’m eager (perhaps overly so 😉) to discuss them! There are a lot of specifics that aren’t addressed in this article, and that’s why I want to invite you, once again, to leave a comment or ping me in the forums if there’s anything about webpiling you want to discuss. And keep in mind, too, that I’m not an expert either: I’d never done anything like this before this project. WebAssembly is a new technology, webpiling is a new frontier, and we’re all learning. So join the discussion and come learn with us!

Justin Murray — Babylon.js Team

Twitter: @syntheticmagus