Performance improvement is a never-ending project
This is an exercise we try to do as often as possible (Raanan is even working on an automated tool to make sure PRs do not regress performance).
During the last round of that exercise we found several improvements that I wanted to mention here.
The main one was mostly due to an incorrect assumption I made when I started Babylon.js (eons ago). I believed that the cost of sorting meshes would outweigh what we gain by reducing context changes (which is what sorting meshes by material gives us). I was wrong, and my mistake was simply to not challenge my assumption with a simple test. It is never too late to improve ;)
So we ran some tests and found that the cost of sorting was really minimal, and that on large scenes it was a really impressive win.
Let’s take this playground for instance: Babylon.js Playground (babylonjs.com)
It displays 1000 spheres with 50 different materials.
With material sorting, each frame renders in approximately 6.8ms on my computer.
Without it (so as it was before our fix), the same frame takes 8.9ms. We can then render complex scenes (with many meshes and materials, which is pretty common actually) about 30% faster (8.9 / 6.8 ≈ 1.31). This is because, without sorting, the engine needs to constantly update WebGL state instead of reusing what is already set up.
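To give a feel for why the order matters, here is a minimal sketch that counts how many times the "current material" changes while walking a list of meshes, with and without sorting. This is not the engine code: `MeshLike` and `countMaterialSwitches` are hypothetical names made up for the illustration, and a real renderer changes more state than just the material.

```typescript
// Hypothetical stand-in for a renderable mesh: only the material id matters here.
interface MeshLike {
    id: number;
    materialId: number;
}

// Count how often the bound material changes while drawing in the given order.
function countMaterialSwitches(list: MeshLike[]): number {
    let switches = 0;
    let current = -1; // no material bound yet
    for (const mesh of list) {
        if (mesh.materialId !== current) {
            current = mesh.materialId;
            switches++;
        }
    }
    return switches;
}

// 1000 meshes cycling through 50 materials, so neighbours never share a material.
const sceneMeshes: MeshLike[] = [];
for (let i = 0; i < 1000; i++) {
    sceneMeshes.push({ id: i, materialId: i % 50 });
}

const unsortedSwitches = countMaterialSwitches(sceneMeshes);

// Sort a copy by material first, then by mesh id to keep the order stable.
const sortedMeshes = sceneMeshes
    .slice()
    .sort((a, b) => a.materialId - b.materialId || a.id - b.id);
const sortedSwitches = countMaterialSwitches(sortedMeshes);

console.log(unsortedSwitches, sortedSwitches); // prints "1000 50"
```

The interleaved order is a worst case on purpose: every draw needs a new material. Once sorted, each material is bound only once per frame.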
As always, you can turn off the material sorting by simply asking the scene to use a default sorting mechanism:
Unleashing the power of parallel shader compilation
This one was initiated by a user complaining that loading a scene in 5.0 was slower than in 4.2 (which should never happen :))
The problem was in how we wait for the scene to be ready: we go through the list of meshes and check whether each material (and thus its shader) is ready. Instead of going through the entire list, we stopped as soon as we found the first material that was not ready. As a result, we did not trigger the compilation of the other materials until the next loop.
Because Babylon.js supports parallel shader compilation, we have to ask all shaders to compile as soon as possible to actually benefit from that parallelism.
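The difference between the two readiness checks can be sketched like this. It is a simplified model and not the actual Babylon.js code: `FakeMaterial`, `isSceneReadyEarlyExit` and `isSceneReadyFullPass` are hypothetical names, and the sketch assumes that the first call to `isReady()` is what requests the compilation.

```typescript
// Hypothetical material: the first isReady() call requests an asynchronous compile.
class FakeMaterial {
    public compileRequested = false;
    private compiled = false;

    public isReady(): boolean {
        if (!this.compileRequested) {
            this.compileRequested = true; // a real engine would start compiling here
        }
        return this.compiled;
    }

    // Simulates the driver finishing the compilation later on.
    public finishCompile(): void {
        this.compiled = true;
    }
}

// Early exit: stops at the first material that is not ready,
// so the following materials are not asked to compile during this pass.
function isSceneReadyEarlyExit(materials: FakeMaterial[]): boolean {
    for (const material of materials) {
        if (!material.isReady()) {
            return false;
        }
    }
    return true;
}

// Full pass: visits every material so all compilations are requested up front.
function isSceneReadyFullPass(materials: FakeMaterial[]): boolean {
    let ready = true;
    for (const material of materials) {
        if (!material.isReady()) {
            ready = false; // keep going instead of returning
        }
    }
    return ready;
}

function makeMaterials(count: number): FakeMaterial[] {
    const result: FakeMaterial[] = [];
    for (let i = 0; i < count; i++) {
        result.push(new FakeMaterial());
    }
    return result;
}

function countRequested(materials: FakeMaterial[]): number {
    return materials.filter((m) => m.compileRequested).length;
}

const earlyExitMaterials = makeMaterials(50);
isSceneReadyEarlyExit(earlyExitMaterials);
const requestedEarlyExit = countRequested(earlyExitMaterials);

const fullPassMaterials = makeMaterials(50);
isSceneReadyFullPass(fullPassMaterials);
const requestedFullPass = countRequested(fullPassMaterials);

console.log(requestedEarlyExit, requestedFullPass); // prints "1 50"
```

With the early exit, only one shader is in flight after the first pass, which makes the compilation sequential in practice. With the full pass, all 50 are requested at once and can compile in parallel.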
Avoiding V8 deoptimizations
One thing that was painful to find was that our main object (the Mesh class) seemed slow, but not in a consistent way. It was something we mostly saw in scenes with a lot of meshes. These scenes were simply slower in all aspects of the engine (compared to 4.2).
The fix was actually quite easy, but it took us several hours to figure that one out.
To do so we had to run Chrome (or Edge) in a particular mode where the browser can expose additional metadata about our classes:
"C:\Program Files\Google\Chrome\Application\chrome.exe" --user-data-dir=c:/temp/chromelog --js-flags="--allow-natives-syntax"
When run in that mode, you are allowed to call additional functions on your objects.
We wrote a special playground where we called these functions to see which objects were deoptimized
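As an illustration of what such a check can look like, here is a small sketch. I am not listing the exact functions we called, so take `%HasFastProperties` as an assumption: it is one of the V8 intrinsics exposed by `--allow-natives-syntax`, and it tells you whether an object still uses the fast property layout or has fallen back to dictionary mode. The helper name `hasFastProperties` and the `Demo` class are hypothetical. The intrinsic is built through `new Function` so that the script still parses when the flag is not set.

```typescript
// Returns true/false when V8 natives syntax is enabled, undefined otherwise.
function hasFastProperties(target: object): boolean | undefined {
    try {
        // Built at runtime: "%HasFastProperties" is a syntax error without the flag.
        const probe = new Function("o", "return %HasFastProperties(o);") as (o: object) => boolean;
        return probe(target);
    } catch (error) {
        return undefined; // flag not set, or not a V8-based runtime
    }
}

class Demo {
    public x = 1;
    public y = 2;
}

const freshObject = new Demo();

// Deleting a property is a classic way to risk pushing an object into dictionary mode.
const mutatedObject: Record<string, number> = { x: 1, y: 2 };
delete mutatedObject.x;

console.log("fresh:", hasFastProperties(freshObject));
console.log("mutated:", hasFastProperties(mutatedObject));
```

Without the flag, both lines print `undefined`. With it, any object reporting `false` is a good candidate to look at.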
And a few more
If you are curious, you can see the list of other small improvements we made on the main repo:
- Skipping access to properties in favor of backing fields
- Making sure we are not flagging materials as dirty when not required
- Caching regular expressions
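To illustrate the last item, here is a minimal sketch of what caching a regular expression means. The pattern and the function names are made up for the example and are not taken from the engine.

```typescript
// Builds a new RegExp object on every call.
function isShaderIncludeUncached(line: string): boolean {
    return new RegExp("^\\s*#include\\b").test(line);
}

// Compiled once when the module loads, then reused by every call.
// No "g" flag, so test() keeps no state (lastIndex) between calls.
const includePattern = /^\s*#include\b/;

function isShaderIncludeCached(line: string): boolean {
    return includePattern.test(line);
}

console.log(isShaderIncludeCached("  #include<helperFunctions>")); // prints "true"
console.log(isShaderIncludeCached("void main() {}")); // prints "false"
```

Both versions give the same answer. The cached one just avoids allocating a new object each time, which adds up in code that runs for every mesh or every frame.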
David ‘deltakosh’ Catuhe