WASM Isn't Always Faster

I've been fascinated by WebAssembly for years. The promise of running native-compiled code in the browser felt like the return of something powerful — the same architectural idea that drove Java applets and Swing, but without the cruft. Load a compiled binary, execute at near-native speed, bypass the interpreter entirely. The appeal is obvious to anyone who's built something computationally expensive on the web.

But learning Rust just to test a hypothesis didn't make sense. That changed this year, when my practice with AI-assisted development finally matured enough that the barrier evaporated. Using Kiro, I built an entire benchmark suite in a single day: seven distinct physics simulations in both Rust/WASM and pure JavaScript, with a full three-way comparison against native execution. Rust, wasm-pack, wasm-bindgen, arena-allocated quadtrees, Lattice Boltzmann fluid dynamics. None of it was in my existing skillset. All of it was running in production by Saturday night.

The results surprised me. Not because WASM is slow — it isn't. But because the situations where it actually matters are narrower than the hype suggests.

The Experiment

Seven computationally intensive simulations, each implemented identically in Rust (compiled to WASM) and pure JavaScript, running side-by-side in Chrome:

  1. Particle Collision — O(n²) brute-force elastic collisions
  2. N-Body Orbital — pairwise gravitational dynamics with Velocity Verlet integration
  3. SPH Fluid — Smoothed Particle Hydrodynamics with kernel evaluations
  4. Lattice Boltzmann — D2Q9 mesoscopic fluid simulation
  5. Eulerian Grid — Stable Fluids Navier-Stokes solver
  6. Graph Layout (Naive) — O(n²) force-directed all-pairs repulsion
  7. Graph Layout (Barnes-Hut) — O(n log n) quadtree-accelerated layout

Each benchmark: 50 warmup iterations for JIT stabilization, 200 measured iterations, repeated 3 times. Native Rust via Criterion.rs as the performance ceiling. Identical algorithms, matching constants, same PRNG seeds.
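The measurement protocol is simple enough to sketch. Below is a minimal Rust version, assuming a closure `step` that advances a simulation by one iteration and assuming the warmup runs before every repetition (the post doesn't say which). `run_benchmark` is a hypothetical name, and `std::time::Instant` stands in for whatever timer each environment actually used; the native numbers came from Criterion.rs, which has its own warmup and sampling logic.

```rust
use std::time::{Duration, Instant};

/// Hypothetical harness mirroring the protocol described above: run `warmup`
/// untimed iterations, then time `measured` iterations, and repeat the whole
/// cycle `repeats` times. Returns the mean time per measured iteration for
/// each repetition.
fn run_benchmark<F: FnMut()>(
    mut step: F,
    warmup: usize,
    measured: usize,
    repeats: usize,
) -> Vec<Duration> {
    assert!(measured > 0, "need at least one measured iteration");
    let mut per_iter = Vec::with_capacity(repeats);
    for _ in 0..repeats {
        // Warmup: executed but not timed. In the JavaScript version this is
        // what gives the JIT time to stabilize.
        for _ in 0..warmup {
            step();
        }
        let start = Instant::now();
        for _ in 0..measured {
            step();
        }
        // Mean duration of one measured iteration in this repetition.
        per_iter.push(start.elapsed() / measured as u32);
    }
    per_iter
}
```

Called as `run_benchmark(|| sim.step(), 50, 200, 3)`, where `sim.step()` is a hypothetical simulation method, it returns three per-iteration means, one per repetition.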

Three Tiers, Not One Story

The headline number — "WASM is 1.5–2× faster than JavaScript" — is technically correct and practically misleading. The reality breaks into three distinct tiers:

Tier 1: Strong WASM advantage (1.9–2.0×). SPH fluid and Lattice Boltzmann. Dense floating-point math with conditional branching in inner loops. The SPH kernel evaluates if dist < smoothing_radius billions of times with unpredictable outcomes. V8's speculative JIT can't exploit that pattern effectively; WASM's ahead-of-time compilation avoids the deoptimization penalty entirely.
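A minimal sketch of that hot loop, assuming a 2D simulation and the poly6 density kernel mentioned later in this post. `poly6_2d` and `compute_densities` are hypothetical names, and the naive all-pairs neighbor search is a simplification (production SPH usually bins particles into a grid). The point is the `if r2 < h2` test, whose outcome depends entirely on where the particles happen to be.

```rust
use std::f64::consts::PI;

/// Poly6 smoothing kernel in 2D, normalized by 4 / (pi * h^8).
/// Takes the squared distance, so no square root is needed.
/// Only valid for r2 < h * h; the caller enforces that.
fn poly6_2d(r2: f64, h: f64) -> f64 {
    let diff = h * h - r2;
    4.0 / (PI * h.powi(8)) * diff * diff * diff
}

/// Naive O(n^2) density pass (hypothetical, not the suite's code).
/// The `if r2 < h2` test is the data-dependent branch: whether it is
/// taken depends on the current particle positions.
fn compute_densities(pos: &[(f64, f64)], mass: f64, h: f64) -> Vec<f64> {
    let h2 = h * h;
    let mut density = vec![0.0; pos.len()];
    for (i, &(xi, yi)) in pos.iter().enumerate() {
        let mut rho = 0.0;
        // Includes j == i: a particle contributes to its own density.
        for &(xj, yj) in pos {
            let dx = xi - xj;
            let dy = yi - yj;
            let r2 = dx * dx + dy * dy;
            if r2 < h2 {
                rho += mass * poly6_2d(r2, h);
            }
        }
        density[i] = rho;
    }
    density
}
```

Comparing squared distances is equivalent to the `dist < smoothing_radius` check and, because poly6 depends only on r², avoids the square root entirely.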

Tier 2: Moderate WASM advantage (1.5–1.7×). Particle collision and Barnes-Hut graph layout. Pairwise computation with conditional logic, but less floating-point density per branch. WASM wins, but not dramatically.
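For comparison, a Tier 2 inner loop might look like the sketch below, assuming 2D disks with a per-particle radius and mass (the `Particle` struct and `collide_all` are hypothetical, not the suite's code). There are two data-dependent branches per pair, an overlap test and an approaching test, but a taken branch does only a handful of floating-point operations, unlike the SPH kernel chain.

```rust
/// Hypothetical 2D disk particle.
#[derive(Clone, Copy, Debug)]
struct Particle {
    x: f64,
    y: f64,
    vx: f64,
    vy: f64,
    r: f64,
    m: f64,
}

/// Brute-force O(n^2) elastic collision pass over all unique pairs.
fn collide_all(ps: &mut [Particle]) {
    let n = ps.len();
    for i in 0..n {
        for j in (i + 1)..n {
            let dx = ps[j].x - ps[i].x;
            let dy = ps[j].y - ps[i].y;
            let dist2 = dx * dx + dy * dy;
            let rsum = ps[i].r + ps[j].r;
            // Branch 1: skip non-overlapping (or exactly coincident) pairs.
            if dist2 >= rsum * rsum || dist2 == 0.0 {
                continue;
            }
            let dist = dist2.sqrt();
            // Unit contact normal pointing from i to j.
            let (nx, ny) = (dx / dist, dy / dist);
            // Relative velocity of i with respect to j along the normal.
            let vn = (ps[i].vx - ps[j].vx) * nx + (ps[i].vy - ps[j].vy) * ny;
            // Branch 2: pairs already moving apart are left alone.
            if vn <= 0.0 {
                continue;
            }
            // Standard elastic-collision velocity update along the normal.
            let msum = ps[i].m + ps[j].m;
            let ki = 2.0 * ps[j].m / msum * vn;
            let kj = 2.0 * ps[i].m / msum * vn;
            ps[i].vx -= ki * nx;
            ps[i].vy -= ki * ny;
            ps[j].vx += kj * nx;
            ps[j].vy += kj * ny;
        }
    }
}
```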

Tier 3: No meaningful difference (0.97–1.08×). N-Body orbital, Eulerian Grid solver, and naive graph layout. Simple arithmetic loops with predictable memory access. V8's JIT compiles these to machine code comparable to what a C compiler would produce. WASM is sometimes slower: the naive graph layout shows JavaScript marginally outperforming WASM at 0.97×.
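Tier 3 workloads reduce to loops like the following. This is a sketch of a 2D gravitational step with Velocity Verlet integration, assuming Plummer-style softening (`eps2`) and a `Body` layout of my own choosing rather than the suite's. The inner loop is pure arithmetic over predictable memory, with no data-dependent branch.

```rust
/// Hypothetical body layout for a 2D gravitational system.
#[derive(Clone, Copy, Debug)]
struct Body {
    x: f64,
    y: f64,
    vx: f64,
    vy: f64,
    ax: f64,
    ay: f64,
    m: f64,
}

/// Pairwise accelerations with softening. Requires eps2 > 0: the self-term
/// then contributes exactly zero (dx = dy = 0), so no `i != j` check is needed.
fn compute_accels(bodies: &mut [Body], g: f64, eps2: f64) {
    let n = bodies.len();
    for i in 0..n {
        let (xi, yi) = (bodies[i].x, bodies[i].y);
        let (mut ax, mut ay) = (0.0, 0.0);
        for j in 0..n {
            let dx = bodies[j].x - xi;
            let dy = bodies[j].y - yi;
            let r2 = dx * dx + dy * dy + eps2;
            let inv_r = 1.0 / r2.sqrt();
            let inv_r3 = inv_r * inv_r * inv_r;
            ax += g * bodies[j].m * dx * inv_r3;
            ay += g * bodies[j].m * dy * inv_r3;
        }
        bodies[i].ax = ax;
        bodies[i].ay = ay;
    }
}

/// Velocity Verlet: half-kick, drift, recompute forces, half-kick.
/// Assumes accelerations were computed once before the first step.
fn verlet_step(bodies: &mut [Body], dt: f64, g: f64, eps2: f64) {
    for b in bodies.iter_mut() {
        b.vx += 0.5 * dt * b.ax;
        b.vy += 0.5 * dt * b.ay;
        b.x += dt * b.vx;
        b.y += dt * b.vy;
    }
    compute_accels(bodies, g, eps2);
    for b in bodies.iter_mut() {
        b.vx += 0.5 * dt * b.ax;
        b.vy += 0.5 * dt * b.ay;
    }
}
```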

The same O(n²) complexity appears in all three tiers. Algorithmic complexity doesn't predict WASM advantage. Instruction-level behavior does.

What Actually Determines the Winner

Three factors predict where WASM earns its overhead:

Conditional branching density. When inner loops contain unpredictable branches, such as distance checks that go either way with near-equal probability, V8's speculative optimization becomes a liability. A wrong guess can trigger a deoptimization. WASM, compiled ahead-of-time, pays no speculation tax.

Floating-point operation density. SPH kernel evaluations chain sqrt, pow, division, and multiplication in tight succession. WASM executes these as statically typed f64 operations. V8 maintains runtime type guards even after JIT compilation. For simple arithmetic, the guards are negligible. For dense numeric pipelines, they compound.
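The pressure-force half of SPH shows that chain clearly. This sketch assumes 2D and the spiky kernel gradient (the kernel the post names for pressure forces), whose 2D form has magnitude 30 / (πh⁵) · (h − r)². `spiky_grad_coeff_2d` and `pressure_force` are hypothetical names, and the all-pairs loop is again a simplification.

```rust
use std::f64::consts::PI;

/// Magnitude of the 2D spiky kernel gradient, 30 / (pi * h^5) * (h - r)^2,
/// valid for 0 < r < h.
fn spiky_grad_coeff_2d(r: f64, h: f64) -> f64 {
    30.0 / (PI * h.powi(5)) * (h - r).powi(2)
}

/// Symmetrized SPH pressure force on particle `i` (Mueller-style).
/// Every accepted neighbor costs a sqrt, integer powers, and several
/// divisions: the dense floating-point pipeline described above.
fn pressure_force(
    i: usize,
    pos: &[(f64, f64)],
    density: &[f64],
    pressure: &[f64],
    mass: f64,
    h: f64,
) -> (f64, f64) {
    let (xi, yi) = pos[i];
    let (mut fx, mut fy) = (0.0, 0.0);
    for j in 0..pos.len() {
        if j == i {
            continue;
        }
        let dx = xi - pos[j].0;
        let dy = yi - pos[j].1;
        let r = (dx * dx + dy * dy).sqrt();
        // Outside the support radius, or coincident (r == 0 would divide by zero).
        if r >= h || r == 0.0 {
            continue;
        }
        let shared = mass * (pressure[i] + pressure[j]) / (2.0 * density[j]);
        let coeff = shared * spiky_grad_coeff_2d(r, h) / r;
        // Direction r_i - r_j: positive pressure pushes i away from j.
        fx += coeff * dx;
        fy += coeff * dy;
    }
    (fx, fy)
}
```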

Memory access predictability. The Eulerian Grid solver walks memory sequentially — stride-1 array access, perfectly prefetchable. V8 optimizes this to native-equivalent performance. But Lattice Boltzmann streams in nine directions with boundary condition checks, defeating simple prefetching. WASM's predictable memory model handles this better.
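To make the contrast concrete, here is a sketch of a D2Q9 streaming step with simple bounce-back walls, assuming an array-of-structures layout (nine populations per cell) and push-style streaming. The suite may use a different layout or boundary scheme. Where a stride-1 sweep reads and writes one array in order, each cell here scatters to up to nine neighbor cells, and every write is guarded by a boundary test.

```rust
/// D2Q9 lattice velocities (EX[k], EY[k]) and the index of each opposite direction.
const EX: [i32; 9] = [0, 1, 0, -1, 0, 1, -1, -1, 1];
const EY: [i32; 9] = [0, 0, 1, 0, -1, 1, 1, -1, -1];
const OPP: [usize; 9] = [0, 3, 4, 1, 2, 7, 8, 5, 6];

/// Push-style streaming with bounce-back at the domain edges (hypothetical).
/// `f` and `f_new` hold 9 populations per cell at index (y * nx + x) * 9 + k.
fn stream(f: &[f64], f_new: &mut [f64], nx: usize, ny: usize) {
    let (w, h) = (nx as i32, ny as i32);
    for y in 0..ny {
        for x in 0..nx {
            let src = (y * nx + x) * 9;
            for k in 0..9 {
                let tx = x as i32 + EX[k];
                let ty = y as i32 + EY[k];
                if tx >= 0 && tx < w && ty >= 0 && ty < h {
                    // Interior: move population k to the neighbor cell.
                    let dst = (ty as usize * nx + tx as usize) * 9;
                    f_new[dst + k] = f[src + k];
                } else {
                    // Wall: reflect the population back into its own cell.
                    f_new[src + OPP[k]] = f[src + k];
                }
            }
        }
    }
}
```

Every slot of `f_new` is written exactly once per step, so total mass is conserved; the scattered writes are what defeat simple prefetching.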

The pattern: WASM's advantage comes from predictable performance in the face of computational unpredictability. When the computation itself is predictable, V8 matches native. When it isn't, V8's speculative machinery becomes overhead rather than acceleration.

The Native Ceiling

For most workloads, WASM carries 10–22% overhead versus native Rust. That overhead comes from sandboxed memory (bounds checking via guard pages), missing CPU-specific instructions (no fused multiply-add, limited SIMD), ABI boundaries between JS and WASM, and reduced whole-program optimization.

For SPH specifically, native is 1.66× faster than WASM — suggesting that kernel-heavy floating-point code benefits from auto-vectorization and instruction scheduling that the WASM backend can't replicate. The "near-native" promise holds for most workloads but breaks down for the most numerically intensive ones.

The AI-Accelerated Experiment

Here's the part that matters beyond the benchmark numbers. This project would have been impossible for me six months ago. Not because the concepts are beyond me — I understand physics simulation and systems architecture — but because the implementation language was a barrier. Rust's ownership model, lifetime annotations, wasm-bindgen macros, arena-allocated data structures. Learning that toolkit from scratch for a proof of concept was never worth the investment.

With Kiro, the dynamic inverted. I described what I wanted — "implement SPH fluid dynamics with poly6 kernel for density and spiky gradient for pressure forces" — and the agent produced correct, idiomatic Rust. I could focus on the architecture and the experimental design while the agent handled the language-specific implementation. The entire project, including 77 passing tests, went from concept to commit in one day.

This is the actual unlock of AI-assisted development. Not generating boilerplate faster. Not autocompleting function signatures. Removing the language barrier between understanding a problem and implementing a solution. I could have done this in JavaScript alone — I know JavaScript. But the whole point was to test whether WASM, compiled from a systems language, would outperform it. That test required Rust. And Rust, via an agent, became accessible in an afternoon.

When to Choose WASM

The data makes the decision framework clear:

Use WASM when your hot loop contains dense floating-point math with conditional branching — physics engines with collision detection, fluid simulations, signal processing with adaptive algorithms, image processing with irregular kernel access. Or when consistent frame timing matters: WASM avoids the JIT deoptimization pauses that cause frame drops in JavaScript.

Stay with JavaScript when the algorithm is simple iterative computation with predictable memory access: matrix operations, simple solvers, straightforward N-body. A gap of a few percent either way (0.97–1.08× in these benchmarks) doesn't justify the toolchain complexity.

The overlooked factor: WASM's advantage held roughly constant across the problem sizes I tested. The SPH speedup is 2.0× whether you have 100 particles or 2,000. This means the decision is about workload characteristics, not scale. You don't need "big data" to benefit from WASM. You need unpredictable computation.

The Takeaway

WebAssembly delivers on its promise, but the promise is more specific than the marketing suggests. It's not "everything runs faster in WASM." It's "unpredictable computation runs consistently fast in WASM, because ahead-of-time compilation doesn't gamble on speculation."

V8 is extraordinary engineering. For predictable workloads, it matches native performance through speculative optimization. But speculation is a bet, and bets have a cost when they're wrong. WASM doesn't speculate. It compiles once and executes deterministically. That determinism is the actual advantage, not raw instruction throughput.

The technology selection should be driven by workload characteristics, not by assumption. And the ability to test that selection — to build a comprehensive benchmark in a language you don't know, in a day — is what changes when AI tooling reaches maturity.