100k Burst Concurrency Benchmark
Sandbox milestones
Concurrency by phase
| Phase | P50 | P95 | P99 | Mean | Count |
|---|---|---|---|---|---|
| Allocate (sandbox.create()) | 321ms | 432ms | 566ms | 328ms | 131,320 |
| Readiness (node -v after allocate) | 269ms | 674ms | 733ms | 330ms | 131,320 |
| Segment | Idx range | Count ok | P50 | P95 | P99 | Max | Mean |
|---|---|---|---|---|---|---|---|
| First 25% | 0–32,829 | 32,830 | 342ms | 491ms | 814ms | 1.26s | 360ms |
| Middle 50% | 32,830–99,469 | 65,660 | 329ms | 433ms | 507ms | 679ms | 333ms |
| Last 25% | 99,470–132,859 | 32,830 | 283ms | 381ms | 435ms | 550ms | 288ms |
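A minimal sketch of how per-segment statistics like those in the table could be derived from raw latencies. The nearest-rank percentile method and the quarter-based split are assumptions; the benchmark does not state which percentile definition or segmentation rule it uses, and the function names are hypothetical.

```python
import math
from statistics import mean


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list (assumed method)."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def segment_stats(latencies_ms):
    """Summarize latencies by submission-order segment.

    latencies_ms is indexed by submission order; None marks a sandbox
    that did not complete the phase and is left out of the statistics.
    """
    n = len(latencies_ms)
    quarter = n // 4
    bounds = {
        "First 25%": (0, quarter),
        "Middle 50%": (quarter, n - quarter),
        "Last 25%": (n - quarter, n),
    }
    stats = {}
    for name, (lo, hi) in bounds.items():
        ok = sorted(v for v in latencies_ms[lo:hi] if v is not None)
        if not ok:
            continue  # nothing succeeded in this segment
        stats[name] = {
            "count_ok": len(ok),
            "p50": percentile(ok, 50),
            "p95": percentile(ok, 95),
            "p99": percentile(ok, 99),
            "max": ok[-1],
            "mean": mean(ok),
        }
    return stats
```

Comparing the three segments side by side is what makes ordering effects visible, such as the last quarter of submissions completing faster than the first.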
Methodology
What we test
Each provider gets one attempt to hold 100,000 sandboxes concurrently from a cold start. The one result we measured without confounding variables is that each provider held 100,000 live sandboxes concurrently during the test. Some runs may include failures caused by quota or rate limits (not true failures) or by our sending more than 100,000 create() requests.
Execution
One thousand coordinator processes run in parallel, each on its own Namespace VM and each responsible for up to 140 sandboxes. All 100,000 sandbox.create() calls fire simultaneously; there is no stagger or ramp. Sandboxes are numbered (0-indexed) in submission order so ordering effects can be measured.
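The fan-out described above can be sketched as follows. This is an illustration under stated assumptions, not the benchmark's actual coordinator: the contiguous index assignment is a guess, and `fake_create` is a hypothetical stand-in for a provider's sandbox.create() call.

```python
import asyncio

MAX_PER_COORDINATOR = 140  # from the text: each coordinator handles up to 140 sandboxes


def shard(total, coordinators, cap=MAX_PER_COORDINATOR):
    """Split submission indices 0..total-1 into contiguous per-coordinator ranges."""
    if total > coordinators * cap:
        raise ValueError("not enough coordinator capacity")
    base, extra = divmod(total, coordinators)
    ranges, start = [], 0
    for c in range(coordinators):
        size = base + (1 if c < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


async def fake_create(index):
    """Hypothetical stand-in for sandbox.create(); yields once and returns an id."""
    await asyncio.sleep(0)
    return f"sbx-{index}"


async def burst(indices):
    """Start every create call at once: no stagger, no ramp."""
    return await asyncio.gather(*(fake_create(i) for i in indices))
```

With 100,000 sandboxes over 1,000 coordinators, each shard holds 100 indices, which leaves headroom under the 140-sandbox cap for runs that submit more than 100,000 requests.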
Lifecycle
Each sandbox moves through four steps: allocate (create returns), readiness probe (node -v inside it — failure → readiness_failed), alive hold (all 100,000 sandboxes held simultaneously while fleet-wide peak concurrency is recorded), then liveness check (a second node -v — pass → success, fail → partial). The Invitational score is the fleet-wide peak concurrent count at the hold.
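The four steps map onto a small outcome classifier, and the peak concurrent count can be computed with a sweep over start and end times. The sketch below assumes each sandbox's alive window is available as a (start, end) pair; `ALLOCATE_FAILED` is a hypothetical label, since the text names only readiness_failed, success, and partial.

```python
from enum import Enum


class Outcome(Enum):
    ALLOCATE_FAILED = "allocate_failed"  # hypothetical label, not named in the text
    READINESS_FAILED = "readiness_failed"
    SUCCESS = "success"
    PARTIAL = "partial"


def classify(allocated, ready, live_after_hold):
    """Map the result of each lifecycle step to a final outcome."""
    if not allocated:
        return Outcome.ALLOCATE_FAILED
    if not ready:
        return Outcome.READINESS_FAILED
    return Outcome.SUCCESS if live_after_hold else Outcome.PARTIAL


def peak_concurrency(intervals):
    """Largest number of overlapping half-open [start, end) alive windows."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # At equal timestamps the -1 sorts first, so a window that ends exactly
    # when another starts does not count as overlapping it.
    events.sort()
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak
```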
Measurement
Timing is recorded by the coordinator at the moment each API call returns or errors — it does not include client-side retry logic.
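One way to record timing at the moment a call returns or errors is a thin wrapper around a single attempt. This is a sketch, not the benchmark's instrumentation; `timed_call` and the shape of its result record are hypothetical.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Run one attempt of fn and record elapsed time whether it returns or raises.

    There is no retry loop inside the timed region, so the figure covers
    exactly one API call.
    """
    start = time.perf_counter()
    try:
        value = fn(*args, **kwargs)
        ok, error = True, None
    except Exception as exc:
        value, ok, error = None, False, repr(exc)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return {"ok": ok, "elapsed_ms": elapsed_ms, "value": value, "error": error}
```

Keeping retries outside the wrapper means a retried request would show up as separate records instead of one inflated latency.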
Definitions
Benchmark phases
- Allocate: time from sandbox.create() until the API confirms the sandbox exists.
- Readiness: time from sandbox creation until node -v executes and returns successfully inside it.
- Alive: in concurrency charts, the window a sandbox is held open after its initial readiness probe until teardown. The second liveness check runs during this window, so a sandbox that fails it appears alive in the chart but is counted as partial.
- Worker point: in the latency scatter chart, each point represents the median latency of a group of sandboxes from one worker, ordered by submission index. The group size scales so roughly 1,000 points appear regardless of run size.
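The worker-point downsampling can be sketched as below. The exact scaling rule is not given in the text, so the ceiling division is an assumption, and for simplicity the sketch groups consecutive submission indices without tracking which worker owned them.

```python
import math
from statistics import median


def worker_points(latencies_ms, target_points=1000):
    """Collapse per-sandbox latencies into roughly target_points medians.

    Returns (first_index_in_group, median_latency) pairs in submission order.
    """
    n = len(latencies_ms)
    if n == 0:
        return []
    # Assumed scaling rule: grow the group size until about target_points remain.
    group = max(1, math.ceil(n / target_points))
    return [
        (i, median(latencies_ms[i:i + group]))
        for i in range(0, n, group)
    ]
```

For a run of 131,320 sandboxes this rule gives groups of 132 and 995 points, in line with the "roughly 1,000" described above.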
VM metrics
Each VM managed up to 140 sandboxes. One thousand ran in parallel to produce the full 100,000+ sandbox burst. System metrics shown here reflect a single coordinator VM's resource usage — all 1,000 coordinators ran on identical hardware.
- RSS: resident set size, the total physical RAM the process occupied, including code, stack, and all heap pages loaded into memory.
- Heap used / Heap total: V8-managed memory for JavaScript objects. Used is live data; total includes GC headroom reserved but not yet filled.
- Event loop lag: delay between when a callback was scheduled and when it actually ran. High p99 lag means the coordinator was CPU-saturated and some requests stalled in the queue.
- Open FDs: open file descriptors. Each outbound HTTP connection to the provider API consumes one, so peak FDs reflect peak request concurrency from the coordinator's perspective.
- TCP inuse: TCP connections in an active state (ESTABLISHED, SYN_SENT, etc.) at any moment, read from /proc/net/sockstat.
- TCP TIME_WAIT: connections the OS is holding after close to handle delayed packets. High counts indicate rapid connection churn: the coordinator was cycling through many short-lived connections.
- Load avg: Unix load average, the number of processes runnable or waiting for CPU/IO, averaged over 1 and 5 minutes. On a single-vCPU coordinator VM, values above 1 indicate CPU saturation.
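For the TCP metrics, a minimal parser for the /proc/net/sockstat format might look like this. It is a sketch, not the benchmark's collector: the sample text is illustrative rather than taken from a run, and the assumption is that the TIME_WAIT figure comes from the `tw` field on the TCP line.

```python
def parse_sockstat(text):
    """Parse /proc/net/sockstat content into {protocol: {field: int}}."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        proto, _, rest = line.partition(":")
        fields = rest.split()
        # Fields alternate name, value: "inuse 5 orphan 0 tw 0 ..."
        stats[proto.strip()] = {
            name: int(value) for name, value in zip(fields[0::2], fields[1::2])
        }
    return stats


# Illustrative sample, not data from the benchmark.
SAMPLE = """sockets: used 291
TCP: inuse 5 orphan 0 tw 12 alloc 8 mem 1
UDP: inuse 3 mem 2
"""

tcp = parse_sockstat(SAMPLE)["TCP"]
tcp_inuse, tcp_time_wait = tcp["inuse"], tcp["tw"]
```

On a Linux host the same function can be fed the live file with `open("/proc/net/sockstat").read()`.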