ComputeSDK
northflank logo

100k Burst Concurrency Benchmark

Ran 2026-06-18 at 11:07 UTC · Complete
Powered by Namespace · 1 vCPU / 2 GB RAM sharded workers
Burst achieved
100%
peak / 100,000
Peak concurrent
100,000+
verified hold
Result
Reached
target reached
Failed
0
target validation
Target reached
24s
to 100,000 live
Sandbox Spec
0.5 CPU · 1024 MB
P50 Allocate
321ms
sandbox.create()
P99 Allocate
566ms
tail allocation
P50 Readiness
269ms
first command
P99 Readiness
733ms
tail readiness

Sandbox milestones

Concurrency by phase


Allocate = sandbox.create() · Readiness = node -v after allocate
Phase                               P50    P95    P99    Mean   Count
Allocate (sandbox.create())         321ms  432ms  566ms  328ms  131,320
Readiness (node -v after allocate)  269ms  674ms  733ms  330ms  131,320
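
The percentile columns can be reproduced from raw per-sandbox latencies. The following is a minimal sketch, assuming the nearest-rank percentile method (the page does not say which definition it uses). The names percentile and summarize are illustrative and not part of ComputeSDK, and the input values are made up.

```javascript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
// Expects an array already sorted in ascending order.
function percentile(sortedMs, p) {
  if (sortedMs.length === 0) throw new Error("no samples");
  const rank = Math.ceil((p / 100) * sortedMs.length); // 1-based rank
  return sortedMs[Math.max(rank, 1) - 1];
}

// Builds one table row (P50, P95, P99, mean, count) from raw latencies in milliseconds.
function summarize(latenciesMs) {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const mean = sorted.reduce((sum, v) => sum + v, 0) / sorted.length;
  return {
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
    mean,
    count: sorted.length,
  };
}

// Made-up latencies, not benchmark data.
const stats = summarize([300, 310, 320, 330, 900]);
console.log(stats);
```

With only five samples a single slow call sets both P95 and P99, which is why tail percentiles only become meaningful at run sizes like the one reported here.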
Latency by submission index quartile — reveals whether the provider favours earlier-queued requests.
Segment     Idx range        Count ok  P50    P95    P99    Max    Mean
First 25%   0–32,829         32,830    342ms  491ms  814ms  1.26s  360ms
Middle 50%  32,830–99,469    65,660    329ms  433ms  507ms  679ms  333ms
Last 25%    99,470–132,859   32,830    283ms  381ms  435ms  550ms  288ms
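
The segment split can be sketched as follows. This assumes segments are cut by count of successful samples ordered by submission index, which is consistent with the 32,830 / 65,660 / 32,830 counts in the table but is not stated on the page. The function names and the sample data are hypothetical.

```javascript
// Split successful samples into first 25%, middle 50%, and last 25% by submission index.
function segmentByIndex(samples) {
  const ordered = [...samples].sort((a, b) => a.idx - b.idx);
  const q = Math.floor(ordered.length / 4);
  return {
    first: ordered.slice(0, q),
    middle: ordered.slice(q, ordered.length - q),
    last: ordered.slice(ordered.length - q),
  };
}

function median(values) {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Made-up samples: idx is the 0-based submission index.
// The gap at idx 3 shows that not every index needs a successful sample.
const samples = [
  { idx: 0, ms: 340 }, { idx: 1, ms: 350 }, { idx: 2, ms: 330 }, { idx: 4, ms: 320 },
  { idx: 5, ms: 325 }, { idx: 6, ms: 335 }, { idx: 7, ms: 280 }, { idx: 8, ms: 290 },
];
const segments = segmentByIndex(samples);
for (const [name, seg] of Object.entries(segments)) {
  const range = seg[0].idx + "-" + seg[seg.length - 1].idx;
  console.log(name, range, seg.length, median(seg.map((s) => s.ms)));
}
```

A falling median from the first segment to the last, as in the table above, would suggest later-queued requests were not penalised.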

Methodology

What we test

Each provider gets one attempt to hold 100,000 sandboxes concurrently from a cold start. The one outcome we measured free of confounding variables is that each provider held 100,000 live sandboxes concurrently during the test. Some runs may show failures caused by quota or rate limits (not true failures) or by our sending more than 100,000 create() requests.

Execution

One thousand coordinator processes run in parallel, each on its own Namespace VM and each responsible for up to 140 sandboxes. All 100,000 sandbox.create() calls fire simultaneously; there is no stagger or ramp. Sandboxes are numbered (0-indexed) in submission order so ordering effects can be measured.
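
One coordinator's share of an unstaggered burst might look like the sketch below. It is not the benchmark's actual harness: createSandbox is a hypothetical stand-in for a provider SDK call, and the started counter exists only to show that every call begins before any of them completes.

```javascript
let started = 0;

// Hypothetical stand-in for a provider SDK call; resolves after a short delay.
function createSandbox(index) {
  started++;
  return new Promise((resolve) => setTimeout(() => resolve({ id: "sbx-" + index }), 5));
}

// Start every create() in the same tick: no stagger, no ramp, no concurrency cap.
async function burst(startIndex, count, create) {
  const pending = [];
  for (let i = 0; i < count; i++) {
    pending.push(create(startIndex + i)); // the call starts now; it is not awaited here
  }
  // allSettled keeps one failed create from aborting the rest of the burst.
  return Promise.allSettled(pending);
}

// 140 is the per-coordinator ceiling given in the text.
const burstDone = burst(0, 140, createSandbox).then((results) => {
  const ok = results.filter((r) => r.status === "fulfilled").length;
  console.log(ok + " of " + results.length + " creates succeeded");
  return results;
});
console.log("calls started before any completed: " + started);
```

Using Promise.allSettled rather than Promise.all is a design choice in this sketch: a rate-limited create should be recorded as a failure, not stop the other calls from being measured.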

Lifecycle

Each sandbox moves through four steps: allocate (create returns), readiness probe (node -v inside it — failure → readiness_failed), alive hold (all 100,000 sandboxes held simultaneously while fleet-wide peak concurrency is recorded), then liveness check (a second node -v — pass → success, fail → partial). The Invitational score is the fleet-wide peak concurrent count at the hold.
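
The four steps and their outcome labels can be sketched as a single function. The client interface here (create() returning a sandbox with a run(cmd) method that yields an exitCode) is a hypothetical shape chosen for illustration, and the allocate_failed label is an assumption, since the text does not name the outcome of a failed create.

```javascript
// Runs one sandbox through allocate, readiness probe, alive hold, and liveness check.
// holdUntil is a promise that resolves when the fleet-wide hold ends.
async function runLifecycle(client, holdUntil) {
  let sandbox;
  try {
    sandbox = await client.create();        // 1. allocate
  } catch {
    return "allocate_failed";               // assumed label, not from the text
  }
  if (!(await probe(sandbox))) return "readiness_failed"; // 2. readiness probe
  await holdUntil;                          // 3. alive hold
  return (await probe(sandbox)) ? "success" : "partial";  // 4. liveness check
}

// Runs node -v inside the sandbox; any error or non-zero exit counts as a failed probe.
async function probe(sandbox) {
  try {
    const result = await sandbox.run("node -v");
    return result.exitCode === 0;
  } catch {
    return false;
  }
}

// Fake client whose sandbox passes the first `passes` probes, then fails.
function fakeClient(passes) {
  let calls = 0;
  return {
    create: async () => ({ run: async () => ({ exitCode: calls++ < passes ? 0 : 1 }) }),
  };
}

const outcomes = Promise.all([
  runLifecycle(fakeClient(2), Promise.resolve()),
  runLifecycle(fakeClient(1), Promise.resolve()),
  runLifecycle(fakeClient(0), Promise.resolve()),
]);
outcomes.then((o) => console.log(o)); // [ 'success', 'partial', 'readiness_failed' ]
```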

Measurement

Timing is recorded by the coordinator at the moment each API call returns or errors — it does not include client-side retry logic.
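
One way to record timing at the moment a call returns or errors is a thin wrapper around each API call. This is a sketch under the assumption that any retry loop sits outside the wrapper, so each attempt is timed on its own; timed is an illustrative name and the two calls below are simulated.

```javascript
// Wraps a single API call; the clock stops when the call resolves or rejects.
async function timed(call) {
  const start = performance.now();
  try {
    const value = await call();
    return { ok: true, ms: performance.now() - start, value };
  } catch (error) {
    return { ok: false, ms: performance.now() - start, error };
  }
}

// Simulated calls: one that succeeds after about 20 ms and one that fails immediately.
const timings = Promise.all([
  timed(() => new Promise((resolve) => setTimeout(() => resolve("sbx-0"), 20))),
  timed(() => Promise.reject(new Error("quota exceeded"))),
]);
timings.then(([a, b]) => {
  console.log(a.ok, Math.round(a.ms) + "ms", b.ok, b.error.message);
});
```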

Definitions

Benchmark phases

Allocate
— time from sandbox.create() until the API confirms the sandbox exists.
Readiness
— time from sandbox creation until node -v executes and returns successfully inside it.
Alive
— in concurrency charts, the window a sandbox is held open after its initial readiness probe until teardown. The second liveness check runs during this window — a sandbox that fails it appears alive in the chart but is counted as partial.
Worker point
— in the latency scatter chart, each point represents the median latency of a group of sandboxes from one worker, ordered by submission index. The group size scales so roughly 1,000 points appear regardless of run size.
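
The downsampling behind a worker point can be sketched as below. For simplicity this groups consecutive samples by submission index and ignores worker boundaries, which the real chart respects; toScatterPoints and its parameters are hypothetical names.

```javascript
// Collapse latencies (already ordered by submission index) into roughly targetPoints medians.
function toScatterPoints(latenciesMs, targetPoints = 1000) {
  const groupSize = Math.max(1, Math.ceil(latenciesMs.length / targetPoints));
  const points = [];
  for (let start = 0; start < latenciesMs.length; start += groupSize) {
    // slice() copies, so sorting the group leaves the input untouched.
    const group = latenciesMs.slice(start, start + groupSize).sort((a, b) => a - b);
    const mid = Math.floor(group.length / 2);
    const median = group.length % 2 ? group[mid] : (group[mid - 1] + group[mid]) / 2;
    points.push({ startIndex: start, median });
  }
  return points;
}

// Made-up latencies reduced to three points.
const points = toScatterPoints([300, 320, 310, 500, 290, 280], 3);
console.log(points);
```

Under this scheme a run of 131,320 samples would use groups of 132 and produce 995 points.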

VM metrics

Each VM managed up to 140 sandboxes. One thousand ran in parallel to produce the full 100,000+ sandbox burst. System metrics shown here reflect a single coordinator VM's resource usage — all 1,000 coordinators ran on identical hardware.

RSS
— resident set size: total physical RAM the process occupied, including code, stack, and all heap pages loaded into memory.
Heap used / Heap total
— V8-managed memory for JavaScript objects. Used is live data; total includes GC headroom reserved but not yet filled.
Event loop lag
— delay between when a callback was scheduled and when it actually ran. High p99 lag means the coordinator was CPU-saturated and some requests stalled in the queue.
Open FDs
— open file descriptors: each outbound HTTP connection to the provider API consumes one. Peak FDs reflect peak request concurrency from the coordinator's perspective.
TCP inuse
— TCP connections in an active state (ESTABLISHED, SYN_SENT, etc.) at any moment, read from /proc/net/sockstat.
TCP TIME_WAIT
— connections the OS is holding after close to handle delayed packets. High counts indicate rapid connection churn — the coordinator was cycling through many short-lived connections.
Load avg
— Unix load average: the number of processes runnable or waiting for CPU/IO, averaged over 1 and 5 minutes. On a single-vCPU coordinator VM, values above 1 indicate CPU saturation.
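
Several of these metrics can be sampled from inside a Node coordinator. The sketch below reads memory figures from Node's built-in process.memoryUsage() and parses the TCP line of /proc/net/sockstat from a string; it is not the benchmark's actual collector. The helper names are illustrative and the sample text uses made-up numbers in the kernel's format.

```javascript
// Parse the TCP line of /proc/net/sockstat,
// e.g. "TCP: inuse 27 orphan 0 tw 5 alloc 30 mem 3" (tw is the TIME_WAIT count).
function parseSockstatTcp(text) {
  const line = text.split("\n").find((l) => l.startsWith("TCP:"));
  if (!line) return null;
  const tokens = line.slice("TCP:".length).trim().split(/\s+/);
  const fields = {};
  for (let i = 0; i + 1 < tokens.length; i += 2) {
    fields[tokens[i]] = Number(tokens[i + 1]); // tokens alternate name, value
  }
  return { inuse: fields.inuse, timeWait: fields.tw };
}

// process.memoryUsage() reports bytes; convert to whole megabytes for display.
function memorySnapshot() {
  const { rss, heapUsed, heapTotal } = process.memoryUsage();
  const toMb = (bytes) => Math.round(bytes / 1024 / 1024);
  return { rssMb: toMb(rss), heapUsedMb: toMb(heapUsed), heapTotalMb: toMb(heapTotal) };
}

// Sample text in the kernel's format; the numbers are made up.
const sample =
  "sockets: used 180\nTCP: inuse 141 orphan 0 tw 12 alloc 150 mem 9\nUDP: inuse 2 mem 1\n";
const tcp = parseSockstatTcp(sample);
const mem = memorySnapshot();
console.log(tcp, mem);
```

On a Linux coordinator the sample string would be replaced by the contents of /proc/net/sockstat, read on whatever interval the metrics are collected.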