100k Burst Concurrency Benchmark

Ran 2026-06-18 at 11:07 UTC · Complete

Namespace·1 vCPU / 2GB RAM sharded workers

Burst achieved

100%

peak / 100,000

Peak concurrent

100,000+

verified hold

Result

Reached

target reached

Failed

target validation

Target reached

24s

to 100,000 live

Sandbox Spec

0.5 CPU · 1024 MB

P50 Allocate

321ms

sandbox.create()

P99 Allocate

566ms

tail allocation

P50 Readiness

269ms

first command

P99 Readiness

733ms

tail readiness

Sandbox milestones

Concurrency by phase

Median latency per worker, ordered by submission index.

Allocate = sandbox.create() · Readiness = node -v after allocate

Phase	P50	P95	P99	Mean	Count
Allocate (sandbox.create())	321ms	432ms	566ms	328ms	131,320
Readiness (node -v after allocate)	269ms	674ms	733ms	330ms	131,320
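
The P50, P95, P99, and Mean columns above can be derived from raw per-sandbox latencies. The page does not say which percentile method it uses, so the sketch below assumes the nearest-rank method. The function names and the sample data are hypothetical and are not taken from the run.

```python
import math

def percentile(sorted_ms, p):
    """Nearest-rank percentile of an ascending list (p is an integer 0..100)."""
    if not sorted_ms:
        raise ValueError("no samples")
    # Assumption: nearest-rank; the dashboard's actual method is not stated.
    rank = max(1, math.ceil(p * len(sorted_ms) / 100))
    return sorted_ms[rank - 1]

def summarize(latencies_ms):
    """Return the columns of the phase table for one phase."""
    ordered = sorted(latencies_ms)
    return {
        "p50": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
        "mean": sum(ordered) / len(ordered),
        "count": len(ordered),
    }

# Hypothetical sample (1..100 ms), not data from the benchmark.
sample = list(range(1, 101))
stats = summarize(sample)
```

Running `summarize` once per phase (allocate, readiness) would yield one table row each.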

Latency by submission index quartile — reveals whether the provider favours earlier-queued requests.

Segment	Idx range	Count ok	P50	P95	P99	Max	Mean
First 25%	0–32,829	32,830	342ms	491ms	814ms	1.26s	360ms
Middle 50%	32,830–99,469	65,660	329ms	433ms	507ms	679ms	333ms
Last 25%	99,470–132,859	32,830	283ms	381ms	435ms	550ms	288ms
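
The segment counts above (32,830 + 65,660 + 32,830 = 131,320) are consistent with splitting the successful results 25% / 50% / 25% after sorting by submission index. The sketch below assumes that is how the segments are formed; the function name and the sample records are hypothetical.

```python
from statistics import mean, median

def segment_by_submission_order(records):
    """Split (index, latency_ms) pairs into first 25% / middle 50% / last 25%."""
    ordered = sorted(records)  # ascending submission index
    n = len(ordered)
    q1, q3 = n // 4, n - n // 4
    parts = {
        "First 25%": ordered[:q1],
        "Middle 50%": ordered[q1:q3],
        "Last 25%": ordered[q3:],
    }
    rows = {}
    for name, part in parts.items():
        if not part:
            continue  # too few records to fill this segment
        latencies = [ms for _, ms in part]
        rows[name] = {
            "idx_range": (part[0][0], part[-1][0]),
            "count_ok": len(part),
            "p50": median(latencies),
            "max": max(latencies),
            "mean": mean(latencies),
        }
    return rows

# Hypothetical records in which later submissions are faster.
records = [(i, 400 - i) for i in range(8)]
rows = segment_by_submission_order(records)
```

If a provider favoured earlier-queued requests, the first segment would show the lowest latencies; in this run the table shows the opposite trend.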

Coordinator logs & system metrics

Per-worker log with level filtering and search, plus VM resource timeline

Methodology

What we test

Each provider gets one attempt to hold 100,000 sandboxes concurrently from a cold start. The one outcome we measured free of confounding variables is that each provider held 100,000 live sandboxes concurrently during the test. Some runs may show failures caused by quota or rate limits (not true failures) or by our sending more than 100,000 create() requests.

Execution

One thousand coordinator processes run in parallel, each on its own Namespace VM and each responsible for up to 140 sandboxes. All 100,000 sandbox.create() calls fire simultaneously; there is no stagger or ramp. Sandboxes are numbered (0-indexed) in submission order so ordering effects can be measured.
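
As a rough illustration of the sharding arithmetic, the sketch below spreads a burst across a fleet of coordinators while respecting a per-coordinator cap. How submission indices are actually mapped to coordinators is not described here, so the contiguous-range layout and the function name are assumptions.

```python
def shard_ranges(total, coordinators, cap):
    """Assign contiguous submission-index ranges to coordinators.

    Returns one half-open (start, stop) range per coordinator.
    Assumption: contiguous blocks; the real mapping is not documented.
    """
    if total > coordinators * cap:
        raise ValueError("fleet too small for requested burst")
    base, extra = divmod(total, coordinators)
    ranges, start = [], 0
    for c in range(coordinators):
        # The first `extra` coordinators take one additional sandbox.
        size = base + (1 if c < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

ranges = shard_ranges(total=100_000, coordinators=1_000, cap=140)
```

With these figures each coordinator receives 100 indices, which leaves headroom under the 140-sandbox cap.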

Lifecycle

Each sandbox moves through four steps: allocate (create returns), readiness probe (node -v inside it — failure → readiness_failed), alive hold (all 100,000 sandboxes held simultaneously while fleet-wide peak concurrency is recorded), then liveness check (a second node -v — pass → success, fail → partial). The Invitational score is the fleet-wide peak concurrent count at the hold.
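
The outcome labels and the peak concurrent count described above can be sketched as follows. This is an illustration and not the coordinator's code: the function names are hypothetical, and the `allocate_failed` label is an assumption, since the text does not name the outcome for a failed create.

```python
def classify(allocated, ready, live_after_hold):
    """Map one sandbox's probe results to an outcome label."""
    if not allocated:
        return "allocate_failed"  # hypothetical label, not named in the source
    if not ready:
        return "readiness_failed"
    return "success" if live_after_hold else "partial"

def peak_concurrent(intervals):
    """Largest number of sandboxes alive at once, given (start, end) times."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # Tuple sorting places -1 before +1 at equal timestamps, so a sandbox that
    # ends exactly when another starts is not counted as overlapping.
    events.sort()
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak
```

For example, `peak_concurrent([(0, 10), (2, 8), (8, 12)])` evaluates to 2, because the third sandbox starts at the moment the second one ends.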

Measurement

Timing is recorded by the coordinator at the moment each API call returns or errors — it does not include client-side retry logic.
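
A minimal sketch of that timing rule: a single call is timed from start until it returns or raises, with no retry wrapped around it. The helper `timed_call` and the two stand-in functions are hypothetical; no real provider SDK is called.

```python
import time

def timed_call(fn, *args, **kwargs):
    """Run fn once and record elapsed time when it returns or raises.

    No retry is attempted, so the duration covers exactly one call.
    """
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
        error = None
    except Exception as exc:
        result, error = None, exc
    elapsed_ms = (time.perf_counter() - start) * 1000
    return {"ok": error is None, "elapsed_ms": elapsed_ms,
            "result": result, "error": error}

# Hypothetical stand-ins for a provider's create call.
def fake_create():
    time.sleep(0.01)
    return "sbx-0"

def fake_create_failing():
    raise RuntimeError("quota exceeded")

ok = timed_call(fake_create)
failed = timed_call(fake_create_failing)
```

Because errors are timed the same way as successes, a failed call still contributes a latency sample if the analysis chooses to include it.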

Definitions

Benchmark phases

Allocate: time from sandbox.create() until the API confirms the sandbox exists.
Readiness: time from when sandbox.create() returns until node -v executes and returns successfully inside the sandbox.
Alive: in concurrency charts, the window a sandbox is held open after its initial readiness probe until teardown. The second liveness check runs during this window; a sandbox that fails it appears alive in the chart but is counted as partial.
Worker point: in the latency scatter chart, each point represents the median latency of a group of sandboxes from one worker, ordered by submission index. The group size scales so roughly 1,000 points appear regardless of run size.
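
The worker-point downsampling can be approximated as below. The sketch assumes that sandboxes with adjacent submission indices belong to the same worker, so chunking the ordered latencies stands in for grouping by worker; the function name and sample data are hypothetical.

```python
import math
from statistics import median

def worker_points(latencies_by_index, target_points=1000):
    """Downsample per-sandbox latencies to about `target_points` medians."""
    n = len(latencies_by_index)
    if n == 0:
        return []
    # Group size grows with run size so the point count stays near the target.
    group = max(1, math.ceil(n / target_points))
    points = []
    for start in range(0, n, group):
        chunk = latencies_by_index[start:start + group]
        points.append((start, median(chunk)))
    return points

# Hypothetical latencies: 5,000 values in submission order.
latencies = [300 + (i % 7) for i in range(5000)]
points = worker_points(latencies)
```

Each returned pair is the first submission index of a group and that group's median latency.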

VM metrics

Each VM managed up to 140 sandboxes. One thousand ran in parallel to produce the full 100,000+ sandbox burst. System metrics shown here reflect a single coordinator VM's resource usage — all 1,000 coordinators ran on identical hardware.

RSS: resident set size, the total physical RAM the process occupied, including code, stack, and all heap pages loaded into memory.
Heap used / Heap total: V8-managed memory for JavaScript objects. Used is live data; total includes GC headroom reserved but not yet filled.
Event loop lag: delay between when a callback was scheduled and when it actually ran. High p99 lag means the coordinator was CPU-saturated and some requests stalled in the queue.
Open FDs: open file descriptors. Each outbound HTTP connection to the provider API consumes one, so peak FDs reflect peak request concurrency from the coordinator's perspective.
TCP inuse: TCP connections in an active state (ESTABLISHED, SYN_SENT, etc.) at any moment, read from /proc/net/sockstat.
TCP TIME_WAIT: connections the OS is holding after close to handle delayed packets. High counts indicate rapid connection churn: the coordinator was cycling through many short-lived connections.
Load avg: Unix load average, the number of processes runnable or waiting for CPU/IO, averaged over 1 and 5 minutes. On a single-vCPU coordinator VM, values above 1 indicate CPU saturation.
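
As an example of how the two TCP counters might be collected, the sketch below parses text in the layout Linux uses for /proc/net/sockstat. The function name is hypothetical and the sample values are made up; the coordinator's actual collector is not shown on this page.

```python
def parse_sockstat(text):
    """Extract TCP `inuse` and `tw` (TIME_WAIT) counts from sockstat text."""
    for line in text.splitlines():
        if line.startswith("TCP:"):
            fields = line.split()[1:]
            # Fields alternate between name and integer value.
            pairs = dict(zip(fields[0::2], map(int, fields[1::2])))
            return {"tcp_inuse": pairs["inuse"], "tcp_time_wait": pairs["tw"]}
    raise ValueError("no TCP line found")

# Sample in the /proc/net/sockstat layout; the numbers are invented.
SAMPLE = """sockets: used 312
TCP: inuse 141 orphan 0 tw 57 alloc 150 mem 12
UDP: inuse 2 mem 1
"""
stats = parse_sockstat(SAMPLE)
```

On a Linux host the same function could be fed the contents of /proc/net/sockstat read at a fixed interval to build the timeline.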
