Benchmarks

Wall-clock and throughput numbers for the three execution backends on a 1000-run reference sweep, plus the in-CI throughput regression test that catches per-PR slowdowns before release.

Setup

The reference sweep varies Sat.SMA across np.linspace(7000, 8000, 1000) against the LEO basic mission fixture at tests/data/leo_basic.script — a one-Spacecraft, point-mass, 60-second propagation that exercises the full pipeline (script load, propagator, ReportFile output, Parquet write) without paying for stationkeeping or stock-sample support files.

What                  Value
Run count             1000
Mission script        tests/data/leo_basic.script
Sweep parameter       Sat.SMA over np.linspace(7000, 8000, 1000)
Workers per backend   8
GMAT version          R2026a
gmat-sweep version    110d4be (post-0.2.0 main)
CPU                   Intel® Core™ i7-10700 @ 2.90 GHz (8 cores / 16 threads)
RAM                   16 GB
OS                    Linux 6.6.87 (WSL2, x86_64)

The benchmark fixture is committed at tests/data/benchmark_sweep.py; the reproduce commands in these docs and the CI throughput regression test share that single sweep definition, so the documented numbers and the CI numbers cannot drift apart.
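
For orientation, the sketch below shows the rough shape such a shared definition can take. This is a sketch only: the run_sweep entry point and the gmat_sweep import path are assumptions made for illustration, while the backend constructor, script path, and reuse_gmat_context flag are taken from this page. The committed fixture remains the single source of truth.

# Illustrative sketch only -- tests/data/benchmark_sweep.py is the real
# definition. run_sweep and the gmat_sweep import path are assumptions;
# the backend constructor and reuse_gmat_context appear elsewhere on this page.
import numpy as np
from gmat_sweep import LocalJoblibPool, run_sweep  # hypothetical import path

results = run_sweep(
    script="tests/data/leo_basic.script",
    parameters={"Sat.SMA": np.linspace(7000, 8000, 1000)},
    pool=LocalJoblibPool(workers=8),
    reuse_gmat_context=True,  # one live gmatpy import per worker
)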

Per-backend timings

Wall-clock seconds: median of three runs, with the min and max across those runs.

Backend                               Median (s)   Min (s)   Max (s)
LocalJoblibPool(workers=8)            12.10        11.85     12.51
DaskPool(n_workers=8) (LocalCluster)  14.32        14.22     14.73
RayPool(num_cpus=8) (local)           14.61        14.52     14.73

Throughput

Backend                       Runs/sec   Per-worker runs/sec
LocalJoblibPool(workers=8)    82.62      10.33
DaskPool(n_workers=8)         69.85      8.73
RayPool(num_cpus=8)           68.46      8.56

Discussion

On this 8-worker, single-machine setup the local joblib pool turns in roughly 82.6 runs/sec, with Dask at 69.9 and Ray at 68.5 — about 15–17 % below local. Per-worker throughput tracks the same gap (10.3 vs 8.7 vs 8.6 runs/sec). Dask and Ray land within ~2 % of each other; the gap to local is the per-task dispatch overhead both of them pay and that joblib's loky backend avoids on a single host. Per-run GMAT load is amortised identically across all three backends, because reuse_gmat_context=True keeps a single gmatpy import alive in each worker for the lifetime of the sweep.
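
The throughput and overhead figures follow directly from the median wall-clock times; a few lines of Python reproduce them. Small last-digit differences against the tables are expected, since the published rates were computed from unrounded wall times.

# Re-derive the throughput table from the median wall-clock times above.
RUNS, WORKERS = 1000, 8
median_wall = {"local joblib": 12.10, "dask": 14.32, "ray": 14.61}

local_rate = RUNS / median_wall["local joblib"]
for backend, wall in median_wall.items():
    rate = RUNS / wall                           # runs/sec
    per_worker = rate / WORKERS                  # runs/sec per worker
    below_local = (1 - rate / local_rate) * 100  # % below the local pool
    print(f"{backend:12s} {rate:6.2f} runs/s  {per_worker:5.2f}/worker  "
          f"{below_local:4.1f}% below local")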

The picture changes once the sweep spans more than one machine; see Backends for when each backend is worth its overhead.

How to reproduce

The benchmark fixture can be invoked as a module — pass --backend and --scale to pick the variant:

# Full 1000-run benchmark on the local joblib backend
python -m tests.data.benchmark_sweep --scale 1000 --backend local --workers 8

# Same on Dask (LocalCluster)
python -m tests.data.benchmark_sweep --scale 1000 --backend dask --workers 8

# Same on Ray (local runtime)
python -m tests.data.benchmark_sweep --scale 1000 --backend ray --workers 8

Each invocation prints a JSON record with wall_seconds and throughput_runs_per_sec on stdout. The scaled-down 50-run variant is what the CI throughput regression test executes:

python -m tests.data.benchmark_sweep --scale 50 --backend local --workers 4
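
To consume that record programmatically, for example when collecting numbers for tables like the ones above, the module's stdout can be parsed directly. The sketch below assumes the JSON record is the only thing written to stdout, as described above; only the two documented fields are read.

# Run the scaled benchmark and parse its JSON record from stdout.
# Assumes the record is the only stdout output, per the description above.
import json
import subprocess
import sys

cmd = [
    sys.executable, "-m", "tests.data.benchmark_sweep",
    "--scale", "50", "--backend", "local", "--workers", "4",
]
out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
record = json.loads(out)
print(record["wall_seconds"], record["throughput_runs_per_sec"])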

CI regression gate

The backend-throughput CI job runs the 50-run sweep on each of the three backends and asserts a per-backend throughput floor. The floor JSON lives at tests/data/throughput_floor.json; updates show as deliberate diffs in PRs, so a tightening or relaxation is reviewable rather than accidental. A regression below the floor fails CI with a message naming the backend, the measured rate, and the floor.
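
The shape of that assertion is straightforward; a minimal sketch follows. The floor-file path comes from this page, but its schema (a flat mapping from backend name to minimum runs/sec) and the helper below are assumptions, not the actual test code.

# Hypothetical sketch of the gate. Assumed schema: throughput_floor.json
# maps backend name -> minimum acceptable runs/sec.
import json
from pathlib import Path

FLOORS = json.loads(Path("tests/data/throughput_floor.json").read_text())

def assert_throughput(backend: str, measured_runs_per_sec: float) -> None:
    floor = FLOORS[backend]
    assert measured_runs_per_sec >= floor, (
        f"{backend}: measured {measured_runs_per_sec:.2f} runs/sec is below "
        f"the floor of {floor:.2f} runs/sec"
    )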