# Cluster recipes

Worked examples for wiring `gmat-sweep` into shared cluster infrastructure — one page per orchestrator, each pairing the cluster-side configuration with the matching `sweep()` driver.
The recipes document patterns; they don't introduce new APIs. The underlying `DaskPool` and `RayPool` surfaces — and the `Pool` ABC — are covered on the Backends page. Reach for a recipe when you've already decided on the orchestrator and need the wiring that makes a sweep run on it.
## Choosing a recipe

| Recipe | Pool | When to pick it |
|---|---|---|
| Slurm with `srun` | `DaskPool` via `dask-jobqueue` | An HPC site with a Slurm scheduler; you submit one driver job and let `SLURMCluster` request worker tasks elastically. |
| Kubernetes pod-per-worker | `DaskPool` via `dask-kubernetes` | A Kubernetes cluster (managed or self-hosted) where each worker is a Pod. Best paired with the Dask Operator. |
| Ray autoscaling | `RayPool` via `ray up` | A Ray cluster — local, on-prem, or cloud — with autoscaling between a head node and an elastic worker pool. |
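As a taste of the first recipe, here is a minimal sketch of the Slurm-side setup with `dask-jobqueue`. The `SLURMCluster` resource numbers are placeholders for your site's limits, and handing the cluster to a `DaskPool` is an assumption about its constructor; the recipe page has the real wiring.

```python
# Sketch only: resource numbers are placeholders, and the DaskPool
# hand-off is assumed rather than documented on this page.
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    cores=4,              # cores per Slurm worker task
    memory="8GB",         # memory per Slurm worker task
    walltime="01:00:00",  # walltime requested for each worker job
)
cluster.scale(jobs=8)     # let Slurm grow the pool to eight worker tasks

# pool = DaskPool(cluster)   # assumed constructor; see the Backends page
```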
Each recipe assumes you've followed Getting started locally first. The local sweep proves your script and grid are sound; the recipe then lifts the same call onto cluster workers without changing the `sweep()` invocation itself — only the `backend=` argument changes.
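In code, that invariant looks roughly like the sketch below. The import path, keyword names, script name, and grid contents are illustrative assumptions, not the documented signature:

```python
# Illustrative only: the gmat_sweep import path and keyword names are
# assumptions; see Getting started and Backends for the real ones.
from gmat_sweep import DaskPool, sweep  # assumed import path

grid = {"TOI.Element1": [0.5, 1.0, 1.5]}  # hypothetical parameter grid

# Local run, as proved out in Getting started.
outcomes = sweep(script="transfer.script", grid=grid)

# The same sweep on cluster workers: only backend= changes.
# `cluster` comes from whichever recipe you followed above.
outcomes = sweep(script="transfer.script", grid=grid, backend=DaskPool(cluster))
```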
## Prerequisites that apply across all three

- A working GMAT install reachable on every worker node, not just the driver. The discovery is `gmat-run`'s job; misconfigured workers surface as every run failing with the same import error (see the preflight sketch after this list).
- A shared output directory at the same path on every worker. Per-run Parquet files and the manifest live there; node-local scratch only works if you stage results back yourself.
- The matching cluster-orchestrator package installed in the same env the workers run from (`dask-jobqueue`, `dask-kubernetes`, or `ray`). None of these are `gmat-sweep` dependencies — pick whichever your infrastructure uses and install it explicitly.
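For the Dask-backed recipes, one way to catch the first two problems before launching a sweep is Dask's `Client.run()`, which executes a function on every worker. The `gmat-run` executable name comes from this page; the output path and check names are illustrative:

```python
# Preflight the prerequisites on every Dask worker before sweeping.
import shutil
from pathlib import Path

from dask.distributed import Client

def preflight(outdir: str = "/shared/sweeps/out") -> dict:
    """Report whether this worker can see gmat-run and the shared outdir."""
    return {
        "gmat-run on PATH": shutil.which("gmat-run") is not None,
        "shared outdir exists": Path(outdir).is_dir(),
    }

client = Client(cluster)      # `cluster` from whichever recipe you followed
print(client.run(preflight))  # one {check: bool} dict per worker address
```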
## When none of these fits

The three orchestrators above are the ones with one-shot recipes. For anything else — AWS Batch, GCP Batch, custom MPI launchers, in-house schedulers — write a custom `Pool` against the `Pool` ABC. Its contract is small: accept `RunSpec`s, route each through the per-task subprocess hop, and yield `RunOutcome`s as they complete. The three shipped pools are exactly that pattern, implemented three different ways.
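A minimal sketch of that shape, assuming the `Pool` ABC exposes a single generator method and that `RunSpec` carries a ready-to-run command. Every `gmat_sweep` name below is an assumption about the contract, not the actual API:

```python
# Illustrative only: every gmat_sweep name and shape below is assumed.
import subprocess
from typing import Iterable, Iterator

from gmat_sweep.backends import Pool, RunOutcome, RunSpec  # assumed path

class SerialPool(Pool):
    """Smallest possible custom pool: run each spec in order, locally.

    A pool for AWS Batch or an in-house scheduler would submit the same
    per-task subprocess hop to its own queue and yield outcomes as jobs
    finish, rather than looping serially.
    """

    def run(self, specs: Iterable[RunSpec]) -> Iterator[RunOutcome]:
        for spec in specs:
            # The per-task subprocess hop: one gmat-run process per spec
            # (assumes RunSpec exposes the command to execute).
            proc = subprocess.run(spec.command, capture_output=True)
            yield RunOutcome(spec=spec, returncode=proc.returncode)
```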