Per-run postprocessing hooks
A sweep often needs more than the Parquet files GMAT itself writes. A
mission might propagate a trajectory, and the analysis you actually want
is a derived quantity per run — a comparison against truth, a summary
statistic, a custom table. gmat-sweep lets you register a postprocess
hook: a function that runs in the worker after each successful GMAT
run, writes its own artefacts, and registers them so the manifest stays
the single source of truth for everything a sweep produced.
Registering a hook
Pass postprocess= to sweep,
monte_carlo, or
latin_hypercube. The value is an
import path string — "package.module:function" — not the function
object itself:
from gmat_sweep import sweep

reports = sweep(
    "mission.script",
    grid={"Sat.SMA": [7000, 7100, 7200]},
    out="./sweep",
    postprocess="my_analysis.hooks:summarise_run",
)
It is a string rather than a callable because the run spec is
JSON-serialised on its way to every backend's worker — a bare function
cannot survive that round trip. The named function must therefore be
importable: a module-level def, not a closure or lambda. The path is
resolved once in the driver when the sweep starts, so a typo fails
immediately with a SweepConfigError
rather than once per run.
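The resolve-once behaviour can be sketched with the standard library. This is a minimal illustration of resolving a "package.module:function" string, not gmat-sweep's own resolver: the name resolve_hook is hypothetical, and ValueError stands in for SweepConfigError.

```python
import importlib
from typing import Callable


def resolve_hook(path: str) -> Callable:
    """Resolve a "package.module:function" string to a callable.

    Illustrative stand-in only: ValueError is used here where gmat-sweep
    raises SweepConfigError.
    """
    module_name, sep, attr = path.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'package.module:function', got {path!r}")
    try:
        module = importlib.import_module(module_name)
        hook = getattr(module, attr)
    except (ImportError, AttributeError) as exc:
        raise ValueError(f"cannot resolve postprocess hook {path!r}") from exc
    if not callable(hook):
        raise ValueError(f"{path!r} does not name a callable")
    return hook


# Resolving once up front surfaces a typo before any run is dispatched.
dumps = resolve_hook("json:dumps")
```

A lambda or a closure has no module-level name for getattr to find, which is why only a module-level def can be addressed this way.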
Writing the hook
The hook is called as hook(run_spec, run_outcome) and returns a mapping
of string keys to the filesystem paths it wrote:
# my_analysis/hooks.py
from pathlib import Path

import pandas as pd

from gmat_sweep import RunOutcome, RunSpec


def summarise_run(spec: RunSpec, outcome: RunOutcome) -> dict[str, Path]:
    """Reduce the run's ReportFile to one summary row."""
    report = pd.read_parquet(outcome.output_paths["report__FinalState"])
    last = report.sort_values("time").iloc[-1]
    out_path = spec.output_dir / "summary.parquet"
    pd.DataFrame(
        [{"run_id": spec.run_id, "final_radius_km": last["Sat.RMAG"]}]
    ).to_parquet(out_path)
    return {"summary": out_path}
- run_spec carries run_id, overrides, and output_dir. Write artefacts
  under output_dir so they land in the run's own directory.
- run_outcome is the GMAT step's RunOutcome; output_paths maps to the
  GMAT Parquet files already on disk.
- The returned paths must exist on disk when the hook returns. Each one
  is recorded in the manifest entry's extra_outputs under the key you
  gave it.
A hook may write any number of artefacts and return any number of keys.
Returning an empty mapping is fine — the run is still ok.
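Because the hook is a plain function, its return contract can be checked without running GMAT. The sketch below is an illustration under assumptions: count_outputs and check_hook_contract are hypothetical names, and SimpleNamespace objects stand in for RunSpec and RunOutcome, providing only the attributes this page mentions.

```python
import json
import tempfile
from pathlib import Path
from types import SimpleNamespace


def count_outputs(run_spec, run_outcome) -> dict[str, Path]:
    """Hypothetical hook: record how many GMAT outputs the run produced."""
    out_path = run_spec.output_dir / "counts.json"
    payload = {
        "run_id": run_spec.run_id,
        "n_outputs": len(run_outcome.output_paths),
    }
    out_path.write_text(json.dumps(payload))
    return {"counts": out_path}


def check_hook_contract(hook, run_spec, run_outcome) -> dict[str, Path]:
    """Call the hook and assert the documented return contract."""
    result = hook(run_spec, run_outcome)
    for key, path in result.items():
        assert isinstance(key, str), f"key {key!r} is not a string"
        assert Path(path).exists(), f"{key!r} points at a missing file: {path}"
    return result


with tempfile.TemporaryDirectory() as tmp:
    # Stand-ins for RunSpec / RunOutcome: only the attributes the hook reads.
    spec = SimpleNamespace(run_id=0, overrides={}, output_dir=Path(tmp))
    outcome = SimpleNamespace(
        output_paths={"report__FinalState": Path(tmp) / "report.parquet"}
    )
    result = check_hook_contract(count_outputs, spec, outcome)
    written = json.loads(result["counts"].read_text())
```

The same check passes for a hook that returns an empty mapping, since there are no paths to verify.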
Carrying per-run data into the hook
The hook runs in the worker, so it sees only what travels on the
RunSpec. overrides is the obvious channel — but every key there is
applied to the GMAT Mission and folded into the manifest's
parameter_spec. Data the hook needs but GMAT must not see (a reference
state to score the run against, a precomputed truth value, a row of
provenance metadata) has nowhere to go in overrides.
RunSpec.context is that channel: a free-form
mapping that rides to the worker untouched, is never applied to the
Mission, and never enters parameter_spec. The hook reads it as
run_spec.context:
from pathlib import Path

from gmat_sweep import RunSpec, Sweep

# mission, out, pool, n, x0, y0, z0, truth, and ids are defined by the caller.
specs = [
    RunSpec(
        script_path=mission,
        overrides={"Sat.X": x0[i], "Sat.Y": y0[i], "Sat.Z": z0[i]},
        output_dir=out / f"run-{i}",
        run_id=i,
        seed=None,
        run_options={},
        context={"truth_km": truth[i].tolist(), "norad_id": ids[i]},
    )
    for i in range(n)
]

sweep = Sweep(
    runs=specs,
    backend=pool,
    manifest_path=out / "manifest.jsonl",
    output_dir=out,
    script_path=mission,
    postprocess="my_analysis.hooks:compare_to_truth",
)
# my_analysis/hooks.py
def compare_to_truth(run_spec, run_outcome):
    truth = run_spec.context["truth_km"]
    ...
Three things to know about context:
- Values must be JSON-encodable. The spec crosses the worker
  boundary as JSON, so encode numpy arrays and timestamps yourself
  (array.tolist(), ts.isoformat()) before attaching them.
- It is an explicit-RunSpec affordance. The sweep(), monte_carlo(), and
  latin_hypercube() entry points build their specs internally and
  expose no per-run payload; context is for the Sweep(runs=[...]) path
  where you build the specs yourself.
- It is recorded in the manifest, and restored on resume. Each run's
  context lands on its manifest entry, so a run rebuilt by
  Sweep.from_manifest (the resume and monte_carlo_extend path) comes
  back with the context it ran with. The exception is a run that never
  completed before the sweep was interrupted: it has no entry to
  restore from. Pass context_provider= to from_manifest to recompute
  context for those runs; see Resume.
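The JSON-encodable requirement can be met with a small conversion step before building the specs. The helper below is a hypothetical sketch, not part of gmat-sweep: to_jsonable is an assumed name, and array.array stands in for a numpy array so the example needs only the standard library.

```python
import json
from array import array
from datetime import datetime, timezone


def to_jsonable(value):
    """Recursively convert common non-JSON values to JSON-friendly ones."""
    if isinstance(value, dict):
        return {str(k): to_jsonable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(v) for v in value]
    if hasattr(value, "tolist"):     # numpy arrays and scalars, array.array
        return to_jsonable(value.tolist())
    if hasattr(value, "isoformat"):  # datetime, date, pandas.Timestamp
        return value.isoformat()
    return value


raw = {
    "truth_km": array("d", [7000.0, 0.0, 0.0]),  # stands in for a numpy array
    "epoch": datetime(2025, 1, 1, tzinfo=timezone.utc),
    "norad_id": 25544,
}
context = to_jsonable(raw)

# A JSON round trip that returns an equal mapping shows the context is safe
# to attach to a spec.
assert json.loads(json.dumps(context)) == context
```

Running the round-trip check in the driver catches an unencodable value before any spec is dispatched, rather than inside a worker.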
When a hook fails
A hook that raises makes the run a plain status="failed". The entry
also records postprocess_status="failed" — a separate three-valued
field (none / ok / failed) that captures the hook's own outcome
independently of status. Keeping the postprocess outcome in its own
field has two payoffs:
- resume retries it for free. Sweep.resume re-runs every failed run, so
  fixing a hook bug and resuming picks the run back up with no special
  handling. The retry re-runs the whole worker task, GMAT propagation
  included, not just the hook.
- The failure stays diagnosable. postprocess_status == "failed"
  separates a hook bug from a GMAT-engine failure (which leaves
  postprocess_status == "none"); stderr carries the traceback either
  way.
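The two fields combine into a simple triage rule. The sketch below assumes each manifest entry is available as a plain dict with status and postprocess_status keys; how the manifest actually exposes entries is not shown on this page, and classify_failure is a hypothetical helper.

```python
def classify_failure(entry: dict) -> str:
    """Name the stage that failed, given one manifest entry as a dict."""
    status = entry.get("status")
    post = entry.get("postprocess_status", "none")
    if status != "failed":
        return "not failed"
    if post == "failed":
        return "hook"  # GMAT succeeded, then the postprocess hook raised
    return "gmat"      # the hook never ran, so its status stays "none"


# Hand-written example entries, not real manifest output.
entries = [
    {"run_id": 0, "status": "ok", "postprocess_status": "ok"},
    {"run_id": 1, "status": "failed", "postprocess_status": "failed"},
    {"run_id": 2, "status": "failed", "postprocess_status": "none"},
]
by_stage = {e["run_id"]: classify_failure(e) for e in entries}
```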
Aggregating the extra outputs
lazy_extra_outputs — or the
Sweep.to_extra_outputs convenience
method — streams the per-run extra Parquets for one key into a single
multi-indexed DataFrame:
from gmat_sweep import lazy_extra_outputs
summary = lazy_extra_outputs("./sweep/manifest.jsonl", "summary")
The first argument is the path to the sweep's manifest.jsonl; the
second, name, is the extra-output key to aggregate. name is required:
extra outputs are keyed by the hook's own strings, so there is no single
"natural" output to fall back to.
The result index adapts to the per-run frame: one that carries a time
column yields a (run_id, time) MultiIndex — the shape
lazy_multiindex returns — and one without
yields a single-level run_id index, so a hook that writes one row per
run gives one row per run_id.
Runs that produced no such output (every failed and skipped run,
including hook failures, plus any ok run whose hook did not register
the key) appear as a single NaN-filled marker row carrying the run's
__status, so every run in the sweep is represented in the result.
Aggregating sweep outputs covers the contract in full.
Resume and extend
The hook is recorded on the manifest header, so it travels with the
sweep. Sweep.from_manifest re-applies
it: resumed runs and monte_carlo_extend
runs re-run the same postprocessing without the caller having to pass
postprocess= again.