Archive bundle — pack a sweep for handoff¶
A finished sweep is the script, the JSON Lines manifest, and the per-run Parquet outputs. Sweep.archive() packs all three into one .zip — suitable for archival deposit (Zenodo, JOSS supplementary material) or internal handoff. The bundle is self-describing: paths inside the zip are rewritten to bundle-relative form, a MANIFEST.hash carries SHA-256 of every member, and a README.md documents the layout.
This notebook walks through producing a bundle, inspecting its contents, and re-aggregating the per-run DataFrame from the unzipped tree without a re-run.
Prerequisites. A local GMAT install (R2026a is the primary development target; see Supported versions). This notebook does not depend on the [examples] extra (no plots).
Set up the run¶
A small Sat.SMA grid against the leo_short.script fixture — sub-second per run, ten runs total. The bundle is small enough to inspect line-by-line below.
import tempfile
import zipfile
from pathlib import Path
from gmat_run import locate_gmat
from gmat_sweep import LocalJoblibPool, Manifest, Sweep, lazy_multiindex, sweep
install = locate_gmat()
script_path = Path("leo_short.script").resolve()
print(f"GMAT version: {install.version}")
print(f"Script: {script_path.name}")
print(f"Exists: {script_path.exists()}")
GMAT version: R2026a
Script: leo_short.script
Exists: True
Run the sweep¶
Pass out= so the per-run Parquet files and the JSON Lines manifest survive past the call — the bundle reads from that directory. The default out=None would tie everything to the DataFrame's lifetime, so the archive call would have nothing to read.
tmpdir = tempfile.TemporaryDirectory(prefix="archive-bundle-")
out_dir = Path(tmpdir.name)
df = sweep(
    script_path,
    grid={"Sat.SMA": [7000, 7100, 7200, 7300, 7400, 7500, 7600, 7700, 7800, 7900]},
    out=out_dir,
    progress=False,
)
df["__status"].value_counts()
__status
ok    30
Name: count, dtype: int64
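Every run came back `ok` here, but a larger sweep can fail partway. A minimal sketch for isolating the failed grid points, using a synthetic frame since this sweep succeeded everywhere; the `"error"` status label is an assumption, only `"ok"` is confirmed by the output above:

```python
import pandas as pd

# Synthetic stand-in for a sweep frame where two runs failed.
# Real sweep frames carry a "__status" column with "ok" on success;
# the "error" label below is illustrative, not the library's exact value.
frame = pd.DataFrame(
    {
        "__status": ["ok", "ok", "error", "ok", "error"],
        "Sat.SMA": [7000, 7100, 7200, 7300, 7400],
    }
)

# Keep only the rows whose status is anything other than "ok".
failed = frame[frame["__status"] != "ok"]
print(failed["Sat.SMA"].tolist())  # the grid points worth re-running
```

The same filter works on the real `df` from the cell above, where it returns an empty frame.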
Pack the bundle¶
Sweep.from_manifest reconstructs a Sweep object from the manifest on disk; Sweep.archive writes a .zip next to it. include_logs=False (the default) drops the per-run worker.log files to keep the bundle small; flip to True for a forensic-grade bundle that retains every log.
with LocalJoblibPool(max_workers=1) as pool:
    sweep_obj = Sweep.from_manifest(out_dir / "manifest.jsonl", script_path, backend=pool)
    bundle_path = sweep_obj.archive(out=out_dir / "sweep_bundle.zip")
print(f"Bundle: {bundle_path.name}")
print(f"Bundle size: {bundle_path.stat().st_size:,} bytes")
Bundle: sweep_bundle.zip
Bundle size: 20,497 bytes
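Before handing a bundle off, a cheap sanity check is the standard library's `ZipFile.testzip()`, which re-reads every member and returns the name of the first one with a bad CRC (or `None` when all members check out). A sketch against a tiny in-memory zip standing in for `sweep_bundle.zip`:

```python
import io
import zipfile

# Build a tiny in-memory zip as a stand-in for sweep_bundle.zip.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("README.md", "layout notes")
    zf.writestr("manifest.jsonl", "{}\n")

# testzip() re-reads every member; None means no CRC mismatches.
with zipfile.ZipFile(buf) as zf:
    bad = zf.testzip()

print("bundle ok" if bad is None else f"corrupt member: {bad}")
```

This only checks the zip's own CRCs; the stronger per-member SHA-256 check comes from `MANIFEST.hash`, covered in the next section.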
Inspect the layout¶
The bundle's top-level entries are deliberately stable so a downstream consumer can find the script, the manifest, and the per-run Parquet files without negotiating a per-bundle convention. The accompanying README.md documents the layout; MANIFEST.hash carries SHA-256 of every member so a corrupted member is detectable on extract.
with zipfile.ZipFile(bundle_path) as zf:
    members = sorted(zf.namelist())
print(f"Total members: {len(members)}")
for member in members[:12]:
    print(f"  {member}")
if len(members) > 12:
    print(f"  ... ({len(members) - 12} more)")
Total members: 14
  MANIFEST.hash
  README.md
  manifest.jsonl
  runs/run-0/report__RF.parquet
  runs/run-1/report__RF.parquet
  runs/run-2/report__RF.parquet
  runs/run-3/report__RF.parquet
  runs/run-4/report__RF.parquet
  runs/run-5/report__RF.parquet
  runs/run-6/report__RF.parquet
  runs/run-7/report__RF.parquet
  runs/run-8/report__RF.parquet
  ... (2 more)
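A consumer can use `MANIFEST.hash` to verify the bundle without trusting the zip's CRCs alone. The sketch below assumes a `sha256sum`-style line format (`<hex digest>  <member path>`); the real bundler's exact format may differ, but the mechanics (recompute each member's SHA-256, compare to the recorded digest) carry over. It builds a toy bundle in memory rather than reading `sweep_bundle.zip`:

```python
import hashlib
import io
import zipfile

# Toy bundle with a sha256sum-style MANIFEST.hash; the real
# bundler's line format is an assumption here.
members = {"README.md": b"layout notes", "manifest.jsonl": b"{}\n"}
manifest_lines = [
    f"{hashlib.sha256(data).hexdigest()}  {name}"
    for name, data in members.items()
]

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for name, data in members.items():
        zf.writestr(name, data)
    zf.writestr("MANIFEST.hash", "\n".join(manifest_lines))

# Verify: recompute each member's SHA-256 and compare to the record.
verified = 0
with zipfile.ZipFile(buf) as zf:
    for line in zf.read("MANIFEST.hash").decode().splitlines():
        digest, name = line.split(maxsplit=1)
        assert hashlib.sha256(zf.read(name)).hexdigest() == digest
        verified += 1

print(f"verified {verified} members")
```

A mismatch raises immediately, which is exactly the behavior you want before trusting an extracted tree.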
Re-aggregate from the unzipped bundle¶
Extract the bundle to a fresh directory, reload the bundled manifest, and pass it back through lazy_multiindex to rebuild the per-run DataFrame without re-running anything. The aggregated frame is bit-equal to df — every per-run Parquet is copied verbatim into the bundle.
extract_dir = Path(tmpdir.name) / "extracted"
extract_dir.mkdir(exist_ok=True)
with zipfile.ZipFile(bundle_path) as zf:
    zf.extractall(extract_dir)
bundled_manifest = Manifest.load(extract_dir / "manifest.jsonl")
df_from_bundle = lazy_multiindex(bundled_manifest, extract_dir)
print(f"Original frame: {df.shape}")
print(f"Bundle frame: {df_from_bundle.shape}")
original_ids = sorted(df.index.unique("run_id").tolist())
bundle_ids = sorted(df_from_bundle.index.unique("run_id").tolist())
print(f"Run IDs match: {original_ids == bundle_ids}")
Original frame: (30, 5)
Bundle frame: (30, 5)
Run IDs match: True
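Matching shapes and run IDs are necessary but not sufficient for the bit-equality claim above. `pandas.testing.assert_frame_equal` gives a strict check covering values, dtypes, and index structure. A sketch with synthetic stand-ins for the original and bundle-rebuilt frames (GMAT isn't re-run here); on the real notebook variables the call would be `assert_frame_equal(df, df_from_bundle)`:

```python
import pandas as pd
import pandas.testing as pdt

# Synthetic stand-ins for the original and bundle-rebuilt frames,
# with a run_id-level MultiIndex like the sweep frames use.
idx = pd.MultiIndex.from_product(
    [["run-0", "run-1"], [0, 1, 2]], names=["run_id", "row"]
)
original = pd.DataFrame({"x": range(6)}, index=idx)
rebuilt = pd.DataFrame({"x": range(6)}, index=idx)

# Strict comparison: values, dtypes, and index must all match,
# raising AssertionError on the first difference.
pdt.assert_frame_equal(original, rebuilt)
print("frames identical")
```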
Where to next¶
- Manifest schema. Manifest schema documents every field the JSON Lines records carry, including the `output_dir` rewrite rule the bundler applies.
- Resume from a partial sweep. Notebook 03 walks through partial-manifest recovery, using the same `Sweep.from_manifest` entry point used here.
- Bundling with logs for forensics. Pass `include_logs=True` to retain every per-run `worker.log`. The bundled manifest's `log_path` keeps pointing at the bundled file, so a downstream investigator can correlate run outcomes with their log lines without re-running anything.