Leaderboard
The public leaderboard runs
the package's shipped deterministic scorer on the frozen v0.3 test split and
ranks your method against the classical, BiLSTM, transformer, and foundation baselines. You upload a
predictions.json, and the board returns your per-class above-floor recall and precision at the primary operating point (1 false alarm / satellite-year). It is the
hosted, zero-setup way to get a number directly comparable to prior work — the same number the
benchmark protocol defines.
What kind of board this is
The v0.3 test labels are part of the public CC-BY-4.0 dataset (dataset/v0.3/labels.json and
splits.json commit the labels and mark the test objects), so the answer key is already published.
The board is therefore a reproducibility / convenience board: it hosts the canonical scorer on the
splits everyone already has, so a new method gets a directly comparable score without standing up the
scorer locally.
It is not a hidden-label competition. The aggregate-only response and the per-user rate limit are courtesy and abuse guards, not integrity guarantees — the labels are public either way. A true hidden-label competition needs a never-committed forward holdout, which the open dataset cannot provide; that is deferred to a later release. See D12 for the rationale.
How to submit
1. Reconstruct the v0.3 objects
Run your detector over the same objects the test split scores. Reconstruct their element series from
your own Space-Track account — the recipe is pinned in dataset/v0.3/recipe.json, and the
content-hash manifest verifies the rebuild byte-for-byte:
export SPACETRACK_USERNAME=you@example.com
export SPACETRACK_PASSWORD=...
maneuver-detect dataset build --out dataset/
The labels and splits are public; only the raw element history is reconstructed locally (the recipe-first distribution model — see the dataset reference).
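If you want to double-check a rebuild by hand, the idea behind a content-hash manifest can be sketched as below. This is only an illustration: the manifest layout (a mapping from relative path to hex SHA-256 digest) and the helper names `sha256_of` and `verify_manifest` are assumptions, not the package's documented format or API.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root: Path, manifest: dict) -> list:
    """Return the relative paths whose bytes do not match the manifest.

    `manifest` maps a relative path to its expected hex digest. That layout
    is an assumption made for this sketch, not the real manifest format.
    """
    mismatched = []
    for relative_path, expected in sorted(manifest.items()):
        target = root / relative_path
        # A missing file counts as a mismatch, same as a wrong digest.
        if not target.is_file() or sha256_of(target) != expected:
            mismatched.append(relative_path)
    return mismatched
```

An empty list from `verify_manifest` means every listed file matched its digest, which is what a byte-for-byte rebuild implies.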
2. Produce a predictions.json
Run your method and serialise its detections to a JSON array of canonical maneuver
records — exactly what the package's read_predictions parses. The package's own helpers
do this: from_frame turns a detect result into canonical records, and predictions_to_json writes
the file (sorted keys, ISO-8601 epochs):
from pathlib import Path
from maneuver_detect import detect, datasets
from maneuver_detect.benchmark import predictions_to_json
from maneuver_detect.schema import from_frame
maneuvers = []
for norad_id in test_objects:  # the test-split objects from splits.json
    history = datasets.tle_history(norad_id=norad_id)
    maneuvers.extend(from_frame(detect(history)))  # your own detector in place of detect()
Path("predictions.json").write_text(predictions_to_json(maneuvers))
Each record carries the schema fields — epoch, confidence, type, delta_v_estimate, and the
provenance (norad_id, elset_epoch_before, elset_epoch_after). A submission can express nothing but
predictions: the fixed-schema reader rejects any other payload, so a submission cannot smuggle a query
for the labels.
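To make the record shape concrete, here is a rough pre-upload check for a hand-built record. The field names are the ones listed above; the flat layout (provenance fields at the top level), the example values, and the helper `check_record` are assumptions for illustration, and the package's read_predictions remains the authoritative parser.

```python
from datetime import datetime

# Field names as listed in the schema description; a flat layout is assumed.
RECORD_FIELDS = {
    "epoch", "confidence", "type", "delta_v_estimate",
    "norad_id", "elset_epoch_before", "elset_epoch_after",
}
EPOCH_FIELDS = ("epoch", "elset_epoch_before", "elset_epoch_after")


def check_record(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means it looks fine."""
    problems = []
    missing = RECORD_FIELDS - set(record)
    extra = set(record) - RECORD_FIELDS
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected fields: {sorted(extra)}")
    for key in EPOCH_FIELDS:
        value = record.get(key)
        if isinstance(value, str):
            try:
                datetime.fromisoformat(value)  # ISO-8601 epoch strings
            except ValueError:
                problems.append(f"{key} is not an ISO-8601 timestamp")
    return problems


# Placeholder values only; real records come from from_frame().
example = {
    "epoch": "2024-03-01T12:00:00+00:00",
    "confidence": 0.87,
    "type": "placeholder-type",
    "delta_v_estimate": 0.05,
    "norad_id": 25544,
    "elset_epoch_before": "2024-03-01T06:00:00+00:00",
    "elset_epoch_after": "2024-03-01T18:00:00+00:00",
}
```

Calling `check_record(example)` returns an empty list, while a record with an extra key is reported as carrying unexpected fields, which mirrors the fixed-schema idea in spirit only.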
3. Upload and read your score
On the Space, upload your
predictions.json, enter a method name and your Hugging Face user id, and submit. The board scores it
and adds your row.
What you get back
The response is aggregate-only — per class, never per label:
- Above-floor recall and precision at the primary operating point (1 false alarm / satellite-year), the headline metric.
- The published timing-only "cheating floor" — the score a detector reaches from inter-elset timing alone, with no element signal. A method that only learned the sampling cadence lands here; a real detector must beat it.
No per-label match table is ever returned, so the board exposes no label a submission did not already have from the public dataset.
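For intuition about what an operating point of 1 false alarm per satellite-year means, the sketch below walks detections from most to least confident and stops once the false-alarm budget would be exceeded. It is not the shipped scorer: the function name, the input shape (confidence paired with a hit flag), and the tie handling are assumptions, and it skips both the label matching and the above-floor filtering that the benchmark protocol defines.

```python
def metrics_at_operating_point(detections, n_true, satellite_years, max_fa_rate=1.0):
    """Return (recall, precision) at a false-alarm-rate budget.

    detections      -- iterable of (confidence, is_true_positive) pairs,
                       already matched against labels (assumed input shape)
    n_true          -- number of labelled maneuvers being scored
    satellite_years -- total observation time covered by the test objects
    max_fa_rate     -- allowed false alarms per satellite-year
    """
    allowed_false_alarms = max_fa_rate * satellite_years
    ranked = sorted(detections, key=lambda pair: pair[0], reverse=True)

    true_positives = false_positives = 0
    kept_tp = kept_fp = 0
    for _confidence, is_hit in ranked:
        if is_hit:
            true_positives += 1
        else:
            false_positives += 1
        if false_positives > allowed_false_alarms:
            break  # lowering the threshold further would exceed the budget
        kept_tp, kept_fp = true_positives, false_positives

    recall = kept_tp / n_true if n_true else 0.0
    kept_total = kept_tp + kept_fp
    precision = kept_tp / kept_total if kept_total else 0.0
    return recall, precision
```

With one satellite-year of data the budget is a single false alarm, so the walk keeps every detection down to the confidence just above the second false alarm.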
Limits and etiquette
- Rate limit: up to 5 scored submissions per Hugging Face user per UTC day, keyed to your user id. It is a courtesy guard against accidental floods, not an integrity control.
- Space-Track terms: you reconstruct the series under your own account; nothing about a submission ships or proxies raw Space-Track data — only your predictions.
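The rate-limit rule (5 scored submissions per user per UTC day) amounts to a counter keyed on the user id and the UTC calendar date. The class below is a minimal in-memory sketch of that rule; `DailyRateLimiter` and `try_acquire` are hypothetical names, and nothing here reflects how the Space actually stores or enforces its counts.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Optional

DAILY_LIMIT = 5  # scored submissions per user per UTC day, as stated above


class DailyRateLimiter:
    """In-memory counter keyed by (user id, UTC date)."""

    def __init__(self, limit: int = DAILY_LIMIT) -> None:
        self.limit = limit
        self._counts = Counter()

    def try_acquire(self, user_id: str, now: Optional[datetime] = None) -> bool:
        """Record one submission and return True, or return False if over the limit."""
        if now is None:
            now = datetime.now(timezone.utc)
        # Normalise to UTC so the window resets at UTC midnight.
        key = (user_id, now.astimezone(timezone.utc).date())
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True
```

Because the key includes the UTC date, a user who hits the limit late in the day is allowed again as soon as the UTC date rolls over, regardless of local time zone.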
Reproduce it locally
You do not need the Space to get a comparable number — the same deterministic scorer ships in the
package (maneuver_detect.benchmark) and reproduces the board's numbers byte-for-byte from a predictions
file. The benchmark protocol shows the read_predictions → score call, and the
reproduce-the-baseline example
runs the local scorer end to end on a synthetic labelled series.