API reference
The public surface of maneuver-detect. Everything under maneuver_detect documented here is part of the
frozen library contract; modules not listed are internal and may change between releases.
Top-level surface
maneuver_detect
maneuver-detect — detect orbital maneuvers from public TLE history.
The public surface is :func:detect and the :mod:~maneuver_detect.datasets accessor: hand
detect a per-object mean-element TLE history and it returns the canonical maneuver DataFrame
(see :mod:maneuver_detect.schema) — each row a detected maneuver with a detection epoch, a
calibrated confidence, a maneuver type, and a Δv estimate. Detectors implement the
:class:~maneuver_detect.detectors.Detector interface and register under a name; detect
dispatches on that name.
Detector
Bases: ABC
Abstract base class for maneuver detectors.
A detector consumes a per-object mean-element TLE history and returns the canonical maneuver
DataFrame (see :mod:maneuver_detect.schema), so the classical reference detector and future
learned detectors are interchangeable. Subclasses set :attr:name — the key
:func:maneuver_detect.detect dispatches on — and implement :meth:detect.
detect
abstractmethod
Detect maneuvers in history and return the canonical maneuver DataFrame.
Maneuver
dataclass
Maneuver(
epoch: Timestamp,
confidence: float,
type: ManeuverType,
delta_v_estimate: float | None,
norad_id: int,
elset_epoch_before: Timestamp,
elset_epoch_after: Timestamp,
)
A single detected maneuver — one row of the canonical DataFrame.
Attributes:
| Name | Type | Description |
|---|---|---|
| epoch | Timestamp | Detection epoch (timezone-aware UTC). |
| confidence | float | Calibrated detection confidence in [0, 1]. |
| type | ManeuverType | The maneuver type (:class:ManeuverType). |
| delta_v_estimate | float \| None | Estimated Δv in m/s, or None when not reported. |
| norad_id | int | NORAD catalogue id of the object. |
| elset_epoch_before | Timestamp | Epoch of the elset bounding the start of the inter-elset gap that brackets the maneuver (timezone-aware UTC). |
| elset_epoch_after | Timestamp | Epoch of the elset bounding the end of that gap (timezone-aware UTC). |
ManeuverType
Bases: str, Enum
The maneuver type, attributed from the dominant element change (D5).
detect
Detect maneuvers in a per-object mean-element TLE history.
Dispatches to the named detector and returns the canonical maneuver DataFrame (epoch,
confidence, type, delta_v_estimate, plus provenance — see
:mod:maneuver_detect.schema). The classical reference detector is the default; learned
models are selected by name. Raises :class:ValueError for an unknown model — see
:func:~maneuver_detect.detectors.available_models for the registered names.
available_models
Return the sorted names of all registered detectors.
Datasets
datasets
Dataset accessors and the reconstructable labelled dataset.
tle_history is the per-object accessor — the cleaned mean-element series for one NORAD id. The
recipe / manifest / reconstruction surface assembles the full labelled dataset from a
pinned :class:Recipe (D2): each series is re-fetched and re-derived locally, then verified
byte-for-byte against a content-hash :class:Manifest (D8). The raw catalogue data is never shipped
— only the recipe parameters, the open labels, and the per-series digests. The benchmark release
adds the labelled train / val / test splits on top of this.
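The verification step described above can be pictured with a short sketch. It assumes SHA-256 as the content hash and a plain dict as the manifest; neither choice is confirmed by this page, and `digest` and `verify_series` are hypothetical names, not part of the library.

```python
import hashlib


def digest(payload: bytes) -> str:
    """Hex SHA-256 of a serialised series (the hash choice is an assumption)."""
    return hashlib.sha256(payload).hexdigest()


def verify_series(key: str, payload: bytes, manifest: dict[str, str]) -> bool:
    """True when the re-derived bytes hash to the digest pinned for `key`."""
    expected = manifest.get(key)
    return expected is not None and digest(payload) == expected


# Hypothetical manifest: one pinned digest per series key.
series = b"epoch,mean_motion\n2024-01-01T00:00:00Z,15.5\n"
manifest = {"25544": digest(series)}
```

Because only digests are pinned, a single changed byte in a re-derived series fails verification without the raw data ever being redistributed.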
tle_history
tle_history(
norad_id: int,
*,
start: str | None = None,
end: str | None = None,
source: str = DEFAULT_SOURCE,
) -> DataFrame
Return the cleaned mean-element TLE history for norad_id as a DataFrame.
start and end bound the epoch range (ISO-8601); when omitted, the full available
history is returned. source selects the catalogue ("spacetrack" for the credentialled
gp_history archive — the default and the only source with multi-epoch history — or
"celestrak" for the no-auth current GP elset); an unknown source raises
:class:ValueError. Fetching, caching, and cleaning live in the data layer: the returned frame
carries the canonical :data:~maneuver_detect.data.history.MEAN_ELEMENT_COLUMNS, the same shape
the detector consumes.
Raises :class:~maneuver_detect.errors.MissingCredentialError when the Space-Track source is
used without credentials, and :class:~maneuver_detect.errors.DataSourceError when the source
is unreachable with nothing cached to fall back on.
Output schema
schema
The canonical maneuver schema — the frozen library contract.
A detected maneuver is one row of the canonical DataFrame that :func:maneuver_detect.detect
returns and the benchmark scores against. This module is the single source of truth for that
schema: the per-maneuver :class:Maneuver record, the canonical column set and dtypes
(:data:COLUMNS), and the lossless :func:to_frame / :func:from_frame serialisation the
detectors and the scorer share.
The columns, in order, are epoch (UTC detection epoch), confidence (calibrated, [0, 1]),
type (in-track / cross-track / radial), delta_v_estimate (m/s, NaN when not reported),
and the provenance norad_id, elset_epoch_before, elset_epoch_after (the bounding elset
epochs of the inter-elset gap the detection brackets).
ManeuverType
Bases: str, Enum
The maneuver type, attributed from the dominant element change (D5).
Maneuver
dataclass
Maneuver(
epoch: Timestamp,
confidence: float,
type: ManeuverType,
delta_v_estimate: float | None,
norad_id: int,
elset_epoch_before: Timestamp,
elset_epoch_after: Timestamp,
)
A single detected maneuver — one row of the canonical DataFrame.
Attributes:
| Name | Type | Description |
|---|---|---|
| epoch | Timestamp | Detection epoch (timezone-aware UTC). |
| confidence | float | Calibrated detection confidence in [0, 1]. |
| type | ManeuverType | The maneuver type (:class:ManeuverType). |
| delta_v_estimate | float \| None | Estimated Δv in m/s, or None when not reported. |
| norad_id | int | NORAD catalogue id of the object. |
| elset_epoch_before | Timestamp | Epoch of the elset bounding the start of the inter-elset gap that brackets the maneuver (timezone-aware UTC). |
| elset_epoch_after | Timestamp | Epoch of the elset bounding the end of that gap (timezone-aware UTC). |
to_frame
to_frame(maneuvers: Sequence[Maneuver]) -> DataFrame
Serialise maneuvers to the canonical DataFrame (canonical column order and dtypes).
An empty sequence yields an empty frame that still carries the full schema, so a detector that finds nothing returns the same shape as one that finds something.
empty_frame
Return an empty canonical frame carrying the full schema and dtypes.
from_frame
from_frame(frame: DataFrame) -> list[Maneuver]
Deserialise a canonical DataFrame back to :class:Maneuver records.
The inverse of :func:to_frame: NaN delta_v_estimate values become None. Raises
:class:ValueError if frame is missing canonical columns.
validate_frame
Validate that frame carries the canonical columns; raise :class:ValueError if not.
Δv inversion (physics)
physics
The Δv inversion — turning a detected mean-element jump into a maneuver type and a Δv estimate.
A maneuver detector sees a satellite's orbit only through its SGP4 mean elements: a step in the
semi-major axis, eccentricity, inclination, and node across the inter-elset gap that brackets a
burn. This module is the physics that reads a Δv back out of that step. It implements the impulsive
form of the Gauss variational equations — the exact first-order relation between an impulsive
Δv (decomposed into radial / in-track / cross-track, the RSW frame) and the resulting element
change — both forward (:func:gauss_forward) and inverse (:func:invert):
- in-track Δv shows up as a step in semi-major axis (vis-viva) and eccentricity;
- cross-track Δv shows up as a step in inclination and node, in closed form;
- radial Δv shows up as an eccentricity-vector change beyond what the in-track burn explains — weakly observable, so it is treated as low-confidence by default.
The maneuver type is the dominant component (:func:classify_type), and the magnitude of the
combined impulse is the Δv estimate. Two physical facts shape the implementation, both found in
the V4 spike and frozen as design decision D5:
- Secular drift must be detrended first. The natural J2 regression of the node is several degrees per day in LEO, far larger than any station-keeping burn, so a raw element difference reads as a huge spurious cross-track Δv. :func:local_step removes it with a model-free, two-sided local-linear fit; :func:j2_secular_rates is the analytic drift it cancels.
- There is a per-class detectability floor. Below ~cm/s (LEO) / ~0.1 m/s (GEO) the element step is buried in TLE noise and neither the Δv nor the type is recoverable; above it the inversion is good to about ±25% (D5). :func:is_above_floor and :meth:Inversion.delta_v_estimate gate the estimate against that floor, so nothing is reported where it cannot be trusted.
The quantitative accuracy of the recovered Δv against published burn magnitudes is validated downstream against the DORIS/IDS Δv ground truth; here the contract is method correctness — the forward/inverse pair round-trips, the type rule is right above the floor, and the magnitudes match the textbook impulsive-maneuver relations.
Orbit
dataclass
Orbit(
semi_major_axis_km: float,
eccentricity: float,
inclination_rad: float,
arg_perigee_rad: float = 0.0,
)
The reference (pre-maneuver) mean orbit the inversion linearises about.
Only the elements the Gauss relations need: the in-plane size/shape (semi_major_axis_km,
eccentricity), the inclination, and the argument of perigee that maps a burn's argument of
latitude to its true anomaly. The node and anomaly do not enter a first-order impulsive
inversion, so they are omitted.
Attributes:
| Name | Type | Description |
|---|---|---|
| semi_major_axis_km | float | Semi-major axis, km. |
| eccentricity | float | Eccentricity (dimensionless). |
| inclination_rad | float | Inclination, radians. |
| arg_perigee_rad | float | Argument of perigee, radians (defaults to 0, the only value that matters for a circular orbit, where perigee is undefined). |
ElementStep
dataclass
ElementStep(
delta_a_km: float,
delta_eccentricity: float,
delta_inclination_rad: float,
delta_raan_rad: float,
)
The detrended anomalous step in the mean elements across a maneuver.
The four mean-element changes a TLE detector can read reliably across the inter-elset gap, with
natural secular drift already removed (see :func:local_step). The argument of perigee is
omitted on purpose: it is ill-determined for the near-circular orbits in scope and contributes
no robust signal.
Attributes:
| Name | Type | Description |
|---|---|---|
| delta_a_km | float | Change in semi-major axis, km. |
| delta_eccentricity | float | Change in eccentricity (dimensionless). |
| delta_inclination_rad | float | Change in inclination, radians. |
| delta_raan_rad | float | Change in right ascension of the ascending node, radians. |
Inversion
dataclass
Inversion(
delta_v_ms: float,
radial_ms: float,
in_track_ms: float,
cross_track_ms: float,
maneuver_type: ManeuverType,
)
A recovered impulsive maneuver — the RSW Δv decomposition, the total, and the type.
The cross-track and radial components are stored as magnitudes: their sign is not observable from a mean-element step without knowing where in the orbit the burn occurred. The in-track component keeps its sign — positive raises the orbit (a prograde burn), negative lowers it — because that is fixed by the sign of the semi-major-axis step.
Attributes:
| Name | Type | Description |
|---|---|---|
| delta_v_ms | float | Total impulse magnitude, m/s. |
| radial_ms | float | Radial component magnitude, m/s (low-confidence; weakly observable). |
| in_track_ms | float | In-track (transverse) component, m/s, signed. |
| cross_track_ms | float | Cross-track (normal) component magnitude, m/s. |
| maneuver_type | ManeuverType | The dominant-component type (:func:classify_type). |
radial_dominant
property
Whether the maneuver is radial-dominated — low-confidence by default (D5).
is_above_floor
Whether delta_v_ms clears the per-object detectability floor floor_ms (m/s).
delta_v_estimate
The reportable Δv (m/s), or None below floor_ms.
Maps straight onto the schema's optional delta_v_estimate column: D5 reports a Δv only
above the floor, so a below-floor inversion yields None rather than a noise figure.
semi_major_axis_km
Semi-major axis (km) from SGP4 mean motion (revolutions per day), via Kepler's third law.
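As an illustration of the Kepler's-third-law conversion, here is a minimal standalone sketch. It assumes the WGS-84 gravitational parameter and ignores the Kozai/Brouwer distinction that SGP4 makes, so it is not necessarily the library's exact computation; the function name is hypothetical.

```python
import math

MU_EARTH_KM3_S2 = 398600.4418  # assumed value; SGP4 itself uses WGS-72 constants
SECONDS_PER_DAY = 86400.0


def semi_major_axis_from_mean_motion(rev_per_day: float) -> float:
    """Semi-major axis (km) from mean motion in revolutions per day."""
    n_rad_s = rev_per_day * 2.0 * math.pi / SECONDS_PER_DAY
    # Kepler's third law: n^2 a^3 = mu, so a = (mu / n^2)^(1/3).
    return (MU_EARTH_KM3_S2 / n_rad_s**2) ** (1.0 / 3.0)


leo_a = semi_major_axis_from_mean_motion(15.5)        # ISS-like cadence
geo_a = semi_major_axis_from_mean_motion(1.00273791)  # one rev per sidereal day
```

The two sample values land near 6,795 km and 42,164 km, the familiar LEO and GEO radii.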
mean_motion_rad_s
Mean motion n = sqrt(μ / a³) (rad/s) from the semi-major axis (km).
circular_speed_km_s
Circular orbital speed sqrt(μ / a) (km/s) at semi-major axis a (km).
orbital_speed_km_s
orbital_speed_km_s(
orbit: Orbit, true_anomaly_rad: float
) -> float
Orbital speed (km/s) at true_anomaly_rad from vis-viva v² = μ(2/r - 1/a).
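The vis-viva relation can be checked with a small sketch. It takes the orbit as plain floats instead of the :class:Orbit dataclass and assumes a value for μ, so treat it as an illustration of the formula only.

```python
import math

MU_EARTH_KM3_S2 = 398600.4418  # assumed gravitational parameter, km^3/s^2


def orbital_speed(a_km: float, e: float, true_anomaly_rad: float) -> float:
    """Speed (km/s) on a Keplerian ellipse at the given true anomaly."""
    # Conic equation for the radius, then vis-viva for the speed.
    r_km = a_km * (1.0 - e**2) / (1.0 + e * math.cos(true_anomaly_rad))
    return math.sqrt(MU_EARTH_KM3_S2 * (2.0 / r_km - 1.0 / a_km))


v_circ = orbital_speed(7000.0, 0.0, 1.0)
v_perigee = orbital_speed(7000.0, 0.1, 0.0)
v_apogee = orbital_speed(7000.0, 0.1, math.pi)
```

For a circular orbit the result reduces to sqrt(μ / a) at every true anomaly, and for an eccentric one the perigee speed exceeds the apogee speed.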
gauss_forward
gauss_forward(
*,
radial_ms: float,
in_track_ms: float,
cross_track_ms: float,
orbit: Orbit,
true_anomaly_rad: float,
) -> ElementStep
The forward Gauss VOP — element step produced by an impulsive Δv applied at true_anomaly.
The exact first-order (impulsive) Gauss variational equations for (Δa, Δe, Δi, ΔΩ) given the
RSW components of the impulse, evaluated at the burn true anomaly. This is the model the
inversion inverts and the generator the round-trip tests drive; it makes no circular-orbit
approximation.
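For orientation, the textbook impulsive form of the Gauss equations for (Δa, Δe, Δi, ΔΩ) can be sketched as below. This is a standalone sketch under assumptions: it uses plain floats instead of the :class:Orbit and :class:ElementStep dataclasses, an assumed μ, and the hypothetical name `gauss_step`. It is not claimed to match the library's implementation line for line.

```python
import math

MU_EARTH_KM3_S2 = 398600.4418  # assumed gravitational parameter, km^3/s^2


def gauss_step(radial_ms, in_track_ms, cross_track_ms,
               a_km, e, inc_rad, argp_rad, true_anomaly_rad):
    """Textbook impulsive Gauss equations: (da_km, de, di_rad, draan_rad)."""
    dv_r, dv_s, dv_w = (x / 1000.0 for x in (radial_ms, in_track_ms, cross_track_ms))
    n = math.sqrt(MU_EARTH_KM3_S2 / a_km**3)        # mean motion, rad/s
    eta = math.sqrt(1.0 - e**2)
    p = a_km * (1.0 - e**2)                          # semi-latus rectum, km
    r = p / (1.0 + e * math.cos(true_anomaly_rad))   # radius at the burn, km
    h = n * a_km**2 * eta                            # specific angular momentum
    u = argp_rad + true_anomaly_rad                  # argument of latitude
    sin_nu, cos_nu = math.sin(true_anomaly_rad), math.cos(true_anomaly_rad)
    da = 2.0 / (n * eta) * (e * sin_nu * dv_r + (p / r) * dv_s)
    cos_ecc = (e + cos_nu) / (1.0 + e * cos_nu)      # cosine of eccentric anomaly
    de = eta / (n * a_km) * (sin_nu * dv_r + (cos_nu + cos_ecc) * dv_s)
    di = r * math.cos(u) / h * dv_w
    draan = r * math.sin(u) / (h * math.sin(inc_rad)) * dv_w
    return da, de, di, draan


inc = math.radians(51.6)
v = math.sqrt(MU_EARTH_KM3_S2 / 7000.0)
# 1 m/s prograde burn on a circular 7000 km orbit.
da, de, di, draan = gauss_step(0.0, 1.0, 0.0, 7000.0, 0.0, inc, 0.0, 0.0)
# 1 m/s cross-track burn at the ascending node (u = 0).
_, _, di_w, draan_w = gauss_step(0.0, 0.0, 1.0, 7000.0, 0.0, inc, 0.0, 0.0)
```

On the circular orbit the in-track case reduces to the familiar Δa = 2aΔv/v, and a cross-track burn at the node changes only the inclination, by Δv/v.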
invert
invert(
step: ElementStep,
orbit: Orbit,
*,
true_anomaly_rad: float | None = None,
) -> Inversion
Recover the impulsive Δv (RSW components, total, type) from a detrended step.
The cross-track component comes from (Δi, ΔΩ) in closed form, and the burn argument of
latitude — hence the true anomaly — from their ratio. The in-plane components come from
(Δa, Δe): when the burn true anomaly is known (passed in, or recovered from a cross-track
signal) and the resulting 2x2 system is well-conditioned, it is solved exactly; otherwise the
burn location is unobservable, and the inversion falls back to the V4-validated, location-free
estimator — vis-viva for the in-track component from Δa, and the residual
eccentricity-vector kick for the (low-confidence) radial component.
Pass true_anomaly_rad when the burn location is known (e.g. validating against the forward
model); leave it None for the realistic TLE case, where it is inferred or marginalised.
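The closed-form cross-track recovery and the vis-viva in-track estimate can be sketched for the near-circular case. This is a simplified illustration under a circular-orbit assumption (r ≈ a), with a hypothetical function name and an assumed μ; it omits the radial component and the conditioning checks the text describes.

```python
import math

MU_EARTH_KM3_S2 = 398600.4418  # assumed gravitational parameter, km^3/s^2


def invert_near_circular(da_km, di_rad, draan_rad, a_km, inc_rad):
    """Return (in_track_ms signed, cross_track_ms magnitude, arg_latitude_rad)."""
    v_km_s = math.sqrt(MU_EARTH_KM3_S2 / a_km)
    # Vis-viva differential on a circular orbit: da = 2 a dv / v.
    in_track_ms = 1000.0 * v_km_s * da_km / (2.0 * a_km)
    # Gauss: di = cos(u) dv_w / v and sin(i) dOmega = sin(u) dv_w / v.
    node_term = math.sin(inc_rad) * draan_rad
    cross_track_ms = 1000.0 * v_km_s * math.hypot(di_rad, node_term)
    arg_latitude = math.atan2(node_term, di_rad)  # ambiguous by pi (sign of dv_w)
    return in_track_ms, cross_track_ms, arg_latitude


a_km, inc = 7000.0, math.radians(98.0)
v = math.sqrt(MU_EARTH_KM3_S2 / a_km)
# Forward-model a 0.5 m/s in-track plus 2 m/s cross-track burn at u = 0.7 rad.
da = 2.0 * a_km * 0.0005 / v
di = math.cos(0.7) * 0.002 / v
draan = math.sin(0.7) * 0.002 / (v * math.sin(inc))
in_track, cross_track, u = invert_near_circular(da, di, draan, a_km, inc)
```

The ratio of the node and inclination steps fixes the argument of latitude only up to π, which is why the cross-track component is reported as a magnitude.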
classify_type
classify_type(
*,
radial_ms: float,
in_track_ms: float,
cross_track_ms: float,
) -> ManeuverType
Attribute the maneuver type to the dominant Δv component (D5).
Ties resolve in-track → cross-track → radial, the order of decreasing observability, so a coin-flip never lands on the least-trustworthy class.
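The tie-breaking rule is easy to get subtly wrong, so here is a minimal sketch of it. It returns plain strings rather than :class:ManeuverType members (whose values are not shown on this page) and assumes dominance is judged on absolute magnitude; `dominant_type` is a hypothetical name.

```python
def dominant_type(radial_ms: float, in_track_ms: float, cross_track_ms: float) -> str:
    """Name of the largest-magnitude component; ties prefer the more observable one."""
    # Listed in decreasing observability: max() keeps the earliest entry on a tie.
    candidates = [
        ("in-track", abs(in_track_ms)),
        ("cross-track", abs(cross_track_ms)),
        ("radial", abs(radial_ms)),
    ]
    return max(candidates, key=lambda item: item[1])[0]
```

For example, `dominant_type(radial_ms=2.0, in_track_ms=0.1, cross_track_ms=2.0)` resolves to cross-track, because the tie with radial goes to the more observable class.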
j2_secular_rates
j2_secular_rates(
orbit: Orbit,
) -> tuple[float, float, float]
The J2 secular rates (Ω̇, ω̇, Ṁ) of node, perigee, and mean anomaly (rad/s).
The dominant natural drift of a mean orbit: the node regresses, the apsides rotate, and the
mean anomaly drifts, all secularly under Earth oblateness. This is the trend
:func:local_step removes before an inversion — quoted here so a caller can predict or
cross-check it. The node rate vanishes for a polar orbit (i = 90°), the perigee rate at the critical
inclination (≈63.4°), and the J2 part of the mean-anomaly rate where 3cos²i = 1 (i ≈ 54.7°).
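The standard first-order J2 secular rates can be sketched as follows. The constants are assumed values, the function takes plain floats instead of an :class:Orbit, and the third return value is the J2 correction only (whether the library's Ṁ includes the Keplerian mean motion is not stated here).

```python
import math

MU_EARTH_KM3_S2 = 398600.4418  # assumed constants (WGS-84 / EGM96 values)
R_EARTH_KM = 6378.137
J2 = 1.08262668e-3


def j2_rates(a_km: float, e: float, inc_rad: float):
    """J2 secular drift (rad/s) of node, perigee, and mean anomaly."""
    n = math.sqrt(MU_EARTH_KM3_S2 / a_km**3)
    p = a_km * (1.0 - e**2)
    k = 1.5 * J2 * (R_EARTH_KM / p) ** 2 * n
    cos_i = math.cos(inc_rad)
    raan_dot = -k * cos_i
    argp_dot = 0.5 * k * (5.0 * cos_i**2 - 1.0)
    # J2 correction to the mean-anomaly rate only; the Keplerian n is excluded.
    mean_anom_dot = 0.5 * k * math.sqrt(1.0 - e**2) * (3.0 * cos_i**2 - 1.0)
    return raan_dot, argp_dot, mean_anom_dot


raan_dot, _, _ = j2_rates(6780.0, 0.0, math.radians(51.6))
deg_per_day = math.degrees(raan_dot) * 86400.0  # ISS-like orbit
polar_raan_dot = j2_rates(7000.0, 0.0, math.pi / 2.0)[0]
critical_argp_dot = j2_rates(7000.0, 0.0, math.acos(1.0 / math.sqrt(5.0)))[1]
```

The ISS-like case regresses at roughly five degrees per day, which is the scale of drift the detrending step has to remove before any step is interpreted as a burn.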
local_step
local_step(
times: Sequence[float],
values: Sequence[float],
gap_index: int,
*,
window: int = 4,
) -> float
The detrended step in values across the gap before gap_index, removing secular drift.
A two-sided local-linear fit: a straight line is fit to the window samples on each side of
the gap and both are evaluated at the gap midpoint; their difference is the anomalous step with
the local secular trend (J2 nodal regression and the rest) subtracted out. gap_index is the
index of the first sample after the gap, so the gap spans [gap_index - 1, gap_index].
Without this detrending a maneuver-free node drift of degrees per day reads as a large spurious
cross-track Δv — the V4 failure mode.
Raises:
| Type | Description |
|---|---|
| ValueError | if |
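The two-sided local-linear fit can be sketched in pure Python. This is a sketch under stated assumptions: the name `local_step_sketch` is hypothetical, the fit is ordinary least squares, and the ValueError guard is an assumed condition, since the library's exact one is not shown on this page.

```python
from typing import Sequence


def _fit_line(ts: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept of ys against ts."""
    t_mean = sum(ts) / len(ts)
    y_mean = sum(ys) / len(ys)
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


def local_step_sketch(times, values, gap_index, *, window=4):
    """Difference of the two one-sided line fits, evaluated at the gap midpoint."""
    lo, hi = gap_index - window, gap_index + window
    if window < 2 or lo < 0 or hi > len(times):
        # Assumed guard; the library's exact ValueError condition is not shown here.
        raise ValueError("need at least `window` samples on each side of the gap")
    mid = 0.5 * (times[gap_index - 1] + times[gap_index])
    s_before, c_before = _fit_line(times[lo:gap_index], values[lo:gap_index])
    s_after, c_after = _fit_line(times[gap_index:hi], values[gap_index:hi])
    return (s_after * mid + c_after) - (s_before * mid + c_before)


# A drifting series (slope 2 per unit time) with a 0.5 step inserted before index 5.
times = [float(k) for k in range(10)]
values = [2.0 * t + (0.5 if k >= 5 else 0.0) for k, t in enumerate(times)]
step = local_step_sketch(times, values, 5)
naive = values[5] - values[4]
```

The raw difference across the gap (`naive`) mixes one interval of drift into the step, while the two-sided fit recovers only the inserted 0.5 jump.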
orbit_class_of
The coarse runtime orbit class of a representative semi-major axis (km).
A single seam both the detector (selecting the nominal Δv floor) and the feature layer
(selecting per-class normalisation statistics) read, so the class boundaries are defined once.
The cuts are :data:ORBIT_CLASS_LEO_MAX_A_KM and :data:ORBIT_CLASS_GEO_MIN_A_KM.
This returns only LEO / MEO / GEO — semi-major axis alone cannot distinguish the
eccentric classes (IGSO is geosynchronous so it lands in GEO here; a high-e HEO
object lands wherever its a falls). The benchmark's per-class scoring uses the pinned
class from the dataset recipe, not this runtime classifier, so IGSO / HEO are still
scored as themselves; this seam only picks the detector's working floor / normalisation, where
treating them as the nearest coarse class is an accepted first-pass approximation.
detectability_floor_ms
The nominal per-class detectability floor (m/s); see :data:DETECTABILITY_FLOOR_MS.
is_above_floor
Whether a Δv (m/s) clears the nominal per-class detectability floor for orbit_class.
Detectors
detectors
Maneuver detectors — one module per detector, behind a common interface and registry.
Every detector consumes a per-object mean-element series and returns the canonical maneuver
schema. The classical reference detector (Holt smoothing + rule-based jump detection +
the Δv inversion) is the baseline every learned model must beat; the learned baselines arrive on
top of the same interface. Detectors register themselves under a name with
:func:register_detector, and :func:maneuver_detect.detect dispatches on that name.
Detector
Bases: ABC
Abstract base class for maneuver detectors.
A detector consumes a per-object mean-element TLE history and returns the canonical maneuver
DataFrame (see :mod:maneuver_detect.schema), so the classical reference detector and future
learned detectors are interchangeable. Subclasses set :attr:name — the key
:func:maneuver_detect.detect dispatches on — and implement :meth:detect.
detect
abstractmethod
Detect maneuvers in history and return the canonical maneuver DataFrame.
BiLstmDetector
BiLstmDetector(
checkpoint: ModelBundle | str | Path | None = None,
*,
threshold: float | None = None,
class_thresholds: dict[str, float] | None = None,
)
Bases: _LearnedDetector
Learned BiLSTM detector — per-gap localisation by the model, Δv/type by the physics.
Construct with a trained checkpoint (a :class:~maneuver_detect.models.checkpoint.ModelBundle
or a path to one); the no-argument construction the registry uses falls back to the
:data:CHECKPOINT_ENV path, and raises from :meth:detect if neither is available.
threshold overrides the bundle's per-gap threshold with one gate for every class, and
class_thresholds overrides its per-class gates. All inference machinery is inherited from
the shared :class:~maneuver_detect.detectors.learned._LearnedDetector.
ClassicalDetector
ClassicalDetector(
*,
window: int = 4,
threshold: float = 6.0,
smoothing_level: float = 0.5,
smoothing_trend: float = 0.1,
radial_confidence_factor: float = 0.6,
regularize_daily: bool = True,
persistence_revert_fraction: float = 0.5,
)
Bases: Detector
Rule-based reference detector: Holt smoothing, residual-jump detection, and Gauss inversion.
Consumes a per-object mean-element series (the
:data:~maneuver_detect.data.history.MEAN_ELEMENT_COLUMNS frame) and returns the canonical
maneuver DataFrame. A frame carrying more than one norad_id is processed object by object,
so the detector is correct on a single-object series and on a concatenated multi-object one.
The tunables are constructor arguments with literature-reasonable defaults; the detectability
floor that gates the Δv estimate is calibrated per object and maneuver type from the element
noise (:meth:floor_for). The default no-argument construction is what the registry
instantiates.
Configure the detector.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| window | int | Samples per side for the two-sided local-linear step fit and the smoothing warm-up; a gap needs at least this many elsets on each side to be scored. | 4 |
| threshold | float | Residual-jump threshold in robust noise scales: a gap is a candidate when a detrended element step exceeds this many noise scales. | 6.0 |
| smoothing_level | float | Holt level smoothing factor (alpha). | 0.5 |
| smoothing_trend | float | Holt trend smoothing factor (beta). | 0.1 |
| radial_confidence_factor | float | Multiplier applied to the confidence of a radial-dominated detection (D5: radial maneuvers are weakly observable and reported low-confidence). | 0.6 |
| regularize_daily | bool | Collapse the series to one representative elset per UTC day before detection. Real catalogues fit several elsets per day in bursts; left raw, each extra gap is another chance to fire, so the dense cadence inflates the false-alarm rate. The D4 matching tolerance (the bracketing gap plus or minus one, about two days) absorbs the small epoch shift the binning introduces. | True |
| persistence_revert_fraction | float | A candidate is rejected as a transient (a single bad elset or a same-epoch re-fit, not a maneuver) when its dominant-element step reverses on an adjacent gap with at least this fraction of its magnitude; a real maneuver is a sustained step, not a spike that returns. | 0.5 |
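The persistence rule behind persistence_revert_fraction can be sketched as a check on the per-gap steps of the dominant element. This is an illustrative reading of the parameter description, with a hypothetical function name; the detector's actual bookkeeping may differ.

```python
def is_transient(steps: list[float], k: int, revert_fraction: float = 0.5) -> bool:
    """True when the step at gap k is undone by an adjacent gap."""
    candidate = steps[k]
    for j in (k - 1, k + 1):
        if 0 <= j < len(steps):
            neighbour = steps[j]
            # A reversal has the opposite sign and a comparable magnitude.
            opposite = neighbour * candidate < 0.0
            if opposite and abs(neighbour) >= revert_fraction * abs(candidate):
                return True
    return False
```

A spike such as `[0.0, 1.0, -0.9, 0.0]` is rejected at index 1, while a sustained step such as `[0.0, 1.0, 0.05, 0.0]` is kept.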
detect
Detect maneuvers in history and return the canonical maneuver DataFrame.
history is a mean-element series (it must carry :data:_REQUIRED_COLUMNS). An empty or
too-short series yields an empty canonical frame. A frame with multiple objects is grouped
by norad_id and each object detected independently; the rows are returned sorted by
(norad_id, epoch).
floor_for
floor_for(history: DataFrame) -> dict[ManeuverType, float]
The per-type Δv detectability floor (m/s) for a single-object history.
The Δv below which a maneuver of each type cannot be told from this object's TLE noise — the data-derived, TLE-quality-dependent floor D4 calls for, bounded below by the nominal per-class floor. The benchmark uses the floor for a label's type to decide whether it is in the above-floor population scored for recall, and the detector uses the floor for a detection's type to gate the reported Δv (D5). It is computed on the same regularised series the detector sees, so the two agree. Falls back to the nominal class floor when the series is too short to calibrate.
ChronosResidualDetector
ChronosResidualDetector(
bundle: FoundationBundle | str | Path | None = None,
*,
forecaster: Forecaster | None = None,
class_thresholds: dict[str, float] | None = None,
threshold: float | None = None,
calibrator: Calibrator | None = None,
)
Bases: _ForecastResidualDetector
Chronos forecast-residual detector — the v0.3 foundation baseline (D14.4).
Chronos brings a broad pretrained prior over time-series shape and is robust on real, noisy
element series. All the inference machinery is the shared forecaster-agnostic pipeline in
:class:_ForecastResidualDetector; this class only pins the registry name, the checkpoint
environment variable, and the "chronos" backend.
TransformerDetector
TransformerDetector(
checkpoint: ModelBundle | str | Path | None = None,
*,
threshold: float | None = None,
class_thresholds: dict[str, float] | None = None,
)
Bases: _LearnedDetector
Learned transformer detector — per-gap localisation by the model, Δv/type by the physics.
Construct with a trained checkpoint (a :class:~maneuver_detect.models.checkpoint.ModelBundle
or a path to one); the no-argument construction the registry uses falls back to the
:data:CHECKPOINT_ENV path, and raises from :meth:detect if neither is available.
threshold overrides the bundle's per-gap threshold with one gate for every class, and
class_thresholds overrides its per-class gates. All inference machinery is inherited from
the shared :class:~maneuver_detect.detectors.learned._LearnedDetector.
register_detector
Register a :class:Detector subclass under its :attr:~Detector.name for dispatch.
Usable as a class decorator. Raises :class:ValueError if a different detector is already
registered under the same name.
get_detector
get_detector(model: str) -> Detector
Instantiate the registered detector named model.
Raises :class:ValueError if no detector is registered under that name, listing the names
that are available.
available_models
Return the sorted names of all registered detectors.
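The registry contract described by these three functions can be sketched as a self-contained pattern. The code below re-declares minimal stand-ins rather than importing the library, so the internals (the `_REGISTRY` dict, the error messages, the `NullDetector` example) are assumptions for illustration only.

```python
from abc import ABC, abstractmethod

_REGISTRY: dict[str, type] = {}  # hypothetical storage: name -> detector class


class Detector(ABC):
    name: str

    @abstractmethod
    def detect(self, history): ...


def register_detector(cls):
    """Class decorator: file `cls` under `cls.name`, refusing a conflicting re-use."""
    existing = _REGISTRY.get(cls.name)
    if existing is not None and existing is not cls:
        raise ValueError(f"detector name {cls.name!r} is already registered")
    _REGISTRY[cls.name] = cls
    return cls


def available_models() -> list[str]:
    return sorted(_REGISTRY)


def get_detector(model: str) -> Detector:
    cls = _REGISTRY.get(model)
    if cls is None:
        raise ValueError(f"unknown model {model!r}; available: {available_models()}")
    return cls()  # the registry uses the no-argument construction


@register_detector
class NullDetector(Detector):
    name = "null"  # hypothetical detector used only for this illustration

    def detect(self, history):
        return []  # finds nothing
```

Registering the same class twice is harmless in this sketch, while a different class claiming an existing name fails fast at import time instead of silently shadowing a detector.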
Benchmark
benchmark
The frozen benchmark — splits, matching rule, metrics, and the scorer.
Leak-free splits by satellite and time window (seeded and byte-stable), the detection-matching rule, the metric (precision and recall at a fixed false-alarm rate per satellite class, with per-class type confusion), and the deterministic scorer the leaderboard runs. Frozen by release.
DetectionMatch
dataclass
DetectionMatch(
detection: Maneuver, label: ScoredLabel | None
)
One detection and the label it was assigned, or None when it matched nothing.
Attributes:
| Name | Type | Description |
|---|---|---|
| detection | Maneuver | The predicted maneuver. |
| label | ScoredLabel \| None | The :class:ScoredLabel it was assigned, or None when it matched nothing. |
Matching
dataclass
Matching(
matches: tuple[DetectionMatch, ...],
unmatched_labels: tuple[ScoredLabel, ...],
)
The one-to-one assignment of detections to labels under the detection-matching rule.
Attributes:
| Name | Type | Description |
|---|---|---|
| matches | tuple[DetectionMatch, ...] | One :class:DetectionMatch per detection. |
| unmatched_labels | tuple[ScoredLabel, ...] | The matchable labels no detection claimed; above-floor ones are the false negatives. |
ScoredLabel
dataclass
A held-out label as the benchmark scores it — an interval plus its above-floor status.
Attributes:
| Name | Type | Description |
|---|---|---|
| interval | LabelledInterval | The label mapped onto its bracketing inter-elset gap, carrying the D4 matching window (:class:LabelledInterval). |
| above_floor | bool | Whether the maneuver is above the per-object detectability floor (D4). The headline metric scores the above-floor population; below-floor labels are physically undetectable from TLEs and are ignored rather than counted as misses. |
ClassMetrics
dataclass
ClassMetrics(
orbit_class: OrbitClass,
sat_years: float,
n_objects: int,
n_detections: int,
n_labels_above_floor: int,
n_labels_total: int,
operating_point: float,
ci_level: float,
recall: float | None,
recall_ci: tuple[float, float] | None,
precision: float | None,
precision_ci: tuple[float, float] | None,
full_population_recall: float | None,
pr_curve: tuple[PRPoint, ...],
confusion: Confusion,
operating_point_confidence: float | None = None,
)
The benchmark metrics for one orbit class at the headline operating point.
Attributes:
| Name | Type | Description |
|---|---|---|
| orbit_class | OrbitClass | The class scored. |
| sat_years | float | Satellite-years of observation in the class (the false-alarm-rate denominator). |
| n_objects | int | Objects in the class in the scored population. |
| n_detections | int | Detections attributed to the class. |
| n_labels_above_floor | int | Above-floor labels in the class (the recall denominator). |
| n_labels_total | int | All matchable labels in the class (the full-population denominator). |
| operating_point | float | The headline false-alarm-per-satellite-year target (D4). |
| ci_level | float | The confidence level of the recall / precision intervals. |
| recall | float \| None | Recall over the above-floor population at operating_point. |
| recall_ci | tuple[float, float] \| None | The ci_level confidence interval for recall. |
| precision | float \| None | Precision over the above-floor population at operating_point. |
| precision_ci | tuple[float, float] \| None | The ci_level confidence interval for precision. |
| full_population_recall | float \| None | Recall counting below-floor recoveries, over all labels: a secondary lower bound. |
| pr_curve | tuple[PRPoint, ...] | |
| confusion | Confusion | Type confusion over the above-floor true positives at operating_point. |
| operating_point_confidence | float \| None | The confidence cut at operating_point. |
Confusion
dataclass
Confusion(
counts: dict[ManeuverType, dict[ManeuverType, int]],
)
Type confusion over above-floor true positives — true label type vs. predicted type.
Attributes:
| Name | Type | Description |
|---|---|---|
| counts | dict[ManeuverType, dict[ManeuverType, int]] | |
ObjectExposure
dataclass
One scored object's class and observation span — the unit the false-alarm rate is over.
The scored population is the set of objects the benchmark observed: each contributes its observation span to its class's satellite-year total (the false-alarm-rate denominator) and fixes the orbit class a detection on that object is attributed to.
Attributes:
| Name | Type | Description |
|---|---|---|
| norad_id | int | NORAD catalogue id of the object. |
| orbit_class | OrbitClass | The object's orbit class. |
| observation_years | float | The span of the object's mean-element series, in years. |
PRPoint
dataclass
PRPoint(
fa_per_sat_year: float,
recall: float | None,
precision: float | None,
recall_ci: tuple[float, float] | None,
precision_ci: tuple[float, float] | None,
)
One point of the precision/recall curve at a target false-alarm rate.
Attributes:
| Name | Type | Description |
|---|---|---|
| fa_per_sat_year | float | The target false alarms per satellite-year this point is evaluated at. |
| recall | float \| None | Recall over the above-floor population. |
| precision | float \| None | Precision over the above-floor population. |
| recall_ci | tuple[float, float] \| None | The confidence interval for recall. |
| precision_ci | tuple[float, float] \| None | The confidence interval for precision. |
ScoreReport
dataclass
ScoreReport(
operating_point: float,
sweep: tuple[float, ...],
ci_level: float,
per_class: dict[OrbitClass, ClassMetrics],
)
The benchmark score — per-class metrics at the headline operating point, plus the sweep.
Attributes:
| Name | Type | Description |
|---|---|---|
| operating_point | float | The headline false-alarm-per-satellite-year target (D4). |
| sweep | tuple[float, ...] | The false-alarm-per-satellite-year sweep the P/R curve covers. |
| ci_level | float | The confidence level of the per-class recall / precision intervals. |
| per_class | dict[OrbitClass, ClassMetrics] | One :class:ClassMetrics per orbit class. |
Split
dataclass
Split(
dataset_version: str,
seed: int,
ratios: tuple[float, float, float],
train: frozenset[int],
val: frozenset[int],
test: frozenset[int],
)
A frozen leak-free partition of the labelled objects into train / val / test.
Attributes:
| Name | Type | Description |
|---|---|---|
| dataset_version | str | The dataset version the split was computed for (lockstep with the manifest). |
| seed | int | The seed the split was generated under (orders equal-size components). |
| ratios | tuple[float, float, float] | The target train / val / test ratios. |
| train | frozenset[int] | NORAD ids assigned to the training split. |
| val | frozenset[int] | NORAD ids assigned to the validation split. |
| test | frozenset[int] | NORAD ids assigned to the test split. |
name_of
name_of(norad_id: int | None) -> SplitName | None
The split norad_id belongs to, or None if it is unset or in no split.
assign
assign(
labels: Sequence[ManeuverLabel],
) -> dict[SplitName, list[ManeuverLabel]]
Group labels by the split of their object (every split key present, possibly empty).
Labels whose norad_id is None or falls in no split are dropped — they cannot attach
to an object the benchmark holds out.
to_json
Serialise to canonical, NORAD-sorted JSON (a stable, committable artifact).
SplitCounts
dataclass
SplitCounts(
per_split: dict[
SplitName, dict[OrbitClass, _ClassCount]
],
)
Per-split, per-class object and maneuver-event counts (D7's reported figures).
Attributes:
| Name | Type | Description |
|---|---|---|
| per_split | dict[SplitName, dict[OrbitClass, _ClassCount]] | |
SplitName
Bases: str, Enum
The three benchmark partitions, in canonical order.
TemporalSplit
dataclass
TemporalSplit(
dataset_version: str,
seed: int,
cut1: datetime,
cut2: datetime,
guard: timedelta,
train: frozenset[int],
val: frozenset[int],
test: frozenset[int],
)
A leak-free temporal-holdout partition — novel satellites scored in novel eras.
The timeline is cut into three guard-separated eras (train = oldest, val = middle, test = newest) and each object is assigned to exactly one partition, contributing only its labels in that partition's era. Object sets are disjoint (no satellite crosses) and the eras are guard-separated (no maneuver window — nor its ±tolerance match envelope — crosses), so the split is leak-free in both the satellite and the time dimension (D7) and byte-stable per seed (D8).
Attributes:
| Name | Type | Description |
|---|---|---|
| dataset_version | str | The dataset version the split was computed for (lockstep with the manifest). |
| seed | int | The seed the assignment was generated under (orders equal-weight objects). |
| cut1 | datetime | The train \| val era boundary (the guard band straddles it). |
| cut2 | datetime | The val \| test era boundary (the guard band straddles it). |
| guard | timedelta | Half-width of the dropped band around each cut (≥ the matching tolerance). |
| train | frozenset[int] | NORAD ids assigned to train (scored on their pre- |
| val | frozenset[int] | NORAD ids assigned to val (scored on their |
| test | frozenset[int] | NORAD ids assigned to test (scored on their post- |
era_of
¶
The era index (0/1/2) epoch falls in, or None if it lands in a guard band.
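The era lookup can be sketched as a free function. This is an illustration only: the standalone signature and the closed guard band (an epoch exactly `guard` away from a cut is dropped) are assumptions, not the documented method.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def era_of(epoch: datetime, cut1: datetime, cut2: datetime,
           guard: timedelta) -> Optional[int]:
    """Return 0 (train era), 1 (val era), 2 (test era), or None in a guard band."""
    # Assumption: the band is closed, so |epoch - cut| <= guard is dropped.
    if abs(epoch - cut1) <= guard or abs(epoch - cut2) <= guard:
        return None
    if epoch < cut1:
        return 0
    if epoch < cut2:
        return 1
    return 2


# Hypothetical cuts and guard, chosen only for the demonstration.
cut1 = datetime(2020, 1, 1, tzinfo=timezone.utc)
cut2 = datetime(2022, 1, 1, tzinfo=timezone.utc)
guard = timedelta(days=7)

print(era_of(datetime(2019, 6, 1, tzinfo=timezone.utc), cut1, cut2, guard))  # 0
print(era_of(datetime(2020, 1, 3, tzinfo=timezone.utc), cut1, cut2, guard))  # None
print(era_of(datetime(2021, 6, 1, tzinfo=timezone.utc), cut1, cut2, guard))  # 1
print(era_of(datetime(2023, 1, 1, tzinfo=timezone.utc), cut1, cut2, guard))  # 2
```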
assign
¶
assign(
labels: Sequence[ManeuverLabel],
) -> dict[SplitName, list[ManeuverLabel]]
Group labels by partition, keeping only those in the object's assigned era.
A label is kept when its object is assigned to a partition and the whole label window lies within that partition's era (guard bands excluded). Labels whose object is unassigned, or whose window falls in another era or a guard band, are dropped — they cannot attach to a partition without leaking a satellite or crossing the temporal boundary.
to_json
¶
Serialise to canonical, NORAD-sorted JSON (a stable, committable artifact).
from_json
classmethod
¶
from_json(text: str) -> TemporalSplit
Parse a temporal split from :meth:to_json output.
match_detections
¶
match_detections(
detections: list[Maneuver] | tuple[Maneuver, ...],
labels: list[ScoredLabel] | tuple[ScoredLabel, ...],
) -> Matching
Assign detections to labels one-to-one under the D4 detection-matching rule.
Detections are processed in descending confidence order (ties broken by epoch then NORAD id, so
the pass is deterministic). Each detection claims the nearest still-unclaimed label of the same
object whose [tol_start, tol_end] window contains its epoch (nearest by |Δepoch|, ties
broken toward the earlier label epoch); a detection with no such label is unmatched. The result
is threshold-independent: dropping the lowest-confidence detections never changes the matches of
the ones that remain, so the metric layer can sweep a confidence threshold over a single pass.
Labels whose norad_id is None cannot attach to a scored object and are ignored entirely.
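The greedy pass described above can be sketched in a few lines. `Det`, `Lab`, and `greedy_match` are hypothetical stand-ins for illustration; they are not the library's `Maneuver`, `ScoredLabel`, or `Matching` types, and the return shape is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Det:  # hypothetical stand-in for a detection
    norad_id: int
    epoch: datetime
    confidence: float


@dataclass(frozen=True)
class Lab:  # hypothetical stand-in for a scored label
    norad_id: Optional[int]
    epoch: datetime
    tol_start: datetime
    tol_end: datetime


def greedy_match(dets, labels):
    """Return {detection index: label index}; unmatched detections are absent."""
    # Descending confidence; ties broken by epoch, then NORAD id.
    order = sorted(
        range(len(dets)),
        key=lambda i: (-dets[i].confidence, dets[i].epoch, dets[i].norad_id),
    )
    claimed = set()
    matches = {}
    for i in order:
        d = dets[i]
        candidates = [
            j for j, lab in enumerate(labels)
            if j not in claimed
            and lab.norad_id is not None          # id-less labels are ignored
            and lab.norad_id == d.norad_id        # same object only
            and lab.tol_start <= d.epoch <= lab.tol_end
        ]
        if not candidates:
            continue
        # Nearest by |delta epoch|; ties go to the earlier label epoch.
        best = min(candidates,
                   key=lambda j: (abs(labels[j].epoch - d.epoch), labels[j].epoch))
        claimed.add(best)
        matches[i] = best
    return matches


t0 = datetime(2024, 1, 1)
day = timedelta(days=1)
labels = [Lab(25544, t0, t0 - day, t0 + day)]
dets = [Det(25544, t0 + timedelta(hours=2), 0.6),
        Det(25544, t0 + timedelta(hours=1), 0.9)]
print(greedy_match(dets, labels))  # {1: 0}
```

Because the higher-confidence detection claims first, removing the 0.6 detection leaves the 0.9 detection's match unchanged, which is the threshold-independence property the description relies on.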
class_metrics
¶
class_metrics(
matching: Matching,
exposure: list[ObjectExposure]
| tuple[ObjectExposure, ...],
*,
operating_point: float = DEFAULT_OPERATING_POINT,
sweep: tuple[float, ...] = DEFAULT_SWEEP,
ci_level: float = DEFAULT_CI_LEVEL,
) -> dict[OrbitClass, ClassMetrics]
Score a :class:~maneuver_detect.benchmark.matching.Matching per orbit class.
exposure is the scored population — every detection and every matchable label must belong to
an object it lists (a :class:ValueError is raised otherwise), since the object fixes both the
orbit class and the satellite-year denominator. ci_level (in (0, 1)) sets the confidence
level of the per-class Wilson intervals on recall and precision. Returns one
:class:ClassMetrics per :class:OrbitClass, present even at zero, so the report shape is
stable regardless of which classes the data covers.
predictions_to_json
¶
predictions_to_json(maneuvers: Sequence[Maneuver]) -> str
Serialise maneuvers to a canonical predictions file (sorted keys, ISO-8601 epochs).
read_predictions
¶
read_predictions(text: str) -> list[Maneuver]
Parse a predictions file (a JSON array of canonical maneuver records) into the schema.
The inverse of :func:predictions_to_json. Each record must carry exactly the canonical
columns (:data:~maneuver_detect.schema.COLUMNS) and nothing else; a null
delta_v_estimate becomes None. The schema is fixed both ways: a record missing a
canonical field or carrying any field beyond them is rejected with :class:ValueError, so a
submission cannot smuggle a query or any other non-prediction payload past the reader (the D12
fixed-schema integrity surface). A non-array payload, or a record that is not a JSON object, is
rejected the same way.
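The fixed-schema check can be illustrated with a small standalone validator. It is a sketch under assumptions: `COLUMNS` here is copied from the fields of the `Maneuver` dataclass documented earlier in this reference rather than imported from `maneuver_detect.schema`, `parse_strict` is a hypothetical name, and the records are returned as plain dicts instead of `Maneuver` objects.

```python
import json

# Assumption: the canonical columns mirror the Maneuver dataclass fields.
COLUMNS = (
    "epoch", "confidence", "type", "delta_v_estimate",
    "norad_id", "elset_epoch_before", "elset_epoch_after",
)


def parse_strict(text: str) -> list:
    """Parse a JSON array of records carrying exactly the canonical columns."""
    payload = json.loads(text)  # malformed JSON raises a ValueError subclass
    if not isinstance(payload, list):
        raise ValueError("predictions payload must be a JSON array")
    expected = set(COLUMNS)
    for i, record in enumerate(payload):
        if not isinstance(record, dict):
            raise ValueError(f"record {i} is not a JSON object")
        missing = expected - record.keys()
        extra = record.keys() - expected
        if missing or extra:
            raise ValueError(
                f"record {i}: missing={sorted(missing)} extra={sorted(extra)}"
            )
    return payload


# "example" is a placeholder string, not a real ManeuverType value.
record = {
    "epoch": "2024-01-02T00:00:00+00:00", "confidence": 0.9, "type": "example",
    "delta_v_estimate": None, "norad_id": 25544,
    "elset_epoch_before": "2024-01-01T12:00:00+00:00",
    "elset_epoch_after": "2024-01-02T12:00:00+00:00",
}
GOOD = json.dumps([record], sort_keys=True)
BAD_EXTRA = json.dumps([dict(record, query="smuggled")])

print(len(parse_strict(GOOD)))  # 1
```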
score
¶
score(
predictions: DataFrame | Sequence[Maneuver],
labels: Sequence[ScoredLabel],
exposure: Sequence[ObjectExposure],
*,
operating_point: float = DEFAULT_OPERATING_POINT,
sweep: tuple[float, ...] = DEFAULT_SWEEP,
ci_level: float = DEFAULT_CI_LEVEL,
) -> ScoreReport
Score predictions against held-out labels over the exposure population.
predictions is the canonical maneuver frame (or a sequence of :class:Maneuver); labels
are the held-out labels tagged with their detectability-floor status; exposure is the scored
population (every prediction and label must belong to an object it lists). ci_level (in
(0, 1)) sets the confidence level of the per-class recall / precision intervals. Returns a
deterministic :class:ScoreReport — the same inputs always yield the same numbers (D8).
make_splits
¶
make_splits(
labels: Sequence[ManeuverLabel],
*,
dataset_version: str = DATASET_VERSION,
seed: int = DEFAULT_SEED,
ratios: tuple[float, float, float] = DEFAULT_RATIOS,
stratified: bool = False,
) -> Split
Partition the objects in labels into a leak-free train / val / test :class:Split.
Objects whose maneuver windows overlap are kept together (so no overlapping window crosses a
split), and each object lands wholly in one split (so no satellite crosses). ratios are the
target (train, val, test) label-count fractions; seed orders equal-size components for a
reproducible, byte-stable split (D8). Labels with no norad_id are ignored.
By default the packer balances the total label count across splits. Pass stratified=True
to aim the ratios within each orbit class instead, so per-class val/test shares are
targeted rather than incidental. Both modes hold the leak-free guarantees and are byte-stable
per seed.
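One way to picture the "kept together" rule is as connected components over objects whose windows intersect in time. The sketch below assumes that reading, treats touching windows as overlapping, and uses a hypothetical `(norad_id, start, end)` tuple input; the library's actual grouping and packing logic is not shown in this reference.

```python
def overlap_components(windows):
    """Group NORAD ids whose windows overlap, directly or through a chain.

    windows: iterable of (norad_id, start, end) with comparable start/end.
    Returns a sorted list of sorted id lists.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # deterministic root choice

    # Sweep in start order, tracking the furthest end of the current run.
    run_end, run_id = None, None
    for norad_id, start, end in sorted(windows, key=lambda w: (w[1], w[2], w[0])):
        find(norad_id)  # register the object even if it never overlaps
        if run_end is not None and start <= run_end:
            union(norad_id, run_id)
            run_end = max(run_end, end)
        else:
            run_end, run_id = end, norad_id

    groups = {}
    for x in parent:
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())


# Integer times for brevity; objects 1, 2 and 4 chain together, 3 stands alone.
demo = [(1, 0, 10), (2, 5, 15), (3, 20, 30), (1, 40, 50), (4, 45, 55)]
print(overlap_components(demo))  # [[1, 2, 4], [3]]
```

Each component would then be assigned to a single split as a unit, so neither an object nor an overlapping window can straddle two splits.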
make_temporal_split
¶
make_temporal_split(
labels: Sequence[ManeuverLabel],
*,
dataset_version: str = DATASET_VERSION,
seed: int = DEFAULT_SEED,
ratios: tuple[float, float, float] = DEFAULT_RATIOS,
quantiles: tuple[float, float] = DEFAULT_ERA_QUANTILES,
guard: timedelta = DEFAULT_TEMPORAL_GUARD,
) -> TemporalSplit
Build a leak-free temporal-holdout :class:TemporalSplit from labels.
The timeline is cut at the two quantiles of the label epochs into train | val | test eras
(oldest → newest), each cut fenced by a guard band. Every object is assigned to one
partition — among the eras it actually has labels in — greedily toward the per-class ratios
(so each class lands in every partition the catalogue allows), with seed ordering
equal-weight objects for a byte-stable result (D8). An object contributes only its labels in its
partition's era; the rest are dropped to keep both the satellite and the era novel. Labels with
no norad_id are ignored.
split_counts
¶
split_counts(
split: Split, labels: Sequence[ManeuverLabel]
) -> SplitCounts
Count objects and maneuver events per split and orbit class for labels under split.
An object is counted in a class once (its orbit class); events are the per-object label counts.
Every split and :class:OrbitClass appears in the report even at zero count.
Calibration¶
calibration
¶
Uncertainty calibration — make the confidence column mean what it says.
A detector emits a per-detection confidence in [0, 1]; calibration makes that number match
the empirical hit-rate, so that among detections at confidence ~p a fraction ~p are true
positives. This module is the model-agnostic machinery for that:
- Reliability diagnostics: :func:reliability_curve (binned predicted-vs-empirical), :func:expected_calibration_error, and :func:brier_score.
- A post-hoc calibrator: :class:TemperatureScaling, a one-parameter map fit on held-out data that rescales the confidence so it is reliable.
- A conformal predictor: :class:ConformalPredictor, split-conformal maneuver/false-alarm prediction sets with a marginal coverage guarantee.
- A wrapper: :class:CalibratedDetector, which applies a fitted calibrator to any detector's confidence output (the classical reference included, which carries no checkpoint).
Everything is fit on the val split only — never the test labels — so the reported reliability is
a genuine held-out estimate. The (confidence, outcome) pairs a calibrator is fit on are produced by
:func:maneuver_detect.models.evaluate.calibration_samples_on_val, which runs the same benchmark
matching the scorer uses.
CalibrationSamples
dataclass
¶
The (confidence, outcome) pairs a calibrator is fit / measured on for one population.
confidences are the detector's emitted [0, 1] confidences and outcomes the matched
benchmark verdict per detection — 1.0 for an above-floor true positive, 0.0 for a false
alarm (below-floor matches are excluded, mirroring the benchmark's precision). Produced on the
val split by :func:maneuver_detect.models.evaluate.calibration_samples_on_val.
ReliabilityBin
dataclass
¶
ReliabilityBin(
lo: float,
hi: float,
count: int,
mean_confidence: float | None,
empirical_precision: float | None,
)
One confidence bin of a reliability diagram — predicted vs. empirical for its detections.
ReliabilityCurve
dataclass
¶
ReliabilityCurve(bins: tuple[ReliabilityBin, ...])
The binned reliability diagram — predicted confidence vs. empirical precision per bin.
populated
¶
populated() -> tuple[ReliabilityBin, ...]
The bins that hold at least one detection (the points a diagram actually plots).
Calibrator
¶
Bases: Protocol
A fitted post-hoc map from raw to calibrated confidence (applied by a wrapper detector).
transform
¶
Map raw [0, 1] confidences to calibrated [0, 1] confidences.
TemperatureScaling
dataclass
¶
Post-hoc temperature scaling: calibrated = sigmoid(logit(confidence) / T).
A single positive scalar T fit on held-out (val) data by minimising the binary
cross-entropy of the rescaled confidences against the outcomes. T > 1 softens an
over-confident detector toward 0.5 (the fixed point of the map), T < 1 sharpens an under-confident one, and
T == 1 is the identity. The cross-entropy is convex in w = 1/T, so a few Newton steps
converge; T is clamped to t_bounds so a near-separable val sample cannot send it to 0 or
infinity. Fit on the val split only — never the test labels.
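The fit described above can be sketched with a plain Newton iteration in w = 1/T. This is an illustration, not the library's implementation: the function names, the `eps` clipping before the logit, and the exact stopping rule are assumptions.

```python
import math


def _logit(p: float, eps: float = 1e-6) -> float:
    p = min(max(p, eps), 1.0 - eps)  # assumption: clip so the logit stays finite
    return math.log(p / (1.0 - p))


def _sigmoid(x: float) -> float:
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fit_temperature(confidences, outcomes, max_iter=100, tol=1e-9,
                    t_bounds=(0.05, 20.0)) -> float:
    """Minimise binary cross-entropy of sigmoid(w * logit(c)) over w = 1/T."""
    if len(confidences) == 0:
        raise ValueError("cannot fit on an empty sample")
    z = [_logit(c) for c in confidences]
    w_lo, w_hi = 1.0 / t_bounds[1], 1.0 / t_bounds[0]
    w = 1.0  # start at the identity, T = 1
    for _ in range(max_iter):
        probs = [_sigmoid(w * zi) for zi in z]
        grad = sum((p - y) * zi for p, y, zi in zip(probs, outcomes, z))
        hess = sum(p * (1.0 - p) * zi * zi for p, zi in zip(probs, z))
        if hess <= 0.0:
            break  # flat objective (all confidences at 0.5): keep current w
        new_w = min(max(w - grad / hess, w_lo), w_hi)  # Newton step, clamped
        converged = abs(new_w - w) < tol
        w = new_w
        if converged:
            break
    return 1.0 / w


def apply_temperature(confidence: float, temperature: float) -> float:
    return _sigmoid(_logit(confidence) / temperature)


# An over-confident toy detector: always says 0.9 but is right 60% of the time.
conf = [0.9] * 10
hits = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
T = fit_temperature(conf, hits)
print(round(T, 2), round(apply_temperature(0.9, T), 3))  # 5.42 0.6
```

In this toy case the fitted T exceeds 1 and the rescaled confidence lands on the empirical hit-rate, which is the behaviour the description calls softening.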
fit
classmethod
¶
fit(
confidences: ArrayLike,
outcomes: ArrayLike,
*,
max_iter: int = 100,
tol: float = 1e-09,
t_bounds: tuple[float, float] = (0.05, 20.0),
) -> TemperatureScaling
Fit the temperature on (confidences, outcomes) (raises on an empty sample).
transform
¶
Rescale confidences through the fitted temperature, staying in [0, 1].
ConformalPredictor
dataclass
¶
Split-conformal maneuver/false-alarm prediction sets with marginal coverage >= 1 - alpha.
Calibrated on held-out (val) outcomes by the LAC rule: a detection's non-conformity score is
1 - p(true label) with p(MANEUVER) = confidence, and q is the
ceil((n + 1)(1 - alpha)) / n empirical quantile of the val scores. The prediction set for a
new confidence is {label : p(label) >= 1 - q} — a subset of {MANEUVER, FALSE_ALARM} that
contains the truth with probability at least 1 - alpha under exchangeability. Where the
quantile rank exceeds the sample, q saturates to 1 and the set always covers (it returns
both labels). Fit on the val split only.
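The LAC rule above translates into a short calibration routine. A minimal sketch, assuming string labels `"MANEUVER"` and `"FALSE_ALARM"` and hypothetical function names; the library's own types and error handling may differ.

```python
import math


def fit_conformal_q(confidences, outcomes, alpha: float = 0.1) -> float:
    """Return q, the conformal quantile of the val non-conformity scores."""
    n = len(confidences)
    if n == 0 or not 0.0 < alpha < 1.0:
        raise ValueError("need a non-empty sample and alpha in (0, 1)")
    # Score = 1 - p(true label), with p(MANEUVER) = confidence.
    scores = sorted(
        (1.0 - c) if y >= 0.5 else c for c, y in zip(confidences, outcomes)
    )
    rank = math.ceil((n + 1) * (1.0 - alpha))
    # When the rank exceeds the sample, q saturates to 1 (always cover).
    return 1.0 if rank > n else scores[rank - 1]


def predict_set(confidence: float, q: float) -> set:
    """Labels whose probability clears the 1 - q threshold."""
    probs = {"MANEUVER": confidence, "FALSE_ALARM": 1.0 - confidence}
    return {label for label, p in probs.items() if p >= 1.0 - q}


val_conf = [0.9, 0.8, 0.7, 0.2, 0.1, 0.95, 0.3, 0.85, 0.15]
val_hits = [1, 1, 1, 0, 0, 1, 0, 1, 0]
q = fit_conformal_q(val_conf, val_hits, alpha=0.2)
print(q)                     # 0.3
print(predict_set(0.9, q))   # {'MANEUVER'}
print(predict_set(0.5, q))   # set()
```

An empty set, as for the 0.5 confidence here, is a legitimate LAC outcome: neither label is plausible enough at this error level. The coverage guarantee is marginal over detections, not per detection.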
fit
classmethod
¶
fit(
confidences: ArrayLike,
outcomes: ArrayLike,
*,
alpha: float = 0.1,
) -> ConformalPredictor
Fit the conformal quantile at error level alpha (raises on empty / out-of-range).
predict_set
¶
The conformal prediction set for a single confidence (a subset of the two labels).
covers
¶
Whether the prediction set for confidence contains the true outcome's label.
CalibratedDetector
¶
CalibratedDetector(inner: Detector, calibrator: Calibrator)
Bases: Detector
Wrap a detector so its emitted confidence is passed through a fitted :class:Calibrator.
Model-agnostic: the inner detector localises and inverts as usual, then every detection's
confidence is mapped through calibrator (clamped to [0, 1]) before the canonical
frame is returned. This is how the classical reference — which carries no checkpoint to freeze a
calibrator into — gets calibrated too. The schema, dtypes, and row order are preserved.
detect
¶
Run the inner detector and return its frame with calibrated confidence.
BundledCalibration
dataclass
¶
BundledCalibration(
temperature: float,
conformal_q: float,
conformal_alpha: float,
reliability: dict[str, ReliabilityCurve],
ece: dict[str, float],
)
The fitted calibration baked into a published detector bundle — val-fit, shipped (D17).
Everything a published detector needs to emit calibrated confidence with no calibration data at inference, plus what its model card and the benchmark docs render:
- temperature: the post-hoc :class:TemperatureScaling the detector applies to its emitted confidence (a single pooled scalar fit across classes).
- conformal_q / conformal_alpha: the split-conformal predictor, for prediction-set reporting (a prediction set is not a scalar, so it rides alongside the emitted confidence).
- reliability: the per-orbit-class reliability curve of the calibrated confidence (the data a per-class reliability diagram plots), keyed by orbit-class value.
- ece: the per-orbit-class expected calibration error of the calibrated confidence, a scalar calibration-quality summary the card reports.
Everything is fit on the val split only (never the test labels). Stored in a bundle's
calibration slot and round-tripped as a plain dict, so an old bundle without one loads as
None and behaves exactly as before.
temperature_scaling
¶
temperature_scaling() -> TemperatureScaling
The fitted post-hoc calibrator the published detector applies to its confidence.
conformal_predictor
¶
conformal_predictor() -> ConformalPredictor
The fitted split-conformal predictor, for prediction-set / coverage reporting.
fit
classmethod
¶
fit(
samples: Mapping[str, CalibrationSamples],
*,
alpha: float = 0.1,
n_bins: int = 10,
) -> BundledCalibration
Fit the bundled calibration from per-orbit-class val (confidence, outcome) samples.
Pools every class's samples to fit the single temperature and the conformal predictor (the
per-detector calibrator), then measures the per-class reliability and ECE on the
calibrated confidences — the curve the published detector's emitted confidence actually
follows. Raises :class:ValueError when no class carries a matched detection to fit on.
Do no harm: the fitted temperature is kept only when it actually reduces the pooled
val ECE; otherwise it falls back to identity (T = 1, raw confidence). On a sparse or
poorly-separated val split the BCE-optimal temperature can collapse toward the clamp bound
and merely flatten the confidence toward 0.5, which does not calibrate, so a
detector that cannot be meaningfully calibrated ships its raw confidence rather than a
confidence-distorting transform.
to_dict
¶
Serialise to a plain dict for the bundle's :func:torch.save payload.
from_dict
classmethod
¶
from_dict(data: Mapping[str, Any]) -> BundledCalibration
Reconstruct from :meth:to_dict (the inverse used by the bundle loaders).
reliability_curve
¶
reliability_curve(
confidences: ArrayLike,
outcomes: ArrayLike,
*,
n_bins: int = 10,
) -> ReliabilityCurve
Bin the detections by confidence and report predicted vs. empirical precision per bin.
Splits [0, 1] into n_bins equal-width bins; each :class:ReliabilityBin carries its
detection count, mean predicted confidence, and empirical precision (the true-positive share).
A perfectly calibrated detector has mean_confidence == empirical_precision in every bin.
expected_calibration_error
¶
expected_calibration_error(
confidences: ArrayLike,
outcomes: ArrayLike,
*,
n_bins: int = 10,
) -> float
The count-weighted mean gap between predicted confidence and empirical precision (ECE).
0.0 for a perfectly calibrated detector; larger means the stated confidence drifts further
from the realised hit-rate. An empty sample scores 0.0.
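The binning and the ECE sum can be written out directly. A minimal sketch with hypothetical names and plain tuples in place of `ReliabilityBin`; placing a confidence of exactly 1.0 in the last bin is an assumption about edge handling.

```python
def reliability_bins(confidences, outcomes, n_bins: int = 10):
    """Return (lo, hi, count, mean_confidence, empirical_precision) per bin."""
    acc = [[0, 0.0, 0.0] for _ in range(n_bins)]  # count, sum conf, sum outcome
    for c, y in zip(confidences, outcomes):
        i = min(int(c * n_bins), n_bins - 1)  # assumption: 1.0 joins the last bin
        acc[i][0] += 1
        acc[i][1] += c
        acc[i][2] += y
    return [
        (i / n_bins, (i + 1) / n_bins, n,
         (sc / n) if n else None, (sy / n) if n else None)
        for i, (n, sc, sy) in enumerate(acc)
    ]


def ece(confidences, outcomes, n_bins: int = 10) -> float:
    """Count-weighted mean |mean confidence - empirical precision| over bins."""
    total = len(confidences)
    if total == 0:
        return 0.0
    return sum(
        (n / total) * abs(mean_conf - precision)
        for _, _, n, mean_conf, precision in reliability_bins(
            confidences, outcomes, n_bins)
        if n
    )


# Ten detections at 0.9 confidence, six of them true positives.
conf = [0.9] * 10
hits = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(round(ece(conf, hits), 3))  # 0.3
```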
brier_score
¶
Mean squared error between confidence and outcome, a strictly proper scoring rule.
Lower is better; an empty sample scores 0.0.
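As a quick illustration of the definition, a hypothetical standalone version (not the library function) is:

```python
def brier(confidences, outcomes) -> float:
    """Mean of (confidence - outcome)^2; 0.0 on an empty sample."""
    n = len(confidences)
    if n == 0:
        return 0.0
    return sum((c - y) ** 2 for c, y in zip(confidences, outcomes)) / n


print(brier([1.0, 0.0], [1, 0]))  # 0.0
print(brier([0.5, 0.5], [1, 0]))  # 0.25
```

A detector that always answers 0.5 scores 0.25 regardless of the outcomes, which makes that value a useful baseline when reading reported scores.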
apply_calibration
¶
apply_calibration(
frame: DataFrame, calibrator: Calibrator
) -> DataFrame
Return frame with its confidence column mapped through calibrator (clamped).
The single place a fitted calibrator is applied to a detector's canonical maneuver frame: an
empty frame passes through untouched, otherwise the confidence column is remapped (clamped
to [0, 1]) and every other column — schema, dtypes, row order — is preserved. Shared by
:class:CalibratedDetector and the published detectors that carry a baked-in calibrator, so
inference applies calibration identically however the calibrator was supplied.
format_reliability_curve
¶
format_reliability_curve(curve: ReliabilityCurve) -> str
Render a reliability curve as a committed-data-free text diagram (markdown table).
The textual form of the per-class reliability diagram the model cards and benchmark docs
publish: one row per populated confidence bin with its detection count, mean predicted
confidence, and empirical precision — the predicted vs. empirical columns a diagram
plots against the diagonal. Deterministic and dependency-free (no plotting backend), so it
renders the same from a bundle's :class:BundledCalibration on any platform; an empty /
unpopulated curve renders a single note (a sparse orbit class with no val detections).