Live Observe Events Schema v1
TopoExec live observe events are a local development and CI evidence stream for
`topoexec graph observe`. They are intentionally separate from metrics schema v1,
trace schema v1, and graph schema v1.

```text
observe_schema_version = "1"
```

Live observe output is allowed to reference metric and trace concepts, but it must not change existing metric, trace, or graph contracts.
Design boundaries
- Runtime hot paths emit fixed-size numeric records only when live observe is enabled.
- Runtime hot paths must not serialize JSON, write files or sockets, wait for a UI, allocate event strings, look up shared symbol maps, or copy payload bodies.
- Collectors and tooling own JSON, NDJSON, record artifacts, assertions, and UI frames.
- Observer drops and collector failures are observable diagnostics; they do not change runtime scheduling, channel delivery, trigger readiness, CompositeLoop convergence, or runtime ok/fail semantics.
- Assertion failures may make `graph observe` exit non-zero, but assertion evaluation remains outside scheduler/channel/trigger internals.
Observe levels

| Level | Contract |
|---|---|
| `off` | No live observe records are emitted and hot-path timestamp work is skipped. This is the default for `graph run`, `graph metrics`, and `graph trace`. |
| `summary` | Default `graph observe` level. Low-frequency and exceptional events are exact; high-frequency normal events are aggregated, coalesced, or sampled. |
| `detailed` | Component begin/end, exceptional channel/async/loop events, and selected runtime lifecycle events are emitted as exact records where practical. High-frequency publish/trigger streams may still be sampled or filtered. |
| `debug` | Explicit intrusive mode for selected components/channels/events. Bounded payload metadata preview may be enabled; payload bodies are still not streamed by default. |
Exactness values
Every event, batch, or summary must declare one of these values:

| Value | Meaning |
|---|---|
| `exact` | The event represents one runtime fact, and no known events of that kind were dropped for the stream/window. |
| `aggregated` | The event summarizes multiple runtime facts over a bounded window. |
| `sampled` | The event is a sample of a larger stream, selected by configured sampling. |
| `lossy` | The stream/window has known observer drops or coalescing that prevents full reconstruction. |
| `partial` | The event is intentionally incomplete, for example because the observer was attached after run start or a collector input ended early. |
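The schema requires one declared exactness value per event but does not say how a collector picks it when several conditions hold. A minimal sketch, assuming the precedence partial > lossy > sampled > aggregated > exact (an assumption, not part of the schema); `WindowFacts` and `DeclareExactness` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

enum class Exactness { kExact, kAggregated, kSampled, kLossy, kPartial };

// Wire spelling of each exactness value.
constexpr std::string_view ToString(Exactness e) {
  switch (e) {
    case Exactness::kExact: return "exact";
    case Exactness::kAggregated: return "aggregated";
    case Exactness::kSampled: return "sampled";
    case Exactness::kLossy: return "lossy";
    case Exactness::kPartial: return "partial";
  }
  return "partial";  // unreachable for valid enum values
}

// Hypothetical facts a collector might track for one stream/window.
struct WindowFacts {
  bool attached_after_start = false;   // observer attached mid-run
  bool input_ended_early = false;      // collector input truncated
  std::uint64_t observer_drops = 0;    // known dropped records
  bool sampled = false;                // produced by configured sampling
  std::uint64_t facts_summarized = 1;  // runtime facts folded into the event
};

// Assumed precedence: the most degrading condition wins.
constexpr Exactness DeclareExactness(const WindowFacts& w) {
  if (w.attached_after_start || w.input_ended_early) return Exactness::kPartial;
  if (w.observer_drops > 0) return Exactness::kLossy;
  if (w.sampled) return Exactness::kSampled;
  if (w.facts_summarized > 1) return Exactness::kAggregated;
  return Exactness::kExact;
}
```

Under this ordering a window with both drops and aggregation reports `lossy`, so dashboards never present a lossy window as a clean aggregate.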
Dashboards and replay tools must display exactness and observer drop state.
Internal numeric event record
The runtime-internal record is fixed-size and uses numeric IDs. Public embedder APIs should treat this as a preview/internal surface until the live observe contract is promoted.
```cpp
#include <cstdint>

namespace topoexec::runtime_observe {

enum class LiveEventKind : std::uint16_t {
  kRunStarted,
  kRunFinished,
  kSchedulerEpochBegin,
  kSchedulerEpochEnd,
  kComponentBegin,
  kComponentEnd,
  kComponentError,
  kChannelPublishSummary,
  kChannelCommitSummary,
  kChannelDrop,
  kChannelReject,
  kChannelOverwrite,
  kTriggerReadySummary,
  kTriggerSuppressedSummary,
  kAsyncAdmission,
  kAsyncReject,
  kAsyncDrop,
  kLoopIterationBegin,
  kLoopIterationEnd,
  kLoopConverged,
  kLoopBudgetOverrun,
  kLoopMaxIterationsHit,
  kLoopError,
  kHealthEvent,
  kRuntimeError,
  kObserverDropSummary,
};

// Fixed-size record with numeric IDs only; no owning or heap-backed members.
struct alignas(64) LiveEvent {
  std::uint64_t local_seq;  // stream-local sequence number
  std::uint64_t mono_ns;    // monotonic timestamp in nanoseconds
  std::uint32_t stream_id;
  std::uint32_t epoch_id;
  std::uint32_t kind;       // holds a LiveEventKind value
  std::uint32_t flags;
  std::uint32_t lane_id;    // IDs are resolved by collectors via the symbol table
  std::uint32_t worker_id;
  std::uint32_t component_id;
  std::uint32_t channel_id;
  std::uint32_t loop_id;
  std::uint32_t policy_id;
  std::uint32_t reason_id;
  std::uint32_t reserved0;
  std::uint64_t value0;     // kind-specific numeric values
  std::uint64_t value1;
  std::uint64_t value2;
  std::uint64_t value3;
};

}  // namespace topoexec::runtime_observe
```
Constraints for this record:
- no `std::string`, `std::map`, owning `std::vector`, payload owner, or heap-owned event body;
- no JSON, logging, socket, file, or UI call from the producer path;
- no blocking queue push or contended mutex wait in the producer path;
- stream-local `local_seq` is preferred over a global event sequence.
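These constraints can be checked at compile time. A minimal sketch that re-declares the record (so the block compiles on its own) and asserts it is trivially copyable and standard-layout; `CopyEvent` is a hypothetical helper. As declared, the fields occupy 96 bytes and `alignas(64)` pads the record to 128 bytes, two cache lines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Local copy of the record above so this block is self-contained.
struct alignas(64) LiveEvent {
  std::uint64_t local_seq;
  std::uint64_t mono_ns;
  std::uint32_t stream_id;
  std::uint32_t epoch_id;
  std::uint32_t kind;
  std::uint32_t flags;
  std::uint32_t lane_id;
  std::uint32_t worker_id;
  std::uint32_t component_id;
  std::uint32_t channel_id;
  std::uint32_t loop_id;
  std::uint32_t policy_id;
  std::uint32_t reason_id;
  std::uint32_t reserved0;
  std::uint64_t value0;
  std::uint64_t value1;
  std::uint64_t value2;
  std::uint64_t value3;
};

// No owning members: copying is a plain byte copy.
static_assert(std::is_trivially_copyable_v<LiveEvent>);
static_assert(std::is_standard_layout_v<LiveEvent>);
// 96 bytes of fields, padded to 128 by alignas(64).
static_assert(alignof(LiveEvent) == 64);
static_assert(sizeof(LiveEvent) == 128);
static_assert(offsetof(LiveEvent, value0) == 64);

// Hypothetical helper: moving a record into a ring slot is a memcpy.
inline void CopyEvent(const LiveEvent& src, LiveEvent* dst) {
  std::memcpy(dst, &src, sizeof(LiveEvent));
}
```

If a future field made the record non-trivial (for example a `std::string`), the first `static_assert` would fail the build instead of silently allowing heap work on the producer path.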
Symbol table event
Collectors translate numeric IDs through a symbol table emitted once per run.

```json
{
  "observe_schema_version": "1",
  "kind": "symbol_table",
  "run_id": "run-20260507-001",
  "components": {"17": "source"},
  "channels": {"42": "source_transform"},
  "lanes": {"0": "main"},
  "loops": {"3": "solver_loop"},
  "policies": {"8": "rate_limit"},
  "reasons": {"5": "min_interval_ms"}
}
```
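On the collector side, the symbol table reduces to one map per ID namespace, consulted only off the hot path. A minimal sketch with hypothetical names (`SymbolTable`, `Resolve`); mapping unknown IDs to an empty string, to match the envelope's empty-string convention for absent identity fields, is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical collector-side symbol table; one map per ID namespace.
struct SymbolTable {
  std::unordered_map<std::uint32_t, std::string> components;
  std::unordered_map<std::uint32_t, std::string> channels;
  std::unordered_map<std::uint32_t, std::string> lanes;

  // Resolves a numeric ID to its name; unknown IDs become "" (assumed policy).
  static std::string Resolve(
      const std::unordered_map<std::uint32_t, std::string>& table,
      std::uint32_t id) {
    auto it = table.find(id);
    return it == table.end() ? std::string() : it->second;
  }
};
```

Because the table is emitted once per run, a collector can build these maps once and keep producers free of any string lookup.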
NDJSON envelope
Each NDJSON line is one object. The collector assigns `display_seq`; producers
own only `stream_id` and `local_seq`.

```json
{
  "observe_schema_version": "1",
  "run_id": "run-20260507-001",
  "display_seq": 1024,
  "stream_id": "lane:main",
  "local_seq": 88,
  "kind": "component_end",
  "severity": "info",
  "exactness": "exact",
  "mono_ns": 123456789,
  "epoch_id": "7",
  "lane": "main",
  "worker_id": "",
  "component_id": "transform",
  "channel_id": "",
  "loop_id": "",
  "trace_id": "trace-...",
  "transaction_id": "source_transform#7",
  "correlation_id": "source_transform#7",
  "causation_id": "source_transform#7",
  "attributes": {
    "duration_ns": 9200,
    "status": "ok"
  }
}
```
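The split between producer-owned `local_seq` and collector-owned `display_seq` can be sketched as a small sequencer. This assumes the collector assigns `display_seq` in arrival order and rejects a record whose `local_seq` does not increase within its stream; both are assumptions, and `DisplaySequencer` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical collector: producers supply (stream_id, local_seq);
// the collector assigns one run-wide display_seq in arrival order.
class DisplaySequencer {
 public:
  // Returns the assigned display_seq, or std::nullopt when local_seq did not
  // increase for this stream (duplicate or reordered record).
  std::optional<std::uint64_t> Accept(const std::string& stream_id,
                                      std::uint64_t local_seq) {
    auto [it, inserted] = last_local_seq_.try_emplace(stream_id, local_seq);
    if (!inserted) {
      if (local_seq <= it->second) return std::nullopt;
      it->second = local_seq;
    }
    return next_display_seq_++;
  }

 private:
  std::unordered_map<std::string, std::uint64_t> last_local_seq_;
  std::uint64_t next_display_seq_ = 1;
};
```

Ordering by `local_seq` is only meaningful within one stream; `display_seq` gives line-oriented tools a stable total order without requiring producers to share a global counter.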
Required top-level fields for normal events are observe_schema_version,
run_id, display_seq, stream_id, local_seq, kind, severity,
exactness, and mono_ns. Optional identity fields should use empty strings
when absent so line-oriented tools can rely on stable keys.
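A collector can enforce these rules before writing a line. A minimal sketch, assuming each line's top-level fields are available as a flat string map (a hypothetical `Envelope` type, not a real JSON API) and assuming the optional identity fields are those shown in the example envelope.

```cpp
#include <array>
#include <cassert>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical flattened view of one NDJSON line's top-level fields.
using Envelope = std::map<std::string, std::string>;

constexpr std::array<std::string_view, 9> kRequired = {
    "observe_schema_version", "run_id", "display_seq", "stream_id",
    "local_seq", "kind", "severity", "exactness", "mono_ns"};

// Assumed identity fields, taken from the example envelope.
constexpr std::array<std::string_view, 10> kOptionalIdentity = {
    "epoch_id", "lane", "worker_id", "component_id", "channel_id",
    "loop_id", "trace_id", "transaction_id", "correlation_id", "causation_id"};

// Returns the names of missing required fields (empty means valid).
std::vector<std::string> MissingRequired(const Envelope& e) {
  std::vector<std::string> missing;
  for (std::string_view key : kRequired) {
    if (e.find(std::string(key)) == e.end()) missing.emplace_back(key);
  }
  return missing;
}

// Ensures every optional identity key exists, using "" when absent.
void FillOptionalIdentity(Envelope& e) {
  for (std::string_view key : kOptionalIdentity) {
    e.try_emplace(std::string(key));  // no-op if the key is already present
  }
}
```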
Event class exactness by level

| Event class | Summary | Detailed | Debug |
|---|---|---|---|
| `run_started`, `run_finished` | exact | exact | exact |
| `runtime_error`, `component_error` | exact | exact | exact |
| `component_begin`, `component_end` | aggregated latency plus latest | exact | exact |
| `scheduler_epoch_begin`, `scheduler_epoch_end` | aggregated | exact or partial | exact or partial |
| `channel_publish`, `channel_commit` | aggregated | sampled or filtered | exact for selected channels |
| `channel_drop`, `channel_reject`, `channel_overwrite` | exact | exact | exact |
| `trigger_ready`, `trigger_suppressed` | aggregated | sampled or filtered | exact for selected components |
| `async_admission` | aggregated | exact | exact |
| `async_reject`, `async_drop` | exact | exact | exact |
| `loop_iteration` | aggregated plus latest residual | exact | exact |
| `loop_converged`, `loop_error`, `loop_budget_overrun`, `loop_max_iterations_hit` | exact | exact | exact |
| `health_event` | exact within bounded retention | exact | exact |
| `observer_drop_summary` | exact | exact | exact |
| `assertion_fail` | exact | exact | exact |
| `assertion_pass`, `assertion_pending` | aggregated | exact | exact |
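A few rows of this table, encoded as a lookup a collector could use when stamping `exactness`. This is a partial sketch: `ExactnessFor` is a hypothetical name, "sampled or filtered" is collapsed to `sampled`, and unselected channels at `debug` are assumed to behave as at `detailed`. Classes not covered return an empty string.

```cpp
#include <cassert>
#include <string_view>

enum class ObserveLevel { kSummary, kDetailed, kDebug };

// `selected` models whether a debug-mode filter selects the channel.
constexpr std::string_view ExactnessFor(ObserveLevel level,
                                        std::string_view event_class,
                                        bool selected) {
  // Exceptional events are exact at every level.
  if (event_class == "channel_drop" || event_class == "runtime_error" ||
      event_class == "async_reject" || event_class == "assertion_fail") {
    return "exact";
  }
  if (event_class == "component_end") {
    return level == ObserveLevel::kSummary ? "aggregated" : "exact";
  }
  if (event_class == "channel_publish") {
    switch (level) {
      case ObserveLevel::kSummary: return "aggregated";
      case ObserveLevel::kDetailed: return "sampled";
      case ObserveLevel::kDebug: return selected ? "exact" : "sampled";
    }
  }
  return {};  // event class not covered by this sketch
}
```

Keeping the mapping in the collector, rather than in producers, matches the rule that producers emit numeric records and tooling owns the JSON fields.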
UI frame event
SSE dashboard streams should batch collector output instead of pushing every hot event as a separate browser message.

```json
{
  "observe_schema_version": "1",
  "kind": "ui_frame",
  "run_id": "run-20260507-001",
  "frame_seq": 120,
  "window_ms": 50,
  "event_count": 1400,
  "dropped_event_count_delta": 0,
  "exactness": "aggregated",
  "events": [
    {"kind": "component_error", "component_id": "filter", "severity": "error"}
  ],
  "summaries": {
    "components": {
      "filter": {
        "execution_count_delta": 10,
        "last_duration_ns": 9000,
        "max_duration_ns": 12000
      }
    },
    "channels": {
      "source_filter": {
        "publish_count_delta": 1000,
        "delivery_count_delta": 998,
        "drop_count_delta": 0,
        "max_depth": 4
      }
    }
  }
}
```
Default UI frame cadence is approximately 50 ms. Raw browser-side event history must be bounded by default.
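The component summaries in a frame are per-window deltas. A minimal sketch of how a collector might fold `component_end` observations into one window and reset on each flush; `UiFrameBuilder` and `ComponentWindow` are hypothetical names, and the flush timer (about every 50 ms) is assumed to live outside this class. Exact events such as `component_error` would be passed through in `events` and are not modeled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-component accumulator for one frame window.
struct ComponentWindow {
  std::uint64_t execution_count_delta = 0;
  std::uint64_t last_duration_ns = 0;
  std::uint64_t max_duration_ns = 0;
};

class UiFrameBuilder {
 public:
  // Folds one component_end observation into the current window.
  void OnComponentEnd(const std::string& component, std::uint64_t duration_ns) {
    ComponentWindow& w = components_[component];
    ++w.execution_count_delta;
    w.last_duration_ns = duration_ns;
    w.max_duration_ns = std::max(w.max_duration_ns, duration_ns);
    ++event_count_;
  }

  // Returns the window's summaries and resets, so frames carry deltas.
  std::map<std::string, ComponentWindow> Flush(std::uint64_t* event_count) {
    *event_count = event_count_;
    event_count_ = 0;
    std::map<std::string, ComponentWindow> out;
    out.swap(components_);
    ++frame_seq_;
    return out;
  }

  std::uint64_t frame_seq() const { return frame_seq_; }

 private:
  std::map<std::string, ComponentWindow> components_;
  std::uint64_t event_count_ = 0;
  std::uint64_t frame_seq_ = 0;
};
```

Because each flush clears the window, the browser receives a bounded message per frame regardless of how many hot events occurred.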
Observer drop summary
Overflow is reported without blocking producers.

```json
{
  "observe_schema_version": "1",
  "kind": "observer_drop_summary",
  "run_id": "run-20260507-001",
  "stream_id": "lane:main",
  "local_seq": 129,
  "display_seq": 2048,
  "severity": "warning",
  "exactness": "lossy",
  "dropped_event_count": 1842,
  "window_start_mono_ns": 120000000,
  "window_end_mono_ns": 170000000,
  "affected_kind": "channel_publish"
}
```
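One way to report overflow without blocking is a single-producer/single-consumer ring that drops and counts when full. A minimal sketch, assuming one ring per producer stream; `Record` stands in for the full `LiveEvent`, and `DropOnFullRing`, `TryPush`, `TryPop`, and `TakeDropped` are hypothetical names. The sketch counts drops across all kinds; producing `affected_kind` would need per-kind counters.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Minimal stand-in for the fixed-size LiveEvent record.
struct Record {
  std::uint64_t local_seq;
  std::uint32_t kind;
};

// SPSC ring: the producer never blocks; a full ring increments a counter.
template <std::size_t N>
class DropOnFullRing {
  static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

 public:
  bool TryPush(const Record& r) {  // producer thread only
    const std::uint64_t head = head_.load(std::memory_order_relaxed);
    const std::uint64_t tail = tail_.load(std::memory_order_acquire);
    if (head - tail == N) {
      dropped_.fetch_add(1, std::memory_order_relaxed);
      return false;
    }
    slots_[head & (N - 1)] = r;
    head_.store(head + 1, std::memory_order_release);  // publish the slot
    return true;
  }

  std::optional<Record> TryPop() {  // collector thread only
    const std::uint64_t tail = tail_.load(std::memory_order_relaxed);
    const std::uint64_t head = head_.load(std::memory_order_acquire);
    if (tail == head) return std::nullopt;
    Record r = slots_[tail & (N - 1)];
    tail_.store(tail + 1, std::memory_order_release);  // free the slot
    return r;
  }

  // Collector reads and resets the count to build observer_drop_summary.
  std::uint64_t TakeDropped() {
    return dropped_.exchange(0, std::memory_order_relaxed);
  }

 private:
  std::array<Record, N> slots_{};
  std::atomic<std::uint64_t> head_{0};
  std::atomic<std::uint64_t> tail_{0};
  std::atomic<std::uint64_t> dropped_{0};
};
```

The drop path is a single relaxed increment, so overflow costs the producer less than a successful push and never waits on the collector.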
Assertion events
Live assertions use their own schema version and are emitted by collector/tooling only.

```json
{"observe_schema_version":"1","kind":"assertion_registered","assertion_schema_version":"1","assertion_id":"no_channel_drops","exactness":"exact"}
{"observe_schema_version":"1","kind":"assertion_pass","assertion_schema_version":"1","assertion_id":"no_runtime_errors","exactness":"exact"}
{"observe_schema_version":"1","kind":"assertion_fail","assertion_schema_version":"1","assertion_id":"sink_receives_by_epoch_3","reason":"eventually condition not satisfied","exactness":"exact"}
{"observe_schema_version":"1","kind":"assertion_pending","assertion_schema_version":"1","assertion_id":"loop_converges","exactness":"exact"}
```
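A collector-side evaluator for a `no_channel_drops`-style assertion only needs the event `kind` stream. A minimal sketch under stated assumptions: the assertion stays pending until it sees the forbidden kind (fail) or `run_finished` (pass), and the result latches; `NoEventOfKind` and `ExitCodeFor` are hypothetical names, and returning `1` is an assumed choice of non-zero exit code.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

enum class AssertionState { kPending, kPass, kFail };

// Hypothetical evaluator: any event of the forbidden kind fails the
// assertion; reaching run_finished first passes it.
class NoEventOfKind {
 public:
  explicit NoEventOfKind(std::string forbidden_kind)
      : forbidden_kind_(std::move(forbidden_kind)) {}

  AssertionState Observe(std::string_view kind) {
    if (state_ != AssertionState::kPending) return state_;  // result latches
    if (kind == forbidden_kind_) {
      state_ = AssertionState::kFail;
    } else if (kind == "run_finished") {
      state_ = AssertionState::kPass;
    }
    return state_;
  }

  AssertionState state() const { return state_; }

 private:
  std::string forbidden_kind_;
  AssertionState state_ = AssertionState::kPending;
};

// Any failed assertion makes the command exit non-zero.
inline int ExitCodeFor(const std::vector<AssertionState>& states) {
  for (AssertionState s : states) {
    if (s == AssertionState::kFail) return 1;
  }
  return 0;
}
```

The evaluator only consumes collector output, so it stays outside scheduler, channel, and trigger internals as the design boundaries require.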
Payload preview metadata
Default observe output may include payload metadata only:
- `payload_type_id`
- `payload_size_bytes`
- `payload_sequence`
- `payload_timestamp`
- optional bounded hash

Payload body preview requires the `debug` level, explicit component/channel filters, and a
hard byte limit. Preview must not extend payload ownership or require deep copies.
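The metadata fields and a bounded hash can be computed from a borrowed view without copying or retaining the payload. A minimal sketch: the schema does not name a hash algorithm, so 64-bit FNV-1a is used here purely as an example, and `PayloadPreviewMeta`, `BoundedFnv1a`, and `MakePreviewMeta` are hypothetical names with assumed field types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Metadata-only preview; no payload bytes are retained.
struct PayloadPreviewMeta {
  std::uint64_t payload_type_id;
  std::uint64_t payload_size_bytes;
  std::uint64_t payload_sequence;
  std::uint64_t payload_timestamp;
  std::uint64_t bounded_hash;  // FNV-1a over at most `limit` bytes
};

// 64-bit FNV-1a over the first `limit` bytes of a borrowed view.
// Bytes are read in place: nothing is copied and ownership is not extended.
constexpr std::uint64_t BoundedFnv1a(std::string_view bytes, std::size_t limit) {
  std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
  const std::size_t n = std::min(bytes.size(), limit);
  for (std::size_t i = 0; i < n; ++i) {
    h ^= static_cast<unsigned char>(bytes[i]);
    h *= 1099511628211ull;  // FNV prime
  }
  return h;
}

inline PayloadPreviewMeta MakePreviewMeta(std::uint64_t type_id,
                                          std::string_view body,
                                          std::uint64_t sequence,
                                          std::uint64_t timestamp,
                                          std::size_t hash_limit) {
  return PayloadPreviewMeta{type_id, body.size(), sequence, timestamp,
                            BoundedFnv1a(body, hash_limit)};
}
```

The hard limit bounds hashing cost per payload, so a large message costs no more to summarize than `hash_limit` bytes.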