
Live Observe Events Schema v1

TopoExec live observe events are a local development and CI evidence stream for topoexec graph observe. They are intentionally separate from metrics schema v1, trace schema v1, and graph schema v1.

observe_schema_version = "1"

Live observe output may reference metric and trace concepts, but it must not change the existing metric, trace, or graph contracts.

Design boundaries

  • Runtime hot paths emit fixed-size numeric records only when live observe is enabled.
  • Runtime hot paths must not serialize JSON, write files or sockets, wait for a UI, allocate event strings, look up shared symbol maps, or copy payload bodies.
  • Collectors and tooling own JSON, NDJSON, record artifacts, assertions, and UI frames.
  • Observer drops and collector failures are observable diagnostics; they do not change runtime scheduling, channel delivery, trigger readiness, CompositeLoop convergence, or runtime ok/fail semantics.
  • Assertion failures may make graph observe exit non-zero, but assertion evaluation remains outside scheduler/channel/trigger internals.
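The producer-side boundaries above can be illustrated with a bounded single-producer/single-consumer ring. This is only a sketch under stated assumptions: `SketchRecord`, `SketchRing`, `try_push`, `try_pop`, and `take_dropped` are hypothetical names, not TopoExec APIs, and the real runtime may use a different queue design. It shows a push that never blocks, allocates, or serializes, and that counts overflow instead of waiting.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size record; stands in for the runtime's numeric event.
struct SketchRecord {
  std::uint64_t local_seq;
  std::uint64_t mono_ns;
  std::uint32_t kind;
  std::uint32_t component_id;
};

// Single-producer/single-consumer ring (an assumption for this sketch).
// try_push never blocks or allocates; on overflow it only bumps a counter
// that a collector can read later and report as an observer drop.
template <std::size_t Capacity>
class SketchRing {
  static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                "Capacity must be a power of two");

 public:
  bool try_push(const SketchRecord& rec) noexcept {
    const std::uint64_t head = head_.load(std::memory_order_relaxed);
    const std::uint64_t tail = tail_.load(std::memory_order_acquire);
    if (head - tail == Capacity) {  // full: drop instead of waiting
      dropped_.fetch_add(1, std::memory_order_relaxed);
      return false;
    }
    slots_[head & (Capacity - 1)] = rec;
    head_.store(head + 1, std::memory_order_release);
    return true;
  }

  bool try_pop(SketchRecord& out) noexcept {
    const std::uint64_t tail = tail_.load(std::memory_order_relaxed);
    const std::uint64_t head = head_.load(std::memory_order_acquire);
    if (tail == head) return false;  // empty
    out = slots_[tail & (Capacity - 1)];
    tail_.store(tail + 1, std::memory_order_release);
    return true;
  }

  // Returns the drops counted since the last call and resets the counter.
  std::uint64_t take_dropped() noexcept {
    return dropped_.exchange(0, std::memory_order_relaxed);
  }

 private:
  std::array<SketchRecord, Capacity> slots_{};
  std::atomic<std::uint64_t> head_{0};
  std::atomic<std::uint64_t> tail_{0};
  std::atomic<std::uint64_t> dropped_{0};
};
```

In this sketch a failed push has no effect beyond the drop counter, which mirrors the rule that observer drops are diagnostics and do not alter runtime scheduling or ok/fail semantics.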

Observe levels

Level | Contract
off | No live observe records are emitted and hot-path timestamp work is skipped. This is the default for graph run, graph metrics, and graph trace.
summary | Default graph observe level. Low-frequency and exceptional events are exact; high-frequency normal events are aggregated, coalesced, or sampled.
detailed | Component begin/end, exceptional channel/async/loop events, and selected runtime lifecycle events are emitted as exact records where practical. High-frequency publish/trigger streams may still be sampled or filtered.
debug | Explicit intrusive mode for selected components/channels/events. Bounded payload metadata preview may be enabled; payload bodies are still not streamed by default.

Exactness values

Every event, batch, or summary must declare one of these values:

Value | Meaning
exact | The event represents one runtime fact and no known events of that kind were dropped for the stream/window.
aggregated | The event summarizes multiple runtime facts over a bounded window.
sampled | The event is a sample of a larger stream selected by configured sampling.
lossy | The stream/window has known observer drops or coalescing that prevents full reconstruction.
partial | The event is intentionally incomplete, for example because the observer was attached after run start or a collector input ended early.

Dashboards and replay tools must display exactness and observer drop state.
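As an illustration of how a collector might pick one of these values for a window, here is a minimal sketch. The `Exactness` enum, the `WindowFacts` struct, and especially the precedence order in `classify` (drops first, then incompleteness, then sampling, then aggregation) are assumptions made for this example; the schema itself only defines the five values and their meanings.

```cpp
#include <cstdint>
#include <string_view>

// The five exactness values from the table above.
enum class Exactness { kExact, kAggregated, kSampled, kLossy, kPartial };

// Maps an exactness value to the string used in the JSON output.
constexpr std::string_view to_string(Exactness e) {
  switch (e) {
    case Exactness::kExact: return "exact";
    case Exactness::kAggregated: return "aggregated";
    case Exactness::kSampled: return "sampled";
    case Exactness::kLossy: return "lossy";
    case Exactness::kPartial: return "partial";
  }
  return "partial";  // unreachable for valid enum values
}

// Hypothetical facts a collector knows about one stream/window.
struct WindowFacts {
  std::uint64_t fact_count;     // runtime facts represented by the output
  std::uint64_t dropped_count;  // known observer drops in the window
  bool sampled;                 // output was selected by configured sampling
  bool incomplete;              // e.g. late attach or early end of input
};

// Assumed precedence: known loss outranks every other label, so a consumer
// never mistakes a lossy window for a complete one.
constexpr Exactness classify(const WindowFacts& w) {
  if (w.dropped_count > 0) return Exactness::kLossy;
  if (w.incomplete) return Exactness::kPartial;
  if (w.sampled) return Exactness::kSampled;
  if (w.fact_count > 1) return Exactness::kAggregated;
  return Exactness::kExact;
}
```

A different precedence could be equally valid; the point is only that every emitted event, batch, or summary carries exactly one declared value.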

Internal numeric event record

The runtime-internal record is fixed-size and uses numeric IDs. Embedders should treat this record as a preview/internal surface until the live observe contract is promoted.

#include <cstdint>

namespace topoexec::runtime_observe {

enum class LiveEventKind : std::uint16_t {
  kRunStarted,
  kRunFinished,
  kSchedulerEpochBegin,
  kSchedulerEpochEnd,
  kComponentBegin,
  kComponentEnd,
  kComponentError,
  kChannelPublishSummary,
  kChannelCommitSummary,
  kChannelDrop,
  kChannelReject,
  kChannelOverwrite,
  kTriggerReadySummary,
  kTriggerSuppressedSummary,
  kAsyncAdmission,
  kAsyncReject,
  kAsyncDrop,
  kLoopIterationBegin,
  kLoopIterationEnd,
  kLoopConverged,
  kLoopBudgetOverrun,
  kLoopMaxIterationsHit,
  kLoopError,
  kHealthEvent,
  kRuntimeError,
  kObserverDropSummary,
};

// The fields below total 96 bytes; alignas(64) pads sizeof(LiveEvent) to 128.
struct alignas(64) LiveEvent {
  std::uint64_t local_seq;
  std::uint64_t mono_ns;
  std::uint32_t stream_id;
  std::uint32_t epoch_id;
  std::uint32_t kind;
  std::uint32_t flags;
  std::uint32_t lane_id;
  std::uint32_t worker_id;
  std::uint32_t component_id;
  std::uint32_t channel_id;
  std::uint32_t loop_id;
  std::uint32_t policy_id;
  std::uint32_t reason_id;
  std::uint32_t reserved0;
  std::uint64_t value0;
  std::uint64_t value1;
  std::uint64_t value2;
  std::uint64_t value3;
};

}  // namespace topoexec::runtime_observe

Constraints for this record:

  • no std::string, std::map, owning std::vector, payload owner, or heap-owned event body;
  • no JSON, logging, socket, file, or UI call from the producer path;
  • no blocking queue push or contended mutex wait in the producer path;
  • stream-local local_seq is preferred over a global event sequence.
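Several of these constraints can be checked at compile time. The sketch below repeats the record layout inside a hypothetical `sketch` namespace so that it compiles on its own; the `static_assert` set is a suggestion, not part of the schema.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

namespace sketch {

// Copy of the record layout shown above, repeated so the checks are
// self-contained.
struct alignas(64) LiveEvent {
  std::uint64_t local_seq;
  std::uint64_t mono_ns;
  std::uint32_t stream_id;
  std::uint32_t epoch_id;
  std::uint32_t kind;
  std::uint32_t flags;
  std::uint32_t lane_id;
  std::uint32_t worker_id;
  std::uint32_t component_id;
  std::uint32_t channel_id;
  std::uint32_t loop_id;
  std::uint32_t policy_id;
  std::uint32_t reason_id;
  std::uint32_t reserved0;
  std::uint64_t value0;
  std::uint64_t value1;
  std::uint64_t value2;
  std::uint64_t value3;
};

// No owning members: the record can be copied with memcpy.
static_assert(std::is_trivially_copyable_v<LiveEvent>);
static_assert(std::is_standard_layout_v<LiveEvent>);

// 96 bytes of fields, padded to two 64-byte units by alignas(64).
static_assert(alignof(LiveEvent) == 64);
static_assert(sizeof(LiveEvent) == 128);

// The IDs fill the first 64 bytes; the value slots start the second.
static_assert(offsetof(LiveEvent, value0) == 64);

}  // namespace sketch
```

Adding a `std::string` or `std::vector` member would fail the trivially-copyable check, which makes the "no heap-owned event body" rule hard to break by accident.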

Symbol table event

Collectors translate numeric IDs through a symbol table emitted once per run.

{
  "observe_schema_version": "1",
  "kind": "symbol_table",
  "run_id": "run-20260507-001",
  "components": {"17": "source"},
  "channels": {"42": "source_transform"},
  "lanes": {"0": "main"},
  "loops": {"3": "solver_loop"},
  "policies": {"8": "rate_limit"},
  "reasons": {"5": "min_interval_ms"}
}
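A collector-side lookup over such a table might look like the following sketch. The `SymbolTable` struct, the `resolve` helper, and the `#<id>` fallback for unknown IDs are all assumptions for illustration; the schema does not specify how a collector should render an ID that is missing from the table.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

using NameMap = std::unordered_map<std::uint32_t, std::string>;

// Hypothetical in-memory form of the symbol_table event.
struct SymbolTable {
  NameMap components;
  NameMap channels;
  NameMap lanes;
  NameMap loops;
  NameMap policies;
  NameMap reasons;
};

// Translates a numeric ID to its name. Unknown IDs are rendered as "#<id>"
// (an assumed convention) so that a missing symbol stays visible downstream.
inline std::string resolve(const NameMap& names, std::uint32_t id) {
  const auto it = names.find(id);
  if (it != names.end()) return it->second;
  return "#" + std::to_string(id);
}
```

Because this translation uses strings and hash maps, it belongs in the collector, never in the runtime hot path.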

NDJSON envelope

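Each NDJSON line is one JSON object. The collector assigns display_seq; producers own only stream_id and local_seq. The example below is pretty-printed for readability; in the stream each event occupies a single line.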

{
  "observe_schema_version": "1",
  "run_id": "run-20260507-001",
  "display_seq": 1024,
  "stream_id": "lane:main",
  "local_seq": 88,
  "kind": "component_end",
  "severity": "info",
  "exactness": "exact",
  "mono_ns": 123456789,
  "epoch_id": "7",
  "lane": "main",
  "worker_id": "",
  "component_id": "transform",
  "channel_id": "",
  "loop_id": "",
  "trace_id": "trace-...",
  "transaction_id": "source_transform#7",
  "correlation_id": "source_transform#7",
  "causation_id": "source_transform#7",
  "attributes": {
    "duration_ns": 9200,
    "status": "ok"
  }
}

Required top-level fields for normal events are observe_schema_version, run_id, display_seq, stream_id, local_seq, kind, severity, exactness, and mono_ns. Optional identity fields should use empty strings when absent so line-oriented tools can rely on stable keys.
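The split of ownership between producer and collector can be sketched as follows. `DecodedEvent` and `EnvelopeWriter` are hypothetical names, the starting value of `display_seq` is arbitrary, and the writer assumes its string inputs need no JSON escaping; a real collector would use a proper JSON encoder and emit the full set of identity fields.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical decoded event: numeric IDs are already translated to names.
struct DecodedEvent {
  std::string run_id;
  std::string stream_id;
  std::uint64_t local_seq = 0;
  std::string kind;
  std::string severity;
  std::string exactness;
  std::uint64_t mono_ns = 0;
  std::string component_id;  // optional; empty string when absent
  std::string channel_id;    // optional; empty string when absent
};

// The collector owns display_seq; producers never see it.
class EnvelopeWriter {
 public:
  // Returns one NDJSON line without the trailing newline.
  // Assumes field values contain no characters that need JSON escaping.
  std::string write_line(const DecodedEvent& e) {
    std::ostringstream out;
    out << '{';
    put_str(out, "observe_schema_version", "1"); out << ',';
    put_str(out, "run_id", e.run_id); out << ',';
    put_num(out, "display_seq", next_display_seq_++); out << ',';
    put_str(out, "stream_id", e.stream_id); out << ',';
    put_num(out, "local_seq", e.local_seq); out << ',';
    put_str(out, "kind", e.kind); out << ',';
    put_str(out, "severity", e.severity); out << ',';
    put_str(out, "exactness", e.exactness); out << ',';
    put_num(out, "mono_ns", e.mono_ns); out << ',';
    // Optional keys are always written so line-oriented tools see stable keys.
    put_str(out, "component_id", e.component_id); out << ',';
    put_str(out, "channel_id", e.channel_id);
    out << '}';
    return out.str();
  }

 private:
  static void put_str(std::ostringstream& out, const char* key,
                      const std::string& value) {
    out << '"' << key << "\":\"" << value << '"';
  }
  static void put_num(std::ostringstream& out, const char* key,
                      std::uint64_t value) {
    out << '"' << key << "\":" << value;
  }

  std::uint64_t next_display_seq_ = 0;
};
```

Keeping `display_seq` inside the writer means producers on different streams never have to coordinate on a shared counter.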

Event class exactness by level

Event class | Summary | Detailed | Debug
run_started, run_finished | exact | exact | exact
runtime_error, component_error | exact | exact | exact
component_begin, component_end | aggregated latency plus latest | exact | exact
scheduler_epoch_begin, scheduler_epoch_end | aggregated | exact or partial | exact or partial
channel_publish, channel_commit | aggregated | sampled or filtered | exact for selected channels
channel_drop, channel_reject, channel_overwrite | exact | exact | exact
trigger_ready, trigger_suppressed | aggregated | sampled or filtered | exact for selected components
async_admission | aggregated | exact | exact
async_reject, async_drop | exact | exact | exact
loop_iteration | aggregated plus latest residual | exact | exact
loop_converged, loop_error, loop_budget_overrun, loop_max_iterations_hit | exact | exact | exact
health_event | exact within bounded retention | exact | exact
observer_drop_summary | exact | exact | exact
assertion_fail | exact | exact | exact
assertion_pass, assertion_pending | aggregated | exact | exact

UI frame event

SSE dashboard streams should batch collector output instead of pushing every hot event as a separate browser message.

{
  "observe_schema_version": "1",
  "kind": "ui_frame",
  "run_id": "run-20260507-001",
  "frame_seq": 120,
  "window_ms": 50,
  "event_count": 1400,
  "dropped_event_count_delta": 0,
  "exactness": "aggregated",
  "events": [
    {"kind": "component_error", "component_id": "filter", "severity": "error"}
  ],
  "summaries": {
    "components": {
      "filter": {
        "execution_count_delta": 10,
        "last_duration_ns": 9000,
        "max_duration_ns": 12000
      }
    },
    "channels": {
      "source_filter": {
        "publish_count_delta": 1000,
        "delivery_count_delta": 998,
        "drop_count_delta": 0,
        "max_depth": 4
      }
    }
  }
}

Default UI frame cadence is approximately 50 ms. Raw browser-side event history must be bounded by default.
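The batching described in this section could be sketched as a small accumulator that folds component_end events into per-component summaries and hands them out once per window. `FrameBuilder`, `ComponentSummary`, and their methods are hypothetical names; only the summary field names are taken from the example frame above.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>

// Mirrors the per-component summary fields in the example ui_frame.
struct ComponentSummary {
  std::uint64_t execution_count_delta = 0;
  std::uint64_t last_duration_ns = 0;
  std::uint64_t max_duration_ns = 0;
};

// Accumulates events for one window and emits them as a single frame.
class FrameBuilder {
 public:
  explicit FrameBuilder(std::uint64_t window_ns) : window_ns_(window_ns) {}

  // Folds one component_end event into the current window.
  void on_component_end(const std::string& component,
                        std::uint64_t duration_ns) {
    ComponentSummary& s = components_[component];
    s.execution_count_delta += 1;
    s.last_duration_ns = duration_ns;
    s.max_duration_ns = std::max(s.max_duration_ns, duration_ns);
  }

  // True once the window has been open for at least window_ns.
  bool frame_due(std::uint64_t now_mono_ns) const {
    return now_mono_ns - window_start_ns_ >= window_ns_;
  }

  // Hands out the accumulated summaries and starts a new, empty window.
  std::map<std::string, ComponentSummary> flush(std::uint64_t now_mono_ns) {
    std::map<std::string, ComponentSummary> out;
    out.swap(components_);
    window_start_ns_ = now_mono_ns;
    ++frame_seq_;
    return out;
  }

  std::uint64_t frame_seq() const { return frame_seq_; }

 private:
  std::uint64_t window_ns_;
  std::uint64_t window_start_ns_ = 0;
  std::uint64_t frame_seq_ = 0;
  std::map<std::string, ComponentSummary> components_;
};
```

With a 50 ms window (`FrameBuilder builder(50000000)`), a thousand component executions reach the browser as one frame rather than a thousand messages, and the builder's memory stays bounded by the number of distinct components.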

Observer drop summary

Overflow is reported without blocking producers.

{
  "observe_schema_version": "1",
  "kind": "observer_drop_summary",
  "run_id": "run-20260507-001",
  "stream_id": "lane:main",
  "local_seq": 129,
  "display_seq": 2048,
  "severity": "warning",
  "exactness": "lossy",
  "dropped_event_count": 1842,
  "window_start_mono_ns": 120000000,
  "window_end_mono_ns": 170000000,
  "affected_kind": "channel_publish"
}

Assertion events

Live assertions use their own schema version and are emitted by collector/tooling only.

{"observe_schema_version":"1","kind":"assertion_registered","assertion_schema_version":"1","assertion_id":"no_channel_drops","exactness":"exact"}
{"observe_schema_version":"1","kind":"assertion_pass","assertion_schema_version":"1","assertion_id":"no_runtime_errors","exactness":"exact"}
{"observe_schema_version":"1","kind":"assertion_fail","assertion_schema_version":"1","assertion_id":"sink_receives_by_epoch_3","reason":"eventually condition not satisfied","exactness":"exact"}
{"observe_schema_version":"1","kind":"assertion_pending","assertion_schema_version":"1","assertion_id":"loop_converges","exactness":"exact"}

Payload preview metadata

Default observe output may include payload metadata only:

  • payload_type_id
  • payload_size_bytes
  • payload_sequence
  • payload_timestamp
  • optional bounded hash

Payload body preview requires debug, explicit component/channel filters, and a hard byte limit. Preview must not extend payload ownership or require deep copies.
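One way to satisfy the byte limit and the no-copy rule together is a non-owning view over the first bytes of the payload. This is a sketch only: `PreviewResult` and `bounded_preview` are hypothetical names, and the caller is assumed to have already checked the debug level and the component/channel filters.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Non-owning preview: `bytes` points into the original payload buffer.
struct PreviewResult {
  std::string_view bytes;
  bool truncated;
};

// Returns at most max_bytes of the payload without copying it. The view is
// valid only while the payload's owner keeps the buffer alive, so it does
// not extend payload ownership.
inline PreviewResult bounded_preview(std::string_view payload,
                                     std::size_t max_bytes) {
  const std::size_t n = std::min(payload.size(), max_bytes);
  return {payload.substr(0, n), n < payload.size()};
}
```

Because the result borrows the payload's storage, any consumer that needs the bytes after the payload is released has to copy them itself, outside the producer path.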