Defensive Input Handling

TopoExec validates graph inputs before runtime execution and keeps hostile or accidentally large inputs bounded. This page records the current defensive limits and the behavior that the tests expect.

Parser limits

The YAML loader rejects inputs before allocation-heavy runtime work when these limits are exceeded:

| Limit | Value | Applies to |
| --- | --- | --- |
| Graph text size | 1 MiB | load_graph_text() and files read through load_graph_file() |
| Lanes | 256 | lanes mapping entries |
| Components | 4096 | components sequence entries |
| Edges | 8192 | edges sequence entries |
| CompositeLoops | 1024 | composite_loops sequence entries |
| Identifier length | 128 bytes | graph name, lane ids, component ids, edge ids, loop ids |
| Config nesting depth | 8 | graph/component config maps and sequences |
| Config scalar/serialized value | 4096 bytes | config scalar values and serialized nested values |
| Non-config string length | 4096 bytes | component types, endpoints, policy names, descriptors, and other YAML strings |

Limit violations throw std::invalid_argument with an explicit message that names the graph path and the limit, for example "runtime graph.components count exceeds limit 4096". Inputs must be valid UTF-8 text; invalid byte sequences fail before YAML parsing with a bounded diagnostic instead of being treated as opaque binary.
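As an illustration of the error shape described above, here is a minimal sketch of a count check. The helper name check_count_limit and its parameters are hypothetical and not part of the TopoExec API; the message wording is modeled on the example message and may differ from what the loader actually emits.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a TopoExec function): rejects a collection whose
// entry count exceeds the configured limit. The message names the graph path
// label and the limit, mirroring the example message in the text.
inline void check_count_limit(const std::string& path_label,
                              std::size_t count,
                              std::size_t limit) {
    if (count > limit) {
        throw std::invalid_argument(path_label + " count exceeds limit " +
                                    std::to_string(limit));
    }
}
```

Under these assumptions, check_count_limit("runtime graph.components", 4097, 4096) throws with a message matching the example, while a count of exactly 4096 passes.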

The public GraphInputLimits struct exposes these defaults through default_graph_input_limits(). Embedders can call load_graph_text(text, limits) or load_graph_file(path, limits) to tighten limits for tests, editors, or CI. File loading reads incrementally and fails once the configured byte limit would be exceeded, so oversized files do not need to be materialized fully before rejection.
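The incremental-read behavior can be sketched as follows. This is not the TopoExec implementation: read_bounded, the 4096-byte chunk size, and the error wording are assumptions made for illustration. The point is that the reader stops after at most one chunk beyond the limit instead of loading the whole input first.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical sketch: reads a stream in fixed-size chunks and throws as soon
// as accepting the next chunk would push the total past max_bytes.
inline std::string read_bounded(std::istream& in, std::size_t max_bytes) {
    std::string out;
    char chunk[4096];
    while (in) {
        in.read(chunk, sizeof(chunk));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) {
            break;  // nothing more to read
        }
        // out.size() never exceeds max_bytes, so the subtraction cannot wrap.
        if (got > max_bytes - out.size()) {
            throw std::invalid_argument("graph text size exceeds limit " +
                                        std::to_string(max_bytes));
        }
        out.append(chunk, got);
    }
    return out;
}
```

The same function works for a std::ifstream opened on a graph file or for a std::istringstream in a unit test, which keeps the limit easy to exercise with small inputs.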

CLI overrides

Graph-reading CLI commands accept the same local parser-limit overrides:

./build/topoexec graph validate examples/minimal.yaml \
  --max-graph-input-bytes 65536 \
  --max-components 512 \
  --max-edges 1024 \
  --format json

Overrides are intentionally per-invocation; TopoExec does not persist them in project state. The defaults remain safe for normal examples, while CI/editor jobs can lower them to protect interactive workflows. topoexec doctor --format json reports the default graph_input_limits contract.

Numeric validation

The parser uses bounded integer conversions, and semantic validation rejects negative capacities, durations, trigger windows, and loop budgets where negative values are not meaningful. Capacity and max_inflight errors are reported during graph validation, before runtime execution.
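One way to implement a bounded integer conversion is with std::from_chars from C++17, which reports overflow and trailing characters without throwing. The sketch below is an assumption about how such a check could look, not the parser's actual code; parse_capacity and max_value are hypothetical names.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parses a non-negative integer field and returns
// std::nullopt for empty text, non-numeric text, trailing characters,
// overflow, negative values, or values above max_value.
inline std::optional<std::uint32_t> parse_capacity(std::string_view text,
                                                   std::uint32_t max_value) {
    if (text.empty()) {
        return std::nullopt;
    }
    std::int64_t value = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;  // not a number, out of range, or trailing junk
    }
    if (value < 0 || value > static_cast<std::int64_t>(max_value)) {
        return std::nullopt;  // negative or above the configured bound
    }
    return static_cast<std::uint32_t>(value);
}
```

Parsing into a wider signed type first lets the helper distinguish a negative value from a malformed one, which is useful if diagnostics should say which rule was violated.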

Fuzz and malformed input

fuzz_graph_input_smoke runs a deterministic corpus of malformed, cyclic, partial, nested, invalid-UTF-8, oversized, and mutated YAML inputs. The optional fuzz_graph_inputs target can run the checked-in corpus in standalone mode or as a libFuzzer target. Both fail on timeouts, crash-like exit codes, or sanitizer/crash markers. Run the local gate with ./scripts/goal_check.sh fuzz; see Coverage-guided fuzzing for libFuzzer and regression-corpus commands.

Trust boundary

TopoExec YAML declares graph structure. It is not an untrusted code execution engine: the loader parses data, validates semantics, and returns a GraphSpec; component instantiation still comes from an embedder-provided registry. Dynamic plugin loading remains a separate default-off trusted-native preview; plugin discovery, shared-library paths, signing/allowlists, sandboxing, and ABI compatibility stay outside graph YAML and must remain explicit before any broader plugin ecosystem can be enabled by default.

CLI output paths

Current CLI commands write to stdout/stderr only. Users can redirect output with their shell, but the CLI does not accept arbitrary output-file paths today. If a future command writes files, it must validate path traversal and overwrite policy before writing.

Blocking overflow policy

overflow: block is treated as a bounded policy, not as permission for hidden unbounded recursion. In the single-thread path, low-level channel calls return a "would block producer" result rather than blocking forever. Runtime async admission treats block like a rejection when max_inflight is exhausted, so rejected completions are not committed. Graph authors should prefer explicit drop_*, reject, or fail_fast unless a future scheduler lane documents safe blocking behavior.
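The non-blocking behavior can be pictured with a small bounded queue whose push reports a would-block status instead of waiting. This is a sketch under stated assumptions: BoundedChannel, PushResult, try_push, and try_pop are hypothetical names and do not describe TopoExec's channel API.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>

// Hypothetical status type: a full channel reports WouldBlock to the producer.
enum class PushResult { Accepted, WouldBlock };

// Hypothetical single-threaded bounded channel. try_push never waits; the
// caller decides what to do when the channel is full.
template <typename T>
class BoundedChannel {
public:
    explicit BoundedChannel(std::size_t capacity) : capacity_(capacity) {}

    PushResult try_push(T value) {
        if (items_.size() >= capacity_) {
            return PushResult::WouldBlock;  // full: report instead of blocking
        }
        items_.push_back(std::move(value));
        return PushResult::Accepted;
    }

    bool try_pop(T& out) {
        if (items_.empty()) {
            return false;
        }
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

    std::size_t size() const { return items_.size(); }

private:
    std::size_t capacity_;
    std::deque<T> items_;
};
```

Returning a status keeps the single-thread path from deadlocking on itself: a producer that sees WouldBlock can surface the condition to the scheduler rather than waiting for a consumer that cannot run.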

Error-message hygiene

Diagnostics should report graph paths, ids, policy names, and limits. They should not dump environment variables, credentials, or unrelated filesystem state. topoexec doctor --format json reports only tool/schema/example/benchmark locations needed for local troubleshooting.