Concepts
apiexec solves one problem: every team building on top of REST or AI APIs re-implements the same fragile retry, pagination, and chunking logic. apiexec centralises that once, correctly, behind a clean Stream<T> interface.
This page explains how the engine works, what each capability is solving, and why the library is designed the way it is. Read this before diving into the Stream API reference or Adapters page.
The Core Idea
Think of apiexec as a query planner for APIs. Callers declare what they want - which adapter, which endpoint, how much budget - and the engine decides how to fetch it: how large each request should be, when to retry, when to back off, when to stop.
Your Code
    │  next_batch()           ← you only call this
    ▼
ExecutionEngine<T>            ← fetch loop lives here
    │
    ├── VendorAdapter<T>      ← adapter knows the API shape
    ├── ExecutionPolicy       ← policy owns retry / window / budget
    └── ITransport            ← transport owns HTTP (mockable)
Each of those three slots is a swappable strategy. The engine owns the loop; the strategies own the details. This is how apiexec can serve a cursor-paginated REST API, a time-windowed metrics API, and an AI completions API with the same engine core.
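For orientation, here is a minimal sketch of the three seams. The type and method names below are illustrative simplifications, not the exact signatures - see the Stream API reference for those:
#include <optional>
#include <string>
#include <vector>

struct Request  { std::string url; /* headers, body, ... */ };
struct Response { int status; std::string body; };
struct Cursor   { std::string token; /* or a time window */ };

struct ITransport {                               // owns HTTP; mockable in tests
    virtual ~ITransport() = default;
    virtual Response execute(const Request& req) = 0;
};

template <typename T>
struct VendorAdapter {                            // knows the API shape
    virtual ~VendorAdapter() = default;
    virtual Request build_request(const Cursor& c) const = 0;             // step 2
    virtual std::vector<T> parse_response(const Response& r) const = 0;   // step 5
    virtual std::optional<Cursor>
    next_cursor(const Cursor& c, const Response& r) const = 0;            // step 6
};

struct ExecutionPolicy {                          // owns retry / window / budget
    virtual ~ExecutionPolicy() = default;
    virtual int  max_retries() const = 0;
    virtual void on_rate_limited() = 0;           // shrink window, schedule backoff
    virtual void on_success() = 0;                // grow window, reset retry count
    virtual bool budget_exceeded() const { return false; }  // cost hook, defaulted
};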
The Fetch Loop
The engine's inner loop is an invariant eight-step sequence. Every adapter runs through the same steps; only what happens at each step differs.
fetch_one()
│
├─ 1. check_preconditions
│ exhausted? cancelled? budget exceeded? → terminal
│
├─ 2. build_request ← VendorAdapter
│ cursor → HTTP request (URL, headers, body)
│
├─ 3. execute_transport ← ITransport
│ HTTP request → HTTP response
│
├─ 4. handle_error ← Policy + Adapter
│ 200 → proceed
│ 429 → backoff + retry
│ 5xx → backoff + retry
│ 4xx (non-429) → terminal
│ parse fail → terminal
│
├─ 5. parse_response ← VendorAdapter
│ response body → vector<T>
│
├─ 6. advance_cursor ← VendorAdapter
│ cursor ← next_cursor(current, response)
│
├─ 7. adjust_policy ← ExecutionPolicy
│ update window size / record cost
│
└─ 8. yield
FetchResult<T> → caller
The sequence is fixed. What makes adapters possible is that steps 2, 5, and 6 are pure delegates - the engine calls a virtual method and returns what the adapter gives back. The engine itself never parses JSON or constructs URLs.
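As a sketch of that delegation, building on the interface shapes above (classify() stands in for step 4's classification, sketched in the Reliability section below; the retry/backoff branch is elided):
enum class Action { Proceed, Retry, Terminal };
Action classify(int status, bool parse_ok);              // step 4, see Reliability

template <typename T>
std::optional<std::vector<T>> fetch_one(VendorAdapter<T>& adapter,
                                        ExecutionPolicy& policy,
                                        ITransport& transport,
                                        Cursor& cursor) {
    if (policy.budget_exceeded())                            // 1. preconditions
        return std::nullopt;
    Request  req  = adapter.build_request(cursor);           // 2. adapter
    Response resp = transport.execute(req);                  // 3. transport
    if (classify(resp.status, true) != Action::Proceed)      // 4. retry/terminal
        return std::nullopt;                                 //    (elided here)
    std::vector<T> records = adapter.parse_response(resp);   // 5. adapter
    if (auto next = adapter.next_cursor(cursor, resp))       // 6. adapter
        cursor = *next;                                      //    (else exhausted)
    policy.on_success();                                     // 7. policy
    return records;                                          // 8. yield
}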
Reliability: Transparent Retry
The engine classifies every error response into one of two buckets - retryable or terminal - before deciding what to do:
| Condition | Action | Reason |
|---|---|---|
| 429 Rate Limited | Retry with backoff + jitter | Load signal - back off and try again |
| 5xx Server Error | Retry with backoff | Transient server-side failure |
| Network / Timeout | Retry with backoff | Transient connectivity failure |
| 4xx (non-429) | Fail immediately | Auth or validation failure - retrying won't help |
| Parse Error | Fail immediately | Data format mismatch - not a load signal |
Retry-After is honoured. When a 429 includes a Retry-After header, the engine waits exactly that long (clamped to one hour). If absent, it uses exponential backoff with ±25% jitter to avoid thundering herd.
Retry counters reset on success. If a stream receives 3 transient 429s during a long run but recovers each time, the retry counter resets to zero after each successful fetch. A brief 429 storm does not permanently exhaust the retry budget.
Callers never see retries. next_batch() either returns records or a terminal error. The retry machinery is entirely internal.
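A sketch of the classification and backoff rules above - the Retry-After clamp and ±25% jitter come from this page, while the 250 ms base delay and helper names are assumptions:
#include <algorithm>
#include <chrono>
#include <cmath>
#include <optional>
#include <random>

enum class Action { Proceed, Retry, Terminal };   // as in the loop sketch above

Action classify(int status, bool parse_ok) {
    if (!parse_ok)     return Action::Terminal;   // data format mismatch
    if (status == 200) return Action::Proceed;
    if (status == 0)   return Action::Retry;      // network / timeout
    if (status == 429) return Action::Retry;      // load signal: back off
    if (status >= 500) return Action::Retry;      // transient server failure
    return Action::Terminal;                      // other 4xx: retrying won't help
}

std::chrono::milliseconds backoff_delay(int attempt,
        std::optional<std::chrono::seconds> retry_after) {
    using namespace std::chrono;
    if (retry_after)                              // honour Retry-After, clamp to 1 h
        return duration_cast<milliseconds>(std::min(*retry_after, seconds{3600}));
    static thread_local std::mt19937 rng{std::random_device{}()};
    std::uniform_real_distribution<double> jitter(0.75, 1.25);       // ±25%
    double base_ms = 250.0 * std::pow(2.0, attempt);   // base delay is illustrative
    return milliseconds{static_cast<long long>(base_ms * jitter(rng))};
}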
Efficiency: Double-Buffer Prefetch
By default, the engine runs requests sequentially: fetch page 1, deliver it, fetch page 2, deliver it. During the caller's processing time, the network is idle.
With prefetch_depth ≥ 1, the engine overlaps network time with processing time:
Sequential:        Fetch 1 │ Process 1 │ Fetch 2 │ Process 2 │ Fetch 3 …

Prefetch depth=1:  Fetch 1 │ Fetch 2   │ Fetch 3   │ …
                           │ Process 1 │ Process 2 │ …

After yielding a batch, the engine immediately fires the next fetch on a background thread. The next next_batch() call collects the already-in-flight result rather than starting a new request from scratch.
Measured improvement: ~47% less wall-clock time on a benchmark of 50 pages × 100 records with 10 ms server latency and 10 ms processing time (sequential: 1 083 ms, prefetch: 575 ms).
Memory is bounded. Two pre-allocated batch buffers (current + prefetch) plus a 16 MB HTTP response cap. There is no unbounded queue, no growing buffer.
The prefetch depth is controlled by ExecutionPolicy::prefetch_depth(). Setting it to 0 disables prefetch entirely - useful when processing is slower than the network, or when debugging.
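The overlap can be pictured as a one-slot pipeline built on std::async - a sketch of the idea, not the engine's actual threading code:
#include <future>
#include <vector>

template <typename T, typename FetchFn>
class PrefetchingStream {
public:
    PrefetchingStream(FetchFn fetch, int depth)
        : fetch_(std::move(fetch)), depth_(depth) {}

    std::vector<T> next_batch() {
        std::vector<T> batch = in_flight_.valid()
            ? in_flight_.get()                    // collect the in-flight result
            : fetch_();                           // first call, or depth == 0
        if (depth_ >= 1)                          // fire the next fetch now, so
            in_flight_ = std::async(std::launch::async, fetch_);  // network time
        return batch;                             // overlaps caller processing
    }

private:
    FetchFn fetch_;
    int depth_;
    std::future<std::vector<T>> in_flight_;
};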
Adaptive Rate Management
When a paginated or time-windowed API rate-limits you, the typical response is to request smaller chunks of data per call. apiexec does this automatically by adjusting the time window embedded in the cursor.
The algorithm is deliberately conservative:
| Event | Window adjustment |
|---|---|
| First 429 in a retry cycle | Shrink by window_shrink_factor (default ×0.5) |
| Successful fetch (normal) | Grow by window_grow_factor (default ×1.5), up to max_window_ms |
| Successful fetch after a recovered 429 | No growth - the pressure was real |
| 4xx or 5xx | No change - not a load signal |
The "no growth after recovery" rule is important. Without it, the engine would immediately re-inflate the window on the first success after a 429, re-trigger the rate limit, and oscillate. The rule lets the engine converge to a sustainable window size.
This is most visible with the datadog_metrics adapter, which starts with a 1-hour window and automatically subdivides it when the Datadog API pushes back - without the caller doing any tuning.
Cost Enforcement
AI APIs bill by token. The engine has built-in hooks for tracking and halting on cost, available to any adapter:
// VendorAdapter interface - optional, defaults to nullopt
virtual std::optional<double> response_cost(const Response& resp) const;

// ExecutionPolicy interface - optional, defaults to no-op / false
virtual void record_cost(const Cursor& cursor, double cost_units);
virtual std::optional<double> remaining_budget() const;
virtual bool budget_exceeded() const;
CostAwarePolicy is the concrete implementation. It tracks tokens × price_per_token, warns at 80% of budget, and halts the engine before the next fetch when the budget cap is reached.
Example: A stream with a 1 000-token budget against an API returning 100 tokens per response halts after exactly 10 successful fetches with BUDGET_EXHAUSTED. The caller sees a terminal error - no partial fetch, no overspend.
The openai and anthropic adapters report token usage via response_cost() by reading the usage.total_tokens field from the API response.
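The essential shape of such a policy, as a simplified sketch - the real CostAwarePolicy also receives the cursor in record_cost and emits a WARN log entry at the 80% mark:
#include <optional>

class BudgetPolicy {                              // stand-in for CostAwarePolicy
public:
    explicit BudgetPolicy(double budget_units) : budget_(budget_units) {}

    void record_cost(double cost_units) {         // called after each response
        spent_ += cost_units;
        if (!warned_ && spent_ >= 0.8 * budget_)
            warned_ = true;                       // one-shot 80% warning fires here
    }
    std::optional<double> remaining_budget() const { return budget_ - spent_; }
    bool budget_exceeded() const { return spent_ >= budget_; }  // checked in step 1

private:
    double budget_;
    double spent_  = 0.0;
    bool   warned_ = false;
};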
Observability
Metrics
The engine maintains 11 atomic counters and gauges, readable at any time via metrics_snapshot():
| Metric | Type | What it counts |
|---|---|---|
| request_count | Counter | Total HTTP requests sent |
| retry_count | Counter | Total retries (429 + 5xx + network) |
| success_count | Counter | Requests that returned 200 |
| error_rate_limit | Counter | Terminal 429 errors |
| error_server | Counter | Terminal 5xx errors |
| error_client | Counter | Terminal 4xx (non-429) errors |
| error_network | Counter | Terminal network/timeout errors |
| error_parse | Counter | Terminal parse failures |
| records_total | Gauge | Cumulative records delivered to caller |
| window_size_ms | Gauge | Current time window (time-windowed adapters) |
| cumulative_cost | Gauge | Cumulative cost units consumed |
Metrics export to Prometheus text format via metrics.to_prometheus(prefix) - ready to scrape from any /metrics endpoint with no extra dependencies.
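Serving that from an existing HTTP handler is then a couple of lines - http_respond below stands in for whatever server the host application already runs:
auto metrics = engine.metrics_snapshot();
http_respond(200, "text/plain; version=0.0.4",   // Prometheus text content type
             metrics.to_prometheus("apiexec"));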
Structured Logging
Every error path emits a LogEntry with:
- Log level (DEBUG, INFO, WARN, ERROR)
- Human-readable message
- HTTP status code
- Retry count at the time of the event
- Redacted cursor state
API keys and secrets never appear in log output. The auth_header config field is used to build requests but is never written to a log entry.
Callback-Based Integration
Both metrics and logging are exposed as callbacks set on the engine:
engine.set_metrics_callback([](const MetricsSnapshot& snap) {
statsd_gauge("stream.records", snap.records_total);
});
engine.set_log_callback([](const LogEntry& e) {
spdlog::log(to_spdlog_level(e.level), e.message);
});

The callback pattern lets applications plug in their existing Prometheus, StatsD, OpenTelemetry, or structured logging stack without apiexec embedding any of them.
Streaming Model
The engine supports two iteration modes:
Batch Mode
next_batch() returns a FetchResult<T> containing a vector of records. This is the standard mode for paginated REST APIs where the server delivers a full page per response.
while (engine.has_next()) {
auto result = engine.next_batch();
if (result.error != StreamErrorCode::OK) break;
for (auto& record : result.records) {
process(record);
}
}

Record-Level Mode
stream(callback) invokes a callback for each individual record. Intended for SSE or AI streaming APIs where responses arrive as a stream of tokens or events rather than a complete page.
engine.stream([](const T& record) {
handle_record(record);
return true; // return false to stop early
});

Adapters opt in to incremental delivery by overriding parse_streaming() and supports_streaming(). Default implementations bundle the full response body into a single batch, so existing adapters remain compatible without changes.
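A hypothetical opt-in looks like this - only the two method names come from this page; the signatures, helpers, and Event type are assumptions for illustration:
#include <functional>
#include <string>
#include <vector>

struct Event { std::string data; };                           // assumed record type
std::vector<std::string> split_sse(const std::string& body);  // hypothetical helper
Event parse_event(const std::string& chunk);                  // hypothetical parser

class SseAdapter /* : public VendorAdapter<Event> */ {
public:
    bool supports_streaming() const { return true; }
    bool parse_streaming(const Response& resp,
                         const std::function<bool(const Event&)>& emit) const {
        for (const std::string& chunk : split_sse(resp.body))
            if (!emit(parse_event(chunk)))        // false from caller = stop early
                return false;
        return true;
    }
};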
Use Cases
1. Paginated REST Ingestion
Problem: Fetch all records from a REST API that intermittently rate-limits, uses opaque cursor tokens, and occasionally drops connections.
What apiexec provides: Configure generic_rest with base_url, data_field, and next_token_field. Write a while(has_next()) loop. The engine handles retries, prefetch, cursor advancement, and termination detection. Zero retry or pagination logic in the caller.
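Concretely, the caller side might reduce to this - make_engine is a hypothetical factory, the three JSON keys are the ones named above, and the rest of the shape is assumed:
const char* config = R"({
  "base_url":         "https://api.example.com/v1/records",
  "data_field":       "items",
  "next_token_field": "next_token"
})";

auto engine = make_engine<Record>("generic_rest", config);  // hypothetical factory
while (engine.has_next()) {
    auto result = engine.next_batch();            // retries, prefetch, and cursor
    if (result.error != StreamErrorCode::OK)      // advancement are all internal
        break;                                    // terminal error or exhausted
    for (auto& record : result.records) ingest(record);     // ingest = your sink
}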
2. Time-Windowed Metrics Backfill
Problem: Backfill 30 days of Datadog metrics. The API rate-limits aggressively on large time windows.
What apiexec provides: The datadog_metrics adapter uses a time-window cursor starting at a configured window_ms. On each 429, the engine halves the window. On success, it grows back - but not immediately after a recovery. No hand-tuned chunk sizing needed.
3. AI Completion With Cost Budget
Problem: Summarise a 100k-token document by chunking it into 4k-token prompts through GPT-4 or Claude, with a hard $5 budget limit.
What apiexec provides: The openai or anthropic adapter reports token usage via response_cost(). CostAwarePolicy tracks cumulative spend and halts cleanly with BUDGET_EXHAUSTED before the budget is exceeded. The adapter's prompt_chunk_for_cursor() splits the document at sentence boundaries.
4. Cross-Language Orchestration
Problem: A Go service orchestrates multiple data pipelines, each hitting a different vendor API. One engine, five language callers.
What apiexec provides: The ABI-stable C API is wrapped by Go, Rust, Python, Java, and JavaScript bindings. The same stream_create(adapter_name, config_json, policy_json) call dispatches through the adapter registry at runtime - no recompilation when switching adapters.
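Approximately, every binding funnels into the same entry point - only stream_create is named on this page; the return type here is an assumption:
extern "C" void* stream_create(const char* adapter_name,    // return type assumed
                               const char* config_json,
                               const char* policy_json);

void* handle = stream_create("datadog_metrics", config_json, policy_json);
// each binding wraps this handle; switching adapters is a string change -
// no recompilation of the bindings or of the engine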
5. Embedded in a C++ Host Process
Problem: A C++ pipeline needs reliable API ingestion without pulling in a heavyweight framework.
What apiexec provides: apiexec_core is a header-only CMake INTERFACE target - zero compiled code in the core layer. Link libapiexec_capi.a or libapiexec.so directly. The entire library - engine, transport, policy, adapters - compiles to a few hundred KB with only libcurl and nlohmann/json as runtime dependencies.
Non-Goals
These are explicitly outside apiexec's scope by design:
| Out of scope | Why |
|---|---|
| API gateway routing | No URL rewriting, no auth proxying - apiexec is a client-side library |
| Agent / workflow orchestration | No task graphs - stream ordering is the engine's concern, sequencing is the caller's |
| Data storage | Streams yield records; where they go is the caller's concern |
| Full vendor SDK replacement | Adapters are intentionally thin - use vendor SDKs when you need their full API surface |
Why This Architecture
Strategy + Template Method together
The fetch sequence - check preconditions → build request → execute → handle error → parse → advance cursor → adjust policy → yield - is invariant. What changes is what happens at each variable step.
Hardcoding the sequence in the engine and delegating steps to injected strategies gives maximum reuse with minimum coupling. Adding a new adapter does not touch the engine. Swapping the policy does not touch the adapter.
Layered modules enforced at build time
The core/ layer has zero external dependencies - verified by linker output. Unit tests that use only mock transports link against nothing beyond libc. This matters because it keeps the interface layer clean, keeps the test surface small, and prevents accidentally embedding a libcurl dependency in code that is supposed to be portable.
The layer boundary is: core/ ← transport/, core/ ← policy/, core/ ← adapters/. Nothing in core/ points outward.
ABI-stable C API
Language bindings are first-class. Five bindings share one compiled implementation. If the C++ internals change - a struct gains a field, a method is renamed - the bindings do not break as long as the C API surface is unchanged. Future API additions use versioned _v{N} symbols so old binaries continue to run against a newer shared library.
Callback-based metrics and logging
Every application already has an observability stack. Embedding a Prometheus client would force a specific choice on every user. The callback pattern lets users wire in their existing counters, histograms, and log sinks with two lines of code.
Cost hooks in the core interfaces
Retrofitting cost enforcement later would have required a breaking change to the ExecutionPolicy pure-virtual interface. Adding the hooks as defaulted (non-pure) virtual methods costs nothing for existing adapters and policies - they inherit return nullopt / return false - while enabling full AI budget control for new ones.
