Concepts
apiexec solves one problem: every team building on top of REST or AI APIs re-implements the same fragile retry, pagination, and chunking logic. apiexec centralises that once, correctly, behind a clean Stream<T> interface.
This page explains how the engine works, what each capability is solving, and why the library is designed the way it is. Read this before diving into the Stream API reference or Adapters page.
The Core Idea
Think of apiexec as a query planner for APIs. Callers declare what they want - which adapter, which endpoint, how much budget - and the engine decides how to fetch it: how large each request should be, when to retry, when to back off, when to stop.
Your Code
    │  next_batch()           ← you only call this
    ▼
ExecutionEngine<T>            ← fetch loop lives here
    │
    ├── VendorAdapter<T>      ← adapter knows the API shape
    ├── ExecutionPolicy       ← policy owns retry / window / budget
    └── ITransport            ← transport owns HTTP (mockable)
Each of those three slots is a swappable strategy. The engine owns the loop; the strategies own the details. This is how apiexec can serve a cursor-paginated REST API, a time-windowed metrics API, and an AI completions API with the same engine core.
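For orientation, here is a minimal sketch of the three seams. The type and method names below are illustrative simplifications, not the exact signatures - see the Stream API reference for those:
#include <optional>
#include <string>
#include <vector>

struct Request  { std::string url; /* headers, body, ... */ };
struct Response { int status; std::string body; };
struct Cursor   { std::string token; /* or a time window */ };

struct ITransport {                               // owns HTTP; mockable in tests
    virtual ~ITransport() = default;
    virtual Response execute(const Request& req) = 0;
};

template <typename T>
struct VendorAdapter {                            // knows the API shape
    virtual ~VendorAdapter() = default;
    virtual Request build_request(const Cursor& c) const = 0;             // step 2
    virtual std::vector<T> parse_response(const Response& r) const = 0;   // step 5
    virtual std::optional<Cursor>
    next_cursor(const Cursor& c, const Response& r) const = 0;            // step 6
};

struct ExecutionPolicy {                          // owns retry / window / budget
    virtual ~ExecutionPolicy() = default;
    virtual int  max_retries() const = 0;
    virtual void on_rate_limited() = 0;           // shrink window, schedule backoff
    virtual void on_success() = 0;                // grow window, reset retry count
    virtual bool budget_exceeded() const { return false; }  // cost hook, defaulted
};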
The Fetch Loop
The engine's inner loop is an invariant eight-step sequence. Every adapter runs through the same steps; only what happens at each step differs.
fetch_one()
│
├─ 1. check_preconditions
│ exhausted? cancelled? budget exceeded? → terminal
│
├─ 2. build_request ← VendorAdapter
│ cursor → HTTP request (URL, headers, body)
│
├─ 3. execute_transport ← ITransport
│ HTTP request → HTTP response
│
├─ 4. handle_error ← Policy + Adapter
│ 200 → proceed
│ 429 → backoff + retry
│ 5xx → backoff + retry
│ 4xx (non-429) → terminal
│ parse fail → terminal
│
├─ 5. parse_response ← VendorAdapter
│ response body → vector<T>
│
├─ 6. advance_cursor ← VendorAdapter
│ cursor ← next_cursor(current, response)
│
├─ 7. adjust_policy ← ExecutionPolicy
│ update window size / record cost
│
└─ 8. yield
FetchResult<T> → caller
The sequence is fixed. What makes adapters possible is that steps 2, 5, and 6 are pure delegates - the engine calls a virtual method and returns what the adapter gives back. The engine itself never parses JSON or constructs URLs.
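As a sketch of that delegation, building on the interface shapes above (classify() stands in for step 4's classification, sketched in the Reliability section below; the retry/backoff branch is elided):
enum class Action { Proceed, Retry, Terminal };
Action classify(int status, bool parse_ok);              // step 4, see Reliability

template <typename T>
std::optional<std::vector<T>> fetch_one(VendorAdapter<T>& adapter,
                                        ExecutionPolicy& policy,
                                        ITransport& transport,
                                        Cursor& cursor) {
    if (policy.budget_exceeded())                            // 1. preconditions
        return std::nullopt;
    Request  req  = adapter.build_request(cursor);           // 2. adapter
    Response resp = transport.execute(req);                  // 3. transport
    if (classify(resp.status, true) != Action::Proceed)      // 4. retry/terminal
        return std::nullopt;                                 //    (elided here)
    std::vector<T> records = adapter.parse_response(resp);   // 5. adapter
    if (auto next = adapter.next_cursor(cursor, resp))       // 6. adapter
        cursor = *next;                                      //    (else exhausted)
    policy.on_success();                                     // 7. policy
    return records;                                          // 8. yield
}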
Reliability: Transparent Retry
The engine classifies every error response into one of two buckets - retryable or terminal - before deciding what to do:
| Condition | Action | Reason |
|---|---|---|
| 429 Rate Limited | Retry with backoff + jitter | Load signal - back off and try again |
| 5xx Server Error | Retry with backoff | Transient server-side failure |
| Network / Timeout | Retry with backoff | Transient connectivity failure |
| 4xx (non-429) | Fail immediately | Auth or validation failure - retrying won't help |
| Parse Error | Fail immediately | Data format mismatch - not a load signal |
Retry-After is honoured. When a 429 includes a Retry-After header, the engine waits exactly that long (clamped to one hour). If absent, it uses exponential backoff with ±25% jitter to avoid thundering herd.
Retry counters reset on success. If a stream receives 3 transient 429s during a long run but recovers each time, the retry counter resets to zero after each successful fetch. A brief 429 storm does not permanently exhaust the retry budget.
Callers never see retries. next_batch() either returns records or a terminal error. The retry machinery is entirely internal.
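A sketch of the classification and backoff rules above - the Retry-After clamp and ±25% jitter come from this page, while the 250 ms base delay and helper names are assumptions:
#include <algorithm>
#include <chrono>
#include <cmath>
#include <optional>
#include <random>

enum class Action { Proceed, Retry, Terminal };   // as in the loop sketch above

Action classify(int status, bool parse_ok) {
    if (!parse_ok)     return Action::Terminal;   // data format mismatch
    if (status == 200) return Action::Proceed;
    if (status == 0)   return Action::Retry;      // network / timeout
    if (status == 429) return Action::Retry;      // load signal: back off
    if (status >= 500) return Action::Retry;      // transient server failure
    return Action::Terminal;                      // other 4xx: retrying won't help
}

std::chrono::milliseconds backoff_delay(int attempt,
        std::optional<std::chrono::seconds> retry_after) {
    using namespace std::chrono;
    if (retry_after)                              // honour Retry-After, clamp to 1 h
        return duration_cast<milliseconds>(std::min(*retry_after, seconds{3600}));
    static thread_local std::mt19937 rng{std::random_device{}()};
    std::uniform_real_distribution<double> jitter(0.75, 1.25);       // ±25%
    double base_ms = 250.0 * std::pow(2.0, attempt);   // base delay is illustrative
    return milliseconds{static_cast<long long>(base_ms * jitter(rng))};
}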
Efficiency: Double-Buffer Prefetch
By default, the engine runs requests sequentially: fetch page 1, deliver it, fetch page 2, deliver it. During the caller's processing time, the network is idle.
With prefetch_depth ≥ 1, the engine overlaps network time with processing time:
Sequential:        Fetch 1 │ Process 1 │ Fetch 2 │ Process 2 │ Fetch 3 …

Prefetch depth=1:  Fetch 1 │ Fetch 2   │ Fetch 3   │ …
                           │ Process 1 │ Process 2 │ …

After yielding a batch, the engine immediately fires the next fetch on a background thread. The next next_batch() call collects the already-in-flight result rather than starting a new request from scratch.
Measured improvement: ~47% less wall-clock time on a benchmark of 50 pages × 100 records with 10 ms server latency and 10 ms processing time (sequential: 1 083 ms, prefetch: 575 ms).
Memory is bounded. Two pre-allocated batch buffers (current + prefetch) plus a 16 MB HTTP response cap. There is no unbounded queue, no growing buffer.
The prefetch depth is controlled by ExecutionPolicy::prefetch_depth(). Setting it to 0 disables prefetch entirely - useful when processing is slower than the network, or when debugging.
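The overlap can be pictured as a one-slot pipeline built on std::async - a sketch of the idea, not the engine's actual threading code:
#include <future>
#include <vector>

template <typename T, typename FetchFn>
class PrefetchingStream {
public:
    PrefetchingStream(FetchFn fetch, int depth)
        : fetch_(std::move(fetch)), depth_(depth) {}

    std::vector<T> next_batch() {
        std::vector<T> batch = in_flight_.valid()
            ? in_flight_.get()                    // collect the in-flight result
            : fetch_();                           // first call, or depth == 0
        if (depth_ >= 1)                          // fire the next fetch now, so
            in_flight_ = std::async(std::launch::async, fetch_);  // network time
        return batch;                             // overlaps caller processing
    }

private:
    FetchFn fetch_;
    int depth_;
    std::future<std::vector<T>> in_flight_;
};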
Adaptive Rate Management
When a paginated or time-windowed API rate-limits you, the typical response is to request smaller chunks of data per call. apiexec does this automatically by adjusting the time window embedded in the cursor.
The algorithm is deliberately conservative:
| Event | Window adjustment |
|---|---|
| First 429 in a retry cycle | Shrink by window_shrink_factor (default ×0.5) |
| Successful fetch (normal) | Grow by window_grow_factor (default ×1.5), up to max_window_ms |
| Successful fetch after a recovered 429 | No growth - the pressure was real |
| 4xx or 5xx | No change - not a load signal |
The "no growth after recovery" rule is important. Without it, the engine would immediately re-inflate the window on the first success after a 429, re-trigger the rate limit, and oscillate. The rule lets the engine converge to a sustainable window size.
This is most visible with the datadog_metrics adapter, which starts with a 1-hour window and automatically subdivides it when the Datadog API pushes back - without the caller doing any tuning.
Cost Enforcement
AI APIs bill by token. The engine has built-in hooks for tracking and halting on cost, available to any adapter:
// VendorAdapter interface - optional, defaults to nullopt
virtual std::optional<double> response_cost(const Response& resp) const;

// ExecutionPolicy interface - optional, defaults to no-op / false
virtual void record_cost(const Cursor& cursor, double cost_units);
virtual std::optional<double> remaining_budget() const;
virtual bool budget_exceeded() const;
CostAwarePolicy is the concrete implementation. It tracks tokens × price_per_token, warns at 80% of budget, and halts the engine before the next fetch when the budget cap is reached.
Example: A stream with a 1 000-token budget against an API returning 100 tokens per response halts after exactly 10 successful fetches with BUDGET_EXHAUSTED. The caller sees a terminal error - no partial fetch, no overspend.
The openai and anthropic adapters report token usage via response_cost() by reading the usage.total_tokens field from the API response.
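The essential shape of such a policy, as a simplified sketch - the real CostAwarePolicy also receives the cursor in record_cost and emits a WARN log entry at the 80% mark:
#include <optional>

class BudgetPolicy {                              // stand-in for CostAwarePolicy
public:
    explicit BudgetPolicy(double budget_units) : budget_(budget_units) {}

    void record_cost(double cost_units) {         // called after each response
        spent_ += cost_units;
        if (!warned_ && spent_ >= 0.8 * budget_)
            warned_ = true;                       // one-shot 80% warning fires here
    }
    std::optional<double> remaining_budget() const { return budget_ - spent_; }
    bool budget_exceeded() const { return spent_ >= budget_; }  // checked in step 1

private:
    double budget_;
    double spent_  = 0.0;
    bool   warned_ = false;
};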
Observability
Metrics
The engine maintains 11 atomic counters and gauges, readable at any time via metrics_snapshot():
| Metric | Type | What it counts |
|---|---|---|
| request_count | Counter | Total HTTP requests sent |
| retry_count | Counter | Total retries (429 + 5xx + network) |
| success_count | Counter | Requests that returned 200 |
| error_rate_limit | Counter | Terminal 429 errors |
| error_server | Counter | Terminal 5xx errors |
| error_client | Counter | Terminal 4xx (non-429) errors |
| error_network | Counter | Terminal network/timeout errors |
| error_parse | Counter | Terminal parse failures |
| records_total | Gauge | Cumulative records delivered to caller |
| window_size_ms | Gauge | Current time window (time-windowed adapters) |
| cumulative_cost | Gauge | Cumulative cost units consumed |
Metrics export to Prometheus text format via metrics.to_prometheus(prefix) - ready to scrape from any /metrics endpoint with no extra dependencies.
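Serving that from an existing HTTP handler is then a couple of lines - http_respond below stands in for whatever server the host application already runs:
auto metrics = engine.metrics_snapshot();
http_respond(200, "text/plain; version=0.0.4",   // Prometheus text content type
             metrics.to_prometheus("apiexec"));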
Structured Logging
Every error path emits a LogEntry with:
- Log level (DEBUG, INFO, WARN, ERROR)
- Human-readable message
- HTTP status code
- Retry count at the time of the event
- Redacted cursor state
API keys and secrets never appear in log output. The auth_header config field is used to build requests but is never written to a log entry.
Callback-Based Integration
Both metrics and logging are exposed as callbacks set on the engine:
engine.set_metrics_callback([](const MetricsSnapshot& snap) {
statsd_gauge("stream.records", snap.records_total);
});
engine.set_log_callback([](const LogEntry& e) {
spdlog::log(to_spdlog_level(e.level), e.message);
});

The callback pattern lets applications plug in their existing Prometheus, StatsD, OpenTelemetry, or structured logging stack without apiexec embedding any of them.
Streaming Model
The engine supports two iteration modes:
Batch Mode
next_batch() returns a FetchResult<T> containing a vector of records. This is the standard mode for paginated REST APIs where the server delivers a full page per response.
while (engine.has_next()) {
auto result = engine.next_batch();
if (result.error != StreamErrorCode::OK) break;
for (auto& record : result.records) {
process(record);
}
}

Record-Level Mode
stream(callback) invokes a callback for each individual record. Intended for SSE or AI streaming APIs where responses arrive as a stream of tokens or events rather than a complete page.
engine.stream([](const T& record) {
handle_record(record);
return true; // return false to stop early
});

Adapters opt in to incremental delivery by overriding parse_streaming() and supports_streaming(). Default implementations bundle the full response body into a single batch, so existing adapters remain compatible without changes.
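A hypothetical opt-in looks like this - only the two method names come from this page; the signatures, helpers, and Event type are assumptions for illustration:
#include <functional>
#include <string>
#include <vector>

struct Event { std::string data; };                           // assumed record type
std::vector<std::string> split_sse(const std::string& body);  // hypothetical helper
Event parse_event(const std::string& chunk);                  // hypothetical parser

class SseAdapter /* : public VendorAdapter<Event> */ {
public:
    bool supports_streaming() const { return true; }
    bool parse_streaming(const Response& resp,
                         const std::function<bool(const Event&)>& emit) const {
        for (const std::string& chunk : split_sse(resp.body))
            if (!emit(parse_event(chunk)))        // false from caller = stop early
                return false;
        return true;
    }
};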
Use Cases
1. Paginated REST Ingestion
Problem: Fetch all records from a REST API that intermittently rate-limits, uses opaque cursor tokens, and occasionally drops connections.
What apiexec provides: Configure generic_rest with base_url, data_field, and next_token_field. Write a while(has_next()) loop. The engine handles retries, prefetch, cursor advancement, and termination detection. Zero retry or pagination logic in the caller.
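Concretely, the caller side might reduce to this - make_engine is a hypothetical factory, the three JSON keys are the ones named above, and the rest of the shape is assumed:
const char* config = R"({
  "base_url":         "https://api.example.com/v1/records",
  "data_field":       "items",
  "next_token_field": "next_token"
})";

auto engine = make_engine<Record>("generic_rest", config);  // hypothetical factory
while (engine.has_next()) {
    auto result = engine.next_batch();            // retries, prefetch, and cursor
    if (result.error != StreamErrorCode::OK)      // advancement are all internal
        break;                                    // terminal error or exhausted
    for (auto& record : result.records) ingest(record);     // ingest = your sink
}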
2. Time-Windowed Metrics Backfill
Problem: Backfill 30 days of Datadog metrics. The API rate-limits aggressively on large time windows.
What apiexec provides: The datadog_metrics adapter uses a time-window cursor starting at a configured window_ms. On each 429, the engine halves the window. On success, it grows back - but not immediately after a recovery. No hand-tuned chunk sizing needed.
3. AI Completion With Cost Budget
Problem: Summarise a 100k-token document by chunking it into 4k-token prompts through GPT-4 or Claude, with a hard $5 budget limit.
What apiexec provides: The openai or anthropic adapter reports token usage via response_cost(). CostAwarePolicy tracks cumulative spend and halts cleanly with BUDGET_EXHAUSTED before the budget is exceeded. The adapter's prompt_chunk_for_cursor() splits the document at sentence boundaries.
4. Cross-Language Orchestration
Problem: A Go service orchestrates multiple data pipelines, each hitting a different vendor API. One engine, five language callers.
What apiexec provides: The ABI-stable C API is wrapped by Go, Rust, Python, Java, and JavaScript bindings. The same stream_create(adapter_name, config_json, policy_json) call dispatches through the adapter registry at runtime - no recompilation when switching adapters.
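Approximately, every binding funnels into the same entry point - only stream_create is named on this page; the return type here is an assumption:
extern "C" void* stream_create(const char* adapter_name,    // return type assumed
                               const char* config_json,
                               const char* policy_json);

void* handle = stream_create("datadog_metrics", config_json, policy_json);
// each binding wraps this handle; switching adapters is a string change -
// no recompilation of the bindings or of the engine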
5. Embedded in a C++ Host Process
Problem: A C++ pipeline needs reliable API ingestion without pulling in a heavyweight framework.
What apiexec provides: apiexec_core is a header-only CMake INTERFACE target - zero compiled code in the core layer. Link libapiexec_capi.a or libapiexec.so directly. The entire library - engine, transport, policy, adapters - compiles to a few hundred KB with only libcurl and nlohmann/json as runtime dependencies.
Non-Goals
These are explicitly outside apiexec's scope by design:
| Out of scope | Why |
|---|---|
| API gateway routing | No URL rewriting, no auth proxying - apiexec is a client-side library |
| Agent / workflow orchestration | No task graphs - stream ordering is the engine's concern, sequencing is the caller's |
| Data storage | Streams yield records; where they go is the caller's concern |
| Full vendor SDK replacement | Adapters are intentionally thin - use vendor SDKs when you need their full API surface |
Why This Architecture
Strategy + Template Method together
The fetch sequence - check preconditions → build request → execute → handle error → parse → advance cursor → adjust policy → yield - is invariant. What changes is what happens at each variable step.
Hardcoding the sequence in the engine and delegating steps to injected strategies gives maximum reuse with minimum coupling. Adding a new adapter does not touch the engine. Swapping the policy does not touch the adapter.
Layered modules enforced at build time
The core/ layer has zero external dependencies - verified by linker output. Unit tests that use only mock transports link against nothing beyond libc. This matters because it keeps the interface layer clean, keeps the test surface small, and prevents accidentally embedding a libcurl dependency in code that is supposed to be portable.
The layer boundary is: core/ ← transport/, core/ ← policy/, core/ ← adapters/. Nothing in core/ points outward.
ABI-stable C API
Language bindings are first-class. Five bindings share one compiled implementation. If the C++ internals change - a struct gains a field, a method is renamed - the bindings do not break as long as the C API surface is unchanged. Future API additions use versioned _v{N} symbols so old binaries continue to run against a newer shared library.
Callback-based metrics and logging
Every application already has an observability stack. Embedding a Prometheus client would force a specific choice on every user. The callback pattern lets users wire in their existing counters, histograms, and log sinks with two lines of code.
Cost hooks in the core interfaces
Retrofitting cost enforcement later would have required a breaking change to the ExecutionPolicy pure-virtual interface. Adding the hooks as defaulted (non-pure) virtual methods costs nothing for existing adapters and policies - they inherit return nullopt / return false - while enabling full AI budget control for new ones.
