# Metrics

## Overview
apiexec exposes runtime metrics for observability. The interface is callback-based rather
than embedded: the engine maintains thread-safe atomic counters and invokes a user-supplied
callback after every successful fetch, passing a read-only MetricsSnapshot. This keeps the
library footprint small and lets applications route metrics to whichever backend they already
use - Prometheus, StatsD, Datadog, OpenTelemetry, or a simple log line.
- **Thread safety:** all counters are std::atomic; snapshot() is lock-free.
- **Overhead:** incrementing a counter is a single atomic add; a snapshot is 11 atomic loads. The prefetch benchmark shows identical throughput with metrics enabled and disabled.
- **Header:** source/core/metrics.hpp - no external dependencies.
## Available metrics
| Metric | Type | Description | Incremented when |
|---|---|---|---|
| request_count | counter (int64) | Total HTTP requests made | Every call to transport->execute() |
| retry_count | counter (int64) | Total retry attempts (429 + 5xx + network) | Every time handle_error decides to retry |
| success_count | counter (int64) | Successful fetches (response parsed OK) | After parse_response() returns true |
| records_total | counter (int64) | Batches delivered to the caller | Once per successful batch |
| error_rate_limit | counter (int64) | Terminal 429 errors after retries exhausted | STREAM_ERROR_RATE_LIMIT returned to caller |
| error_server | counter (int64) | Terminal 5xx errors after retries exhausted | STREAM_ERROR_SERVER returned to caller |
| error_client | counter (int64) | Terminal 4xx errors (non-429, no retry) | STREAM_ERROR_CLIENT returned to caller |
| error_network | counter (int64) | Terminal network errors after retries exhausted | STREAM_ERROR_NETWORK returned to caller |
| error_parse | counter (int64) | Response parse failures | parse_response() returns false |
| window_size_ms | gauge (double) | Current cursor time window size in ms | After every cursor advancement |
| cumulative_cost | gauge (double) | Cumulative cost units reported by the adapter | After every adapter->response_cost() call |
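
Derived rates are easy to compute from a snapshot. A minimal sketch using only the fields documented above; `engine` and metrics_snapshot() are introduced in the sections below:

```cpp
// Sketch: deriving aggregate rates from the documented counters.
auto s = engine.metrics_snapshot();

// Retries per request actually issued.
double retry_rate = s.request_count > 0
    ? static_cast<double>(s.retry_count) / static_cast<double>(s.request_count)
    : 0.0;

// Terminal errors across all error classes.
int64_t terminal_errors = s.error_rate_limit + s.error_server
                        + s.error_client + s.error_network + s.error_parse;

std::cout << "retry rate: " << retry_rate
          << ", terminal errors: " << terminal_errors << "\n";
```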
## What is NOT exposed
- Per-adapter labels - metrics are per-stream. For per-adapter aggregation, create one stream per adapter and aggregate externally.
- Latency histograms - the library does not measure request duration. Track it around your next_batch() calls if needed (see the sketch after this list).
- Prefetch queue depth - prefetch is depth 0 or 1 (double-buffer); no queue to measure.
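
If you do need latency, one option is to time each next_batch() call in the calling loop. A minimal sketch; record_latency_ms is a hypothetical stand-in for your backend's histogram API, and note that with prefetch enabled this measures time-to-dequeue, not network time:

```cpp
#include <chrono>

// Hypothetical sink: swap in your backend's histogram, e.g. Observe() in prometheus-cpp.
auto record_latency_ms = [](double ms) { /* histogram.Observe(ms) */ };

while (engine.has_next()) {
    auto t0 = std::chrono::steady_clock::now();
    auto result = engine.next_batch();  // includes any retries
    auto t1 = std::chrono::steady_clock::now();

    record_latency_ms(std::chrono::duration<double, std::milli>(t1 - t0).count());

    // ... process result ...
}
```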
## MetricsSnapshot
A read-only snapshot is returned by engine.metrics_snapshot() and passed to the callback. Snapshots are cheap (one atomic load per field) and lock-free.
```cpp
struct MetricsSnapshot {
    int64_t request_count;
    int64_t retry_count;
    int64_t success_count;
    int64_t error_rate_limit;
    int64_t error_server;
    int64_t error_client;
    int64_t error_network;
    int64_t error_parse;
    int64_t records_total;
    double window_size_ms;
    double cumulative_cost;
};
```

## Usage patterns
### 1. Poll the snapshot on demand
Query metrics at any time - for example, to log periodically or expose on a /metrics endpoint.
#include "core/engine.hpp"
ExecutionEngine<JsonBatch> engine(/* ... */);
while (engine.has_next()) {
auto result = engine.next_batch();
// ... process result ...
}
auto snap = engine.metrics_snapshot();
std::cout << "Requests: " << snap.request_count
<< " Retries: " << snap.retry_count
<< " Rate-limit errors: " << snap.error_rate_limit << "\n";2. Callback on every successful fetch
Register a MetricsCallback - it fires once per successful next_batch() call.
```cpp
engine.set_metrics_callback([](const apiexec::MetricsSnapshot& s) {
    // Snapshot fields are cumulative totals, so increment Prometheus
    // counters by the delta since the last callback.
    static int64_t last_requests = 0, last_retries = 0;
    requests_total.Increment(static_cast<double>(s.request_count - last_requests));
    retries_total.Increment(static_cast<double>(s.retry_count - last_retries));
    last_requests = s.request_count;
    last_retries  = s.retry_count;

    // Gauges can take the snapshot value directly - e.g. StatsD:
    statsd.gauge("apiexec.window_size_ms", s.window_size_ms);
});
```

> **Keep callbacks fast.** The callback runs on the thread calling next_batch() (or the prefetch thread when prefetch is enabled). Do not perform synchronous I/O inside it - forward data to a dedicated reporting thread if needed, as in the sketch below.
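
A minimal sketch of that hand-off, using only the standard library: the callback just enqueues the snapshot, and a dedicated reporting thread does the (possibly slow) exporting.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

std::mutex mu;
std::condition_variable cv;
std::deque<apiexec::MetricsSnapshot> pending;
bool done = false;

// Reporting thread: the only place that may block on I/O.
std::thread reporter([&] {
    std::unique_lock<std::mutex> lock(mu);
    for (;;) {
        cv.wait(lock, [&] { return done || !pending.empty(); });
        while (!pending.empty()) {
            apiexec::MetricsSnapshot s = pending.front();
            pending.pop_front();
            lock.unlock();
            // Export s here: StatsD write, log line, Prometheus push...
            lock.lock();
        }
        if (done) break;
    }
});

// The callback itself stays cheap: lock, push, notify.
engine.set_metrics_callback([&](const apiexec::MetricsSnapshot& s) {
    {
        std::lock_guard<std::mutex> lock(mu);
        pending.push_back(s);
    }
    cv.notify_one();
});

// ... drive the stream ...

{
    std::lock_guard<std::mutex> lock(mu);
    done = true;
}
cv.notify_one();
reporter.join();
```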
### 3. Prometheus text format
The Metrics class has a built-in Prometheus text exposition formatter:
```cpp
auto prom_text = engine.metrics().to_prometheus("apiexec");
// Write prom_text to your HTTP /metrics response body
```

Sample output:

```
apiexec_requests_total 1250
apiexec_retries_total 47
apiexec_successes_total 1200
apiexec_errors_rate_limit_total 5
apiexec_errors_server_total 2
apiexec_errors_client_total 0
apiexec_errors_network_total 1
apiexec_errors_parse_total 0
apiexec_records_total 1200
apiexec_window_size_ms 3600000.000000
apiexec_cumulative_cost_units 0.000000
```
The prefix is configurable (default "apiexec"). Use a different prefix per stream if you
run multiple streams in the same process.
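
For example, with two engines in one process (the engine names here are illustrative):

```cpp
// Distinct prefixes keep the two streams' series separate in Prometheus.
auto orders_text = orders_engine.metrics().to_prometheus("apiexec_orders");
auto events_text = events_engine.metrics().to_prometheus("apiexec_events");
```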
### 4. Expose /metrics over HTTP
Pair the Prometheus formatter with any small HTTP server:
```cpp
#include <httplib.h>  // or any HTTP server library

httplib::Server svr;
svr.Get("/metrics", [&engine](const auto& req, auto& res) {
    res.set_content(
        engine.metrics().to_prometheus("apiexec"),
        "text/plain; version=0.0.4"
    );
});
svr.listen("0.0.0.0", 9090);
```

## Interpreting the metrics
### Healthy stream

```
request_count = 100
success_count = 100
retry_count   = 0
error_*       = 0
```
Every request succeeds on the first try. The window may be growing toward max_window_ms.
### Stream under moderate rate-limit pressure

```
request_count    = 150
success_count    = 100
retry_count      = 50
error_rate_limit = 0
window_size_ms   = 1800000   (shrunk from initial 3600000)
```
50 retries were needed to complete 100 successful fetches - a 50% retry rate. The window
shrunk once on the first 429 and has stabilized. Consider reducing window_grow_factor or
increasing min_window_ms to reduce oscillation.
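
What that adjustment might look like; the configuration type and field placement are assumptions (this page only names the knobs window_grow_factor and min_window_ms), so check your actual stream configuration:

```cpp
// Hypothetical configuration struct - only the two knobs are from this page.
apiexec::StreamConfig cfg;     // name is illustrative
cfg.window_grow_factor = 1.5;  // regrow the window more gently after successes
cfg.min_window_ms = 900000;    // 15-minute floor limits how far a 429 can shrink it
```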
### Stream hitting budget cap

```
request_count   = 10
success_count   = 10
cumulative_cost = 1000.0   (equal to budget_tokens)
```
Budget exhausted. The next call to next_batch() returns BUDGET_EXHAUSTED.
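
A sketch of handling that in the fetch loop, assuming BUDGET_EXHAUSTED is surfaced through the batch result's status (the result.status() accessor is an assumption; adapt it to how your result type reports status):

```cpp
while (engine.has_next()) {
    auto result = engine.next_batch();
    if (result.status() == apiexec::BUDGET_EXHAUSTED) {  // accessor name assumed
        auto snap = engine.metrics_snapshot();
        std::cerr << "budget spent: " << snap.cumulative_cost << " cost units\n";
        break;  // stop fetching; resume later with a fresh budget
    }
    // ... process result ...
}
```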
### Stream with persistent network issues

```
request_count = 60
success_count = 10
retry_count   = 50
error_network = 5
```
5 batches hit max_retries and were returned to the caller as terminal network failures.
Check your logging callback for the specific failure modes.
## Metrics in the language bindings

> **C API limitation.** The C API does not currently expose per-stream metrics directly. Full metric access (stream_metrics_snapshot_v2) is planned for ABI v2. In the meantime, the C API exposes stream_cost_info_v1 for budget and cost queries only.
| Binding | Metrics access |
|---|---|
| C++ (direct) | engine.metrics_snapshot() / engine.set_metrics_callback() - full access |
| C API | stream_cost_info_v1() - budget/cost only |
| Go | stream.CostInfo() - budget/cost only |
| Rust | stream.cost_info() - budget/cost only |
| Python | Not yet exposed beyond cost |
| Java | Not yet exposed beyond cost |
| JavaScript | Not yet exposed beyond cost |
If you need full metrics access from a binding, the recommended pattern is to host the engine
in a C++ process, expose metrics via a /metrics HTTP endpoint, and have the binding consumer
poll that endpoint.
