Overview
What is dd_stream?
dd_stream is an adaptive execution engine for reliably retrieving large datasets from APIs.
It decouples execution concerns - streaming, retries, prefetching, and flow control - from API semantics such as pagination, request construction, and response formats.
Instead of forcing each SDK or application to reimplement pagination and retry logic, dd_stream provides a reusable runtime that:
- Adapts request size dynamically based on API behavior
- Retries safely with backoff and rate-limit awareness
- Pipelines requests to eliminate idle time
- Exposes results as a continuous, memory-bounded stream
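For orientation, the fragment below sketches the consumption side of that runtime. The `Stream<T>` surface with `next()` and `for_each()` comes from the architecture described later; the `DataPoint` type and the optional-batch return shape of `next()` are assumptions for illustration, not the documented API.

```cpp
// Consumption-side sketch only: stream construction is omitted, and the
// exact batch type returned by next() is an assumption for illustration.
#include <cstdio>

struct DataPoint { long long ts; double value; };

// Works with any stream-like object whose next() yields an optional batch.
// Chunk sizing, retries, backoff, and prefetching happen inside the engine;
// the caller sees only an ordered, memory-bounded sequence of batches.
template <typename Stream>
void consume(Stream& stream) {
    while (auto batch = stream.next()) {          // e.g. std::optional<std::vector<DataPoint>>
        for (const auto& p : *batch) {
            std::printf("%lld %f\n", p.ts, p.value);
        }
    }
}
```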
The system is vendor-agnostic by design. All API-specific logic is encapsulated in pluggable adapters, while the execution engine remains generic and reusable.
Built-in adapters support Datadog v1 and v2 APIs, page-based REST APIs, and token/cursor-based APIs. Adding a new vendor requires implementing a single interface - no changes to the engine, transport, or policy layers.
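To make "a single interface" concrete, the sketch below implements a minimal page-number adapter against a reduced stand-in for the adapter interface. The method names (`build_request`, `parse_response`, `is_retryable`, `next_cursor`) come from the architecture diagram further down; the exact signatures, types, and endpoint here are assumptions.

```cpp
// Simplified sketch of a custom vendor adapter. VendorAdapter here is a
// reduced stand-in for the real interface; signatures may differ.
#include <optional>
#include <string>
#include <vector>

struct Request   { std::string method, path, body; };
struct Response  { int status = 0; std::string body; };
struct PageCursor { int page = 1; };

template <typename T>
struct VendorAdapter {
    virtual ~VendorAdapter() = default;
    virtual Request build_request(const PageCursor& cur) = 0;
    virtual std::vector<T> parse_response(const Response& resp) = 0;
    virtual bool is_retryable(const Response& resp) = 0;
    virtual std::optional<PageCursor> next_cursor(const PageCursor& cur,
                                                  const Response& resp) = 0;
};

struct Record { std::string id; };

// A minimal page-number REST adapter: GET /items?page=N until an empty page.
struct MyVendorAdapter : VendorAdapter<Record> {
    Request build_request(const PageCursor& cur) override {
        return {"GET", "/items?page=" + std::to_string(cur.page), ""};
    }
    std::vector<Record> parse_response(const Response& resp) override {
        return decode_items(resp.body);           // vendor-specific decoding (stubbed)
    }
    bool is_retryable(const Response& resp) override {
        return resp.status == 429 || resp.status >= 500;
    }
    std::optional<PageCursor> next_cursor(const PageCursor& cur,
                                          const Response& resp) override {
        if (decode_items(resp.body).empty()) return std::nullopt;  // no more pages
        return PageCursor{cur.page + 1};
    }
    static std::vector<Record> decode_items(const std::string&) { return {}; }  // stub
};
```

Nothing else changes: the engine, transport, and policy layers run this adapter exactly as they run the built-in ones.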
Language bindings for Python, Node.js, Go, Java, and Rust are powered by a single, stable C API surface.
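The architecture diagram below names `stream_create(adapter_name, config_json)` as the entry point of that C surface. The fragment here sketches how a binding might drive it; the adapter name, config keys, and the `stream_next_batch` / `stream_destroy` calls are placeholders for illustration, not documented functions.

```cpp
// Hypothetical caller-side sketch against the extern "C" surface. Only
// stream_create(adapter_name, config_json) is named in this document; the
// other declarations are placeholders for whatever the real C API exposes.
#include <cstddef>

extern "C" {
    struct stream_handle;   // opaque engine handle
    stream_handle* stream_create(const char* adapter_name, const char* config_json);
    int  stream_next_batch(stream_handle*, const void** data, std::size_t* len);  // assumed
    void stream_destroy(stream_handle*);                                          // assumed
}

void run_query() {
    // Vendor- and query-specific options travel as JSON, so the C surface
    // needs no per-vendor structs. Adapter name and keys are placeholders.
    stream_handle* s = stream_create(
        "datadog_v2_timeseries",
        R"({"query": "avg:system.cpu.user{*}", "from": 0, "to": 3600})");

    const void* data = nullptr;
    std::size_t len = 0;
    while (stream_next_batch(s, &data, &len) == 0 && len > 0) {
        // hand the batch to the host language (pybind11, N-API, cgo, JNI, ...)
    }
    stream_destroy(s);
}
```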
The Pagination Problem
Modern APIs are not designed for large-scale data extraction. Clients frequently encounter:
- Rate limits (429) and unpredictable throttling
- Request size limits and timeouts
- Inefficient or inconsistent pagination patterns
- Lack of client-side control over execution
Most SDKs provide only basic pagination helpers, leaving developers to implement retry logic, backoff strategies, and chunking themselves - often inconsistently and incorrectly. The result is fragile data pipelines and unpredictable query behavior at scale.
dd_stream fills the gap between naive SDK pagination and full streaming platforms, enabling reliable, efficient, and consistent data retrieval without requiring backend changes.
How It Works
dd_stream is built around four composable components:
- VendorAdapter - defines API-specific behavior (request building, parsing, cursor logic)
- ExecutionPolicy - controls retry strategy, backoff, and adaptive chunk sizing
- Transport - handles network I/O (e.g., HTTP via libcurl)
- Cursor - represents pagination state (time window, page number, or token)
The execution engine orchestrates these components to provide a consistent runtime for API queries.
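Conceptually, the engine's inner loop ties these four pieces together roughly as follows. This is a simplified, sequential view with stand-in types and signatures; the real engine adds prefetching, cancellation, retry budgets, and stuck-cursor detection as described next.

```cpp
// Simplified, sequential view of the cursor advancement loop.
// All types and signatures are stand-ins for the real components.
#include <chrono>
#include <optional>
#include <thread>
#include <vector>

template <typename T, typename Adapter, typename Policy, typename Transport, typename Cursor>
std::vector<T> run_query(Adapter& adapter, Policy& policy, Transport& transport, Cursor start) {
    std::vector<T> out;
    std::optional<Cursor> cur = start;
    while (cur) {
        auto req  = adapter.build_request(*cur);              // API-specific request
        auto resp = transport.send(req);                      // network I/O

        if (adapter.is_retryable(resp)) {                     // 429 / 5xx / timeout
            std::this_thread::sleep_for(policy.backoff(resp)); // rate-limit-aware wait
            policy.adjust(resp);                              // e.g. shrink the next chunk
            continue;                                          // retry same cursor (budget omitted)
        }

        auto batch = adapter.parse_response(resp);            // vendor format -> T
        out.insert(out.end(), batch.begin(), batch.end());
        policy.adjust(resp);                                  // e.g. grow chunk on success
        cur = adapter.next_cursor(*cur, resp);                // advance or stop
    }
    return out;
}
```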
Under the hood, dd_stream uses a double-buffered prefetch pipeline: while the caller processes the current batch, the next request is fetched concurrently in the background. This transforms request execution from a sequential model into a pipelined one, effectively hiding network latency and improving throughput - typically achieving ~50% pipeline utilization with minimal additional memory.
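The prefetch pattern itself is straightforward; the self-contained sketch below illustrates it with a simulated fetch (this shows the pattern, not the engine's actual code). Because only the current batch and the one in flight are resident, memory stays bounded regardless of total result size.

```cpp
// Double-buffered prefetch with std::async: while batch N is being processed,
// batch N+1 is already in flight. Simulated fetch; not the engine's code.
#include <chrono>
#include <future>
#include <optional>
#include <thread>
#include <vector>

std::optional<std::vector<int>> fetch(int page) {
    std::this_thread::sleep_for(std::chrono::milliseconds(100));  // pretend network latency
    if (page >= 5) return std::nullopt;                           // no more pages
    return std::vector<int>(1000, page);
}

int main() {
    auto pending = std::async(std::launch::async, fetch, 0);      // prime the pipeline
    int page = 0;
    while (true) {
        auto batch = pending.get();                               // wait for current batch
        if (!batch) break;
        ++page;
        pending = std::async(std::launch::async, fetch, page);    // start next fetch early
        for (int v : *batch) { (void)v; }                         // process while next fetch runs
    }
}
```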
┌─────────────────────────────────────────────────────────┐
│ User API - Stream<T> │
│ next() / for_each() / iterator │
└───────────────────────┬─────────────────────────────────┘
│
┌───────────────────────▼─────────────────────────────────┐
│ ExecutionEngine<T> (generic core) │
│ • double-buffered prefetch (std::async) │
│ • cursor advancement loop │
│ • retry + backoff via ExecutionPolicy │
│ • cancellation + stuck-cursor detection │
└──────┬─────────────────┬──────────────────┬─────────────┘
│ │ │
▼ ▼ ▼
VendorAdapter<T> ExecutionPolicy Transport
build_request() adjust() send()
parse_response() backoff() cancel()
is_retryable() prefetch_depth()
retry_after()
next_cursor()
│
├── DatadogV1Adapter (GET, time-window, flat DataPoint[])
├── DatadogV2TSAdapter (POST, timeseries, columnar)
├── DatadogV2ScAdapter (POST, scalar, aggregated)
├── PageAdapter<T> (page-number REST pagination)
└── TokenAdapter<T> (opaque cursor/token pagination)
│
┌───────────────────────▼─────────────────────────────────┐
│ extern "C" stream_c_api.h │
│ stream_create(adapter_name, config_json) │
│ • JSON config - zero struct-per-vendor in the C API │
│ • All language bindings use this single surface │
└──────┬──────────┬──────────┬──────────┬─────────────────┘
pybind11 N-API cgo JNI / Rust
    Python         Node.js        Go          Java / Rust

Language Support
| Language | Binding | Zero-copy |
|---|---|---|
| Python | pybind11 | Arrow C Data Interface (optional) |
| Node.js | N-API AsyncWorker | One copy (layout conversion) |
| Go | cgo | Yes (unsafe.Pointer) |
| Java | JNI | Yes (GetPrimitiveArrayCritical) |
| Rust | extern "C" FFI | One copy (field-by-field) |
v1 vs v2 API
dd_stream supports both Datadog API versions through separate adapters that share the same execution engine:
| Dimension | v1 | v2 Timeseries | v2 Scalar |
|---|---|---|---|
| HTTP method | GET | POST | POST |
| Queries | 1 (4 aggregations) | N + formulas | N + formulas |
| Time axis | epoch seconds | epoch milliseconds | none (per-chunk aggregate) |
| Response shape | flat DataPoint[] | columnar times[] + series[] | columns[]{name, values[]} |
| Group-by tags | - | per-series group_tags[] | per-column name |
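As an illustration of why these differences live in adapters rather than the engine, the sketch below normalizes both response shapes into one point type. The raw structs are simplified stand-ins for the actual JSON payloads, and the field names are assumptions.

```cpp
// Simplified sketch: v1 and v2 adapters emit the same DataPoint stream even
// though the wire formats differ. Raw structs stand in for the JSON payloads.
#include <string>
#include <utility>
#include <vector>

struct DataPoint { double ts_seconds; double value; std::string series; };

// v1: flat array of (epoch-seconds, value) pairs per series.
struct V1Series { std::string metric; std::vector<std::pair<double, double>> pointlist; };

std::vector<DataPoint> parse_v1(const std::vector<V1Series>& series) {
    std::vector<DataPoint> out;
    for (const auto& s : series)
        for (const auto& [ts, val] : s.pointlist)
            out.push_back({ts, val, s.metric});
    return out;
}

// v2 timeseries: one shared times[] axis (epoch milliseconds) plus columnar
// values per series, with group tags identifying each series.
struct V2Timeseries {
    std::vector<double> times_ms;
    std::vector<std::vector<double>> series_values;
    std::vector<std::string> group_tags;
};

std::vector<DataPoint> parse_v2(const V2Timeseries& ts) {
    std::vector<DataPoint> out;
    for (std::size_t s = 0; s < ts.series_values.size(); ++s)
        for (std::size_t i = 0; i < ts.times_ms.size(); ++i)
            out.push_back({ts.times_ms[i] / 1000.0, ts.series_values[s][i], ts.group_tags[s]});
    return out;
}
```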
