Overview
What is dd_stream?
dd_stream is an adaptive execution engine for reliably retrieving large datasets from APIs.
It decouples execution concerns - streaming, retries, prefetching, and flow control - from API semantics such as pagination, request construction, and response formats.
Instead of forcing each SDK or application to reimplement pagination and retry logic, dd_stream provides a reusable runtime that:
- Adapts request size dynamically based on API behavior
- Retries safely with backoff and rate-limit awareness
- Pipelines requests to eliminate idle time
- Exposes results as a continuous, memory-bounded stream
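For orientation, the fragment below sketches the consumption side of that runtime. The `Stream<T>` surface with `next()` and `for_each()` comes from the architecture described later; the `DataPoint` type and the optional-batch return shape of `next()` are assumptions for illustration, not the documented API.

```cpp
// Consumption-side sketch only: stream construction is omitted, and the
// exact batch type returned by next() is an assumption for illustration.
#include <cstdio>

struct DataPoint { long long ts; double value; };

// Works with any stream-like object whose next() yields an optional batch.
// Chunk sizing, retries, backoff, and prefetching happen inside the engine;
// the caller sees only an ordered, memory-bounded sequence of batches.
template <typename Stream>
void consume(Stream& stream) {
    while (auto batch = stream.next()) {          // e.g. std::optional<std::vector<DataPoint>>
        for (const auto& p : *batch) {
            std::printf("%lld %f\n", p.ts, p.value);
        }
    }
}
```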
The system is vendor-agnostic by design. All API-specific logic is encapsulated in pluggable adapters, while the execution engine remains generic and reusable.
Built-in adapters support Datadog v1 and v2 APIs, page-based REST APIs, and token/cursor-based APIs. Adding a new vendor requires implementing a single interface - no changes to the engine, transport, or policy layers.
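To make "a single interface" concrete, the sketch below implements a minimal page-number adapter against a reduced stand-in for the adapter interface. The method names (`build_request`, `parse_response`, `is_retryable`, `next_cursor`) come from the architecture diagram further down; the exact signatures, types, and endpoint here are assumptions.

```cpp
// Simplified sketch of a custom vendor adapter. VendorAdapter here is a
// reduced stand-in for the real interface; signatures may differ.
#include <optional>
#include <string>
#include <vector>

struct Request   { std::string method, path, body; };
struct Response  { int status = 0; std::string body; };
struct PageCursor { int page = 1; };

template <typename T>
struct VendorAdapter {
    virtual ~VendorAdapter() = default;
    virtual Request build_request(const PageCursor& cur) = 0;
    virtual std::vector<T> parse_response(const Response& resp) = 0;
    virtual bool is_retryable(const Response& resp) = 0;
    virtual std::optional<PageCursor> next_cursor(const PageCursor& cur,
                                                  const Response& resp) = 0;
};

struct Record { std::string id; };

// A minimal page-number REST adapter: GET /items?page=N until an empty page.
struct MyVendorAdapter : VendorAdapter<Record> {
    Request build_request(const PageCursor& cur) override {
        return {"GET", "/items?page=" + std::to_string(cur.page), ""};
    }
    std::vector<Record> parse_response(const Response& resp) override {
        return decode_items(resp.body);           // vendor-specific decoding (stubbed)
    }
    bool is_retryable(const Response& resp) override {
        return resp.status == 429 || resp.status >= 500;
    }
    std::optional<PageCursor> next_cursor(const PageCursor& cur,
                                          const Response& resp) override {
        if (decode_items(resp.body).empty()) return std::nullopt;  // no more pages
        return PageCursor{cur.page + 1};
    }
    static std::vector<Record> decode_items(const std::string&) { return {}; }  // stub
};
```

Nothing else changes: the engine, transport, and policy layers run this adapter exactly as they run the built-in ones.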
Language bindings for Python, Node.js, Go, Java, and Rust are powered by a single, stable C API surface.
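The architecture diagram below names `stream_create(adapter_name, config_json)` as the entry point of that C surface. The fragment here sketches how a binding might drive it; the adapter name, config keys, and the `stream_next_batch` / `stream_destroy` calls are placeholders for illustration, not documented functions.

```cpp
// Hypothetical caller-side sketch against the extern "C" surface. Only
// stream_create(adapter_name, config_json) is named in this document; the
// other declarations are placeholders for whatever the real C API exposes.
#include <cstddef>

extern "C" {
    struct stream_handle;   // opaque engine handle
    stream_handle* stream_create(const char* adapter_name, const char* config_json);
    int  stream_next_batch(stream_handle*, const void** data, std::size_t* len);  // assumed
    void stream_destroy(stream_handle*);                                          // assumed
}

void run_query() {
    // Vendor- and query-specific options travel as JSON, so the C surface
    // needs no per-vendor structs. Adapter name and keys are placeholders.
    stream_handle* s = stream_create(
        "datadog_v2_timeseries",
        R"({"query": "avg:system.cpu.user{*}", "from": 0, "to": 3600})");

    const void* data = nullptr;
    std::size_t len = 0;
    while (stream_next_batch(s, &data, &len) == 0 && len > 0) {
        // hand the batch to the host language (pybind11, N-API, cgo, JNI, ...)
    }
    stream_destroy(s);
}
```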
The Pagination Problem
Modern APIs are not designed for large-scale data extraction. Clients frequently encounter:
- Rate limits (429) and unpredictable throttling
- Request size limits and timeouts
- Inefficient or inconsistent pagination patterns
- Lack of client-side control over execution
Most SDKs provide only basic pagination helpers, leaving developers to implement retry logic, backoff strategies, and chunking themselves - often inconsistently and incorrectly. The result is fragile data pipelines and unpredictable query behavior at scale.
dd_stream fills the gap between naive SDK pagination and full streaming platforms, enabling reliable, efficient, and consistent data retrieval without requiring backend changes.
How It Works
dd_stream is built around four composable components:
- VendorAdapter - defines API-specific behavior (request building, parsing, cursor logic)
- ExecutionPolicy - controls retry strategy, backoff, and adaptive chunk sizing
- Transport - handles network I/O (e.g., HTTP via libcurl)
- Cursor - represents pagination state (time window, page number, or token)
The execution engine orchestrates these components to provide a consistent runtime for API queries.
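Conceptually, the engine's inner loop ties these four pieces together roughly as follows. This is a simplified, sequential view with stand-in types and signatures; the real engine adds prefetching, cancellation, retry budgets, and stuck-cursor detection as described next.

```cpp
// Simplified, sequential view of the cursor advancement loop.
// All types and signatures are stand-ins for the real components.
#include <chrono>
#include <optional>
#include <thread>
#include <vector>

template <typename T, typename Adapter, typename Policy, typename Transport, typename Cursor>
std::vector<T> run_query(Adapter& adapter, Policy& policy, Transport& transport, Cursor start) {
    std::vector<T> out;
    std::optional<Cursor> cur = start;
    while (cur) {
        auto req  = adapter.build_request(*cur);              // API-specific request
        auto resp = transport.send(req);                      // network I/O

        if (adapter.is_retryable(resp)) {                     // 429 / 5xx / timeout
            std::this_thread::sleep_for(policy.backoff(resp)); // rate-limit-aware wait
            policy.adjust(resp);                              // e.g. shrink the next chunk
            continue;                                          // retry same cursor (budget omitted)
        }

        auto batch = adapter.parse_response(resp);            // vendor format -> T
        out.insert(out.end(), batch.begin(), batch.end());
        policy.adjust(resp);                                  // e.g. grow chunk on success
        cur = adapter.next_cursor(*cur, resp);                // advance or stop
    }
    return out;
}
```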
Under the hood, dd_stream uses a double-buffered prefetch pipeline: while the caller processes the current batch, the next request is fetched concurrently in the background. This transforms request execution from a sequential model into a pipelined one, effectively hiding network latency and improving throughput - typically achieving ~50% pipeline utilization with minimal additional memory.
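The prefetch pattern itself is straightforward; the self-contained sketch below illustrates it with a simulated fetch (this shows the pattern, not the engine's actual code). Because only the current batch and the one in flight are resident, memory stays bounded regardless of total result size.

```cpp
// Double-buffered prefetch with std::async: while batch N is being processed,
// batch N+1 is already in flight. Simulated fetch; not the engine's code.
#include <chrono>
#include <future>
#include <optional>
#include <thread>
#include <vector>

std::optional<std::vector<int>> fetch(int page) {
    std::this_thread::sleep_for(std::chrono::milliseconds(100));  // pretend network latency
    if (page >= 5) return std::nullopt;                           // no more pages
    return std::vector<int>(1000, page);
}

int main() {
    auto pending = std::async(std::launch::async, fetch, 0);      // prime the pipeline
    int page = 0;
    while (true) {
        auto batch = pending.get();                               // wait for current batch
        if (!batch) break;
        ++page;
        pending = std::async(std::launch::async, fetch, page);    // start next fetch early
        for (int v : *batch) { (void)v; }                         // process while next fetch runs
    }
}
```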
┌─────────────────────────────────────────────────────────┐
│ User API - Stream<T> │
│ next() / for_each() / iterator │
└───────────────────────┬─────────────────────────────────┘
│
┌───────────────────────▼─────────────────────────────────┐
│ ExecutionEngine<T> (generic core) │
│ • double-buffered prefetch (std::async) │
│ • cursor advancement loop │
│ • retry + backoff via ExecutionPolicy │
│ • cancellation + stuck-cursor detection │
└──────┬─────────────────┬──────────────────┬─────────────┘
│ │ │
▼ ▼ ▼
VendorAdapter<T> ExecutionPolicy Transport
build_request() adjust() send()
parse_response() backoff() cancel()
is_retryable() prefetch_depth()
retry_after()
next_cursor()
│
├── DatadogV1Adapter (GET, time-window, flat DataPoint[])
├── DatadogV2TSAdapter (POST, timeseries, columnar)
├── DatadogV2ScAdapter (POST, scalar, aggregated)
├── PageAdapter<T> (page-number REST pagination)
└── TokenAdapter<T> (opaque cursor/token pagination)
│
┌───────────────────────▼─────────────────────────────────┐
│ extern "C" stream_c_api.h │
│ stream_create(adapter_name, config_json) │
│ • JSON config - zero struct-per-vendor in the C API │
│ • All language bindings use this single surface │
└──────┬──────────┬──────────┬──────────┬─────────────────┘
pybind11 N-API cgo JNI / Rust
    Python         Node.js        Go          Java / Rust

Language Support
| Language | Binding | Zero-copy |
|---|---|---|
| Python | pybind11 | Arrow C Data Interface (optional) |
| Node.js | N-API AsyncWorker | One copy (layout conversion) |
| Go | cgo | Yes (unsafe.Pointer) |
| Java | JNI | Yes (GetPrimitiveArrayCritical) |
| Rust | extern "C" FFI | One copy (field-by-field) |
v1 vs v2 API
dd_stream supports both Datadog API versions through separate adapters that share the same execution engine:
| Dimension | v1 | v2 Timeseries | v2 Scalar |
|---|---|---|---|
| HTTP method | GET | POST | POST |
| Queries | 1 (4 aggregations) | N + formulas | N + formulas |
| Time axis | epoch seconds | epoch milliseconds | none (per-chunk aggregate) |
| Response shape | flat DataPoint[] | columnar times[] + series[] | columns[]{name, values[]} |
| Group-by tags | - | per-series group_tags[] | per-column name |
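As an illustration of why these differences live in adapters rather than the engine, the sketch below normalizes both response shapes into one point type. The raw structs are simplified stand-ins for the actual JSON payloads, and the field names are assumptions.

```cpp
// Simplified sketch: v1 and v2 adapters emit the same DataPoint stream even
// though the wire formats differ. Raw structs stand in for the JSON payloads.
#include <string>
#include <utility>
#include <vector>

struct DataPoint { double ts_seconds; double value; std::string series; };

// v1: flat array of (epoch-seconds, value) pairs per series.
struct V1Series { std::string metric; std::vector<std::pair<double, double>> pointlist; };

std::vector<DataPoint> parse_v1(const std::vector<V1Series>& series) {
    std::vector<DataPoint> out;
    for (const auto& s : series)
        for (const auto& [ts, val] : s.pointlist)
            out.push_back({ts, val, s.metric});
    return out;
}

// v2 timeseries: one shared times[] axis (epoch milliseconds) plus columnar
// values per series, with group tags identifying each series.
struct V2Timeseries {
    std::vector<double> times_ms;
    std::vector<std::vector<double>> series_values;
    std::vector<std::string> group_tags;
};

std::vector<DataPoint> parse_v2(const V2Timeseries& ts) {
    std::vector<DataPoint> out;
    for (std::size_t s = 0; s < ts.series_values.size(); ++s)
        for (std::size_t i = 0; i < ts.times_ms.size(); ++i)
            out.push_back({ts.times_ms[i] / 1000.0, ts.series_values[s][i], ts.group_tags[s]});
    return out;
}
```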
