Partitioning

Real-world time series are usually multi-entity — events from many hosts, regions, devices, or users interleaved in one stream by time. Pond's stateful operators (rolling, aggregate, fill, diff, rate, smooth, ...) read neighbouring events when computing each output. On a multi-entity series, those "neighbours" silently cross entity boundaries — the canonical pond footgun.

partitionBy(col) is the fix: split a series by column value, operate per-partition, optionally fan back in.

The cross-partition hazard

Take this series — two hosts interleaved by time:

time | cpu | host
0 | 0.5 | api-1
0 | 0.7 | api-2
60_000 | 0.6 | api-1
60_000 | 0.8 | api-2

Run a stateful op directly on it:

series.rolling('5m', { cpu: 'avg' });
// → averages api-1 AND api-2 together. Probably not what you wanted.

series.fill({ cpu: 'linear' });
// → fills api-1's gaps using api-2's values as "neighbours."

series.diff('cpu');
// → diffs api-1's cpu against api-2's preceding cpu.
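
Worked against the four-row table above (a sketch; assuming diff emits each event's delta against the immediately preceding event, in plain time order):

series.diff('cpu');
// t=0,      api-2: 0.7 - 0.5 =  0.2   (diffed against api-1)
// t=60_000, api-1: 0.6 - 0.7 = -0.1   (per-host answer: 0.6 - 0.5 = 0.1)
// t=60_000, api-2: 0.8 - 0.6 =  0.2   (per-host answer: 0.8 - 0.7 = 0.1)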

Static (per-event) ops are fine — filter, map, select, rename, collapse look at one event at a time. The hazard is specifically with operators that read across multiple events:

  • Multi-event reads: rolling, aggregate, reduce, smooth, align, cumulative
  • Adjacent-event reads: diff, rate, pctChange, shift, fill (for linear and bfill)
  • Late-event reads (live): LiveRollingAggregation, LiveAggregation, LiveView.window, plus the live versions of the above

partitionBy(col) — per-entity scope

Split the series by column value, run ops within each entity, materialise back:

series
  .partitionBy('host')
  .fill({ cpu: 'linear' })       // per-host fill — no cross-host neighbours
  .rolling('5m', { cpu: 'avg' }) // per-host rolling — no cross-host averaging
  .collect();                    // → regular TimeSeries, per-host outputs interleaved by time

Each step in the chain returns a PartitionedTimeSeries view — the per-partition scope persists across chains. .collect() is what materialises back to a normal TimeSeries.
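
Spelled out step by step (a sketch; type names as this page uses them, schema generics elided):

const perHost = series.partitionBy('host');          // PartitionedTimeSeries view
const filled = perHost.fill({ cpu: 'linear' });      // still partitioned
const rolled = filled.rolling('5m', { cpu: 'avg' }); // still partitioned
const merged = rolled.collect();                     // back to a regular TimeSeries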

Composite partitions

Multiple partition columns combine into a tuple key:

series.partitionBy(['host', 'region']).rolling('5m', { cpu: 'avg' });
// each (host, region) combination gets its own rolling window

Order matters for output sorting (events within a partition stay in time order; partitions are visited in column-tuple order at collect time).
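
As a sketch (the region values are hypothetical), the tuple key behaves like this:

series
  .partitionBy(['host', 'region'])
  .rolling('5m', { cpu: 'avg' })
  .collect();
// independent windows per tuple: ('api-1', 'eu-west'), ('api-1', 'us-east'),
// ('api-2', 'eu-west'), ... collected in column-tuple order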

Live partitioning

The same hazard exists on the live side, with the same fix. live.partitionBy(col) returns a LivePartitionedSeries that routes incoming events into per-partition LiveSeries<S> sub-buffers:

const partitioned = live.partitionBy('host');

// Per-partition rolling — each host gets its own deque
const rolledByHost = partitioned.rolling('5m', { cpu: 'avg' });

// Materialise per-host current values into a Map
const byHost = partitioned.toMap();
byHost.get('api-1')?.last()?.get('cpu');

Live partitions are auto-spawned: the first time an event with a new partition value arrives, a sub-buffer is allocated for it. Pre-declare the expected set with partitionBy('host', { groups: [...] }) if you want unknown values to throw at ingest.
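
A minimal sketch of the strict form (host values hypothetical):

const strict = live.partitionBy('host', {
  groups: ['api-1', 'api-2'], // the only partition values accepted
});
// an event with host 'api-3' now throws at ingest instead of
// auto-spawning a sub-buffer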

Per-partition retention, grace, and ordering all default from the source's settings but can be overridden:

live.partitionBy('host', {
  retention: { maxEvents: 1_000 }, // per-partition cap
  graceWindow: '5s',
  ordering: 'reorder',
});

Synchronised partitioned rolling

Per-partition rolling fires when each partition's own events arrive — api-1 updates at api-1's pace, api-2 at api-2's. That's correct for "I want api-1's smoothed value as soon as api-1 has new data."
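
Illustrated on a hypothetical timeline:

// api-1 events arrive at t = 0s and 60s; api-2 at t = 0s, 45s, 90s.
// Per-partition rolling emits for api-1 at 0s and 60s, and for api-2 at
// 0s, 45s, and 90s: each host on its own cadence, nothing synchronised.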

Sometimes you want all partitions to emit together at fixed boundaries — every host reports its current rolling-window value on the same 200ms tick. Pass a clock trigger:

const ticks = live
  .partitionBy('host')
  .rolling('1m', { cpu: 'avg' }, { trigger: Trigger.every('200ms') });

// ticks: LiveSource<{ time, host, cpu }>
// Each tick → one event per known partition, all sharing the same ts

The output is a flat LiveSource whose schema is [time, <partitionColumn>, ...mappingColumns]. One event per known partition per tick. Quiet partitions still emit (using their last known rolling-window state, with stale entries evicted against the tick timestamp).

This is what dashboards reach for: every host emits a coherent snapshot at a regular cadence so the renderer can group by ts and lay out a row per host. See Triggers × partitioning.
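
On the renderer side, that grouping step can be a plain reduction over the tick batch. A minimal sketch (the event shape follows the schema above; nothing here is pond-specific):

type TickEvent = { time: number; host: string; cpu: number };

// One row per tick timestamp, one cell per host.
function rowsByTick(events: TickEvent[]): Map<number, Record<string, number>> {
  const rows = new Map<number, Record<string, number>>();
  for (const e of events) {
    const row: Record<string, number> = rows.get(e.time) ?? {};
    row[e.host] = e.cpu;
    rows.set(e.time, row);
  }
  return rows;
}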

When the dashboard wants multiple windows per partition (e.g. 1m baseline + 200ms current), pass the keyed-record form instead of a single (window, mapping) pair — see Windowing → Multi-window rolling (live). One ingest pass updates all windows; one merged event per partition per tick. The shared shape:

const fused = live.partitionBy('host').rolling(
  {
    '1m': {
      cpu_avg: { from: 'cpu', using: 'avg' },
      cpu_sd: { from: 'cpu', using: 'stdev' },
    },
    '200ms': { cpu_samples: { from: 'cpu', using: 'samples' } },
  },
  { trigger: Trigger.every('200ms') },
);
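
Reading the output shape off the schema rule above (an assumption, not checked against the reference):

// fused: LiveSource<{ time, host, cpu_avg, cpu_sd, cpu_samples }>
// one merged event per host per 200ms tick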

Fan-in patterns

Three ways to consume per-partition output:

Pattern | Returns | Use when
.collect() (batch) | TimeSeries<S> with all partitions merged | Batch chains — final materialise step
.collect() (live) | LiveSeries<S> (append-only fan-in) | Live: one unified rolling buffer over all hosts' events
.toMap() (live) | Map<key, LiveSource<S>> | Direct per-partition subscription
.apply(factory) | LivePartitionedView<R> of per-partition outputs | Chain a factory across all partitions
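
The factory form, as a hedged sketch: the exact callback signature is documented under Live transforms; here it's assumed to receive each partition's live sub-series and return a derived source.

const views = live
  .partitionBy('host')
  .apply((sub) => sub.rolling('1m', { cpu: 'avg' }));
// views: LivePartitionedView of one per-host rolling output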

The synchronised-rolling output above bypasses fan-in — it returns a flat LiveSource directly because the boundary-tick contract already gives you one event per partition per tick.

What partitioning doesn't do

  • Doesn't share state across partitions. Each partition has its own rolling deque, its own bucket states, its own subscriber list. This is intentional — partitions are independent by construction, which is the whole point.
  • Doesn't auto-merge late events across partitions. A late event for api-1 does not perturb api-2's state. Live partitions inherit the source's grace window per partition.
  • Doesn't help with the tap sub-window pattern. Two windows over the same source's same partition each maintain their own deque. The parked tap primitive (in PLAN.md) addresses that.

Where this shows up

  • Batch operator — Reshape → partitionBy has the operator-level reference.
  • Live operator — LivePartitionedSeries and the chainable LivePartitionedView are documented under Live transforms.
  • Triggers — synchronised partitioned rolling builds on clock triggers. See Triggers.
  • Series — partitionBy works uniformly on both TimeSeries<S> and LiveSeries<S>. See Series.