Reshaping
Operations that change the shape of a series — pivot from long to wide, partition into per-group sub-series, or join two series into one wider one. Distinct from Aggregation (which collapses to fewer events) and from Eventwise transforms (which preserve the row count and only reshape per event).
This page covers eight ops, grouped by what they do:
Wide ↔ long reshape
- `pivotByGroup(group, value)` — long-to-wide reshape on a categorical column.
- `unpivot(group, value, mapping)` — wide-to-long inverse. Sketch — not yet shipped.
Partition (fan-out / fan-in)
- `groupBy(col, fn?)` — partition a series by a column value into N per-group sub-series.
- `partitionBy(col)` — scope stateful transforms to within each partition. Returns a `PartitionedTimeSeries` view with sugar for `fill`/`rolling`/`smooth`/`aggregate`/etc. Solves the cross-entity hazard for multi-host series.
- `concat(seriesList)` — row-append fan-in: concatenate N same-schema series, re-sorted by key. The inverse of `groupBy`.
- `fromEvents(events, options)` — build a series from an existing `Event[]` array. Companion primitive to `concat`.
Join two or more series by time key
- `join(other, options?)` — exact-key pairwise join of two `TimeSeries` into one wider series.
- `joinMany(seriesList, options?)` — N-ary join of many series at once. Static method on `TimeSeries`.
For per-event shape changes — `select`, `rename`, `collapse`, `map(newSchema, fn)` — see Eventwise transforms → Per-event-only.
Picking the right op
Reshape ops split along two axes: how many series in vs. how many out, and which dimension changes (more rows, more columns, or a fundamental long↔wide flip).
| Input | Want | Op |
|---|---|---|
| 1 series, long form | wider — one column per category value | pivotByGroup |
| 1 series, wide form | longer — one row per cell | unpivot (sketch) |
| 1 series, with categorical column | N sub-series, one per category | groupBy(col) |
| 1 series, with categorical column | N transformed results (records, scalars, …) | groupBy(col, fn) |
| 1 series, multi-entity, want a stateful transform | transform applied per entity, reassembled | partitionBy(col) |
| N series, all the same schema | 1 series with more rows | TimeSeries.concat |
| 2 series, different schemas | 1 wider series, joined on the time key | a.join(b) |
| N series, different schemas | 1 wider series, joined on the time key | TimeSeries.joinMany |
| Event[] array (not a series) | 1 series | TimeSeries.fromEvents |
Three mental shortcuts:
- More rows or more columns? `concat` adds rows (same schema, more events). `join`/`joinMany` add columns (different schemas, merged on time keys).
- Splitting or combining? `groupBy` splits one into many. `concat`/`join`/`joinMany` combine many into one. `pivotByGroup` and `unpivot` keep one series but change its shape.
- Cross-entity hazard? If your series interleaves multiple entities (host, region, device), most stateful transforms (`fill`, `rolling`, `smooth`, `align`, `diff`, `aggregate`, …) silently mix data across entities. Use `partitionBy(col)` to scope the transform.
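The same three shortcuts in code — a minimal sketch (series names are assumed):

```ts
// More rows: same schema, stacked and re-sorted by key.
const allWeeks = TimeSeries.concat([week1, week2]);

// More columns: different schemas, merged on exact time keys.
const cpuAndMem = cpu.join(mem);

// Cross-entity hazard: scope the stateful fill to each host.
const filled = multiHost.partitionBy('host').fill({ cpu: 'linear' }).collect();
```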
pivotByGroup
Reshape long-form data into wide rows. Each distinct value of the
`group` column becomes its own output column, named
`{groupValue}_{valueColumn}` (e.g. `api-1_cpu`), holding that group's
`value` column at each timestamp.
// Long: one row per (timestamp, host).
// Wide: one row per timestamp, columns per host.
const wide = long.pivotByGroup('host', 'cpu');
// schema: [time, "api-1_cpu", "api-2_cpu", ...]
The wide-row counterpart of groupBy: where groupBy
gives you N separate TimeSeries, pivotByGroup gives you
one wide TimeSeries. Pick whichever shape your downstream
code wants — groupBy for per-group transform pipelines,
pivotByGroup for chart data and column-wise operations.
Rows sharing a timestamp collapse into one output row. Cells where
a group has no event at a timestamp are undefined. Output schema
is dynamic (column names depend on runtime data) so the return
type is TimeSeries<SeriesSchema> (loosely typed) — read columns
by name out of toPoints() rows. Group values are sorted
alphabetically for stable column order.
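Reading the dynamic columns back out — a short sketch, assuming the host names from the example above:

```ts
const points = wide.toPoints();
// Column names are runtime strings, so index by name; cells are
// undefined where a host had no event at that timestamp.
const api1 = points[0]['api-1_cpu'] as number | undefined;
```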
If two events share both a timestamp and a group value, the
call throws by default. Opt-in with { aggregate: 'avg' } (or any
reducer name aggregate() accepts: 'sum', 'first', 'last',
'min', 'max', 'median', percentiles like 'p95', custom
functions, …):
long.pivotByGroup('host', 'cpu', { aggregate: 'avg' });
The aggregator's output kind must match the value column's kind —
e.g. count, unique, topN produce kind-changing reductions
and are rejected upfront with a clear error. Use aggregate()
first if you need a kind-changing reduction.
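A hedged sketch of that two-step path — assuming `aggregate()` accepts `'count'` and keeps the output column named `cpu` — reduce first, then pivot the now-plain numeric result:

```ts
// Per-host, per-minute event counts (partitioned so hosts don't mix),
// then pivot the counts into one wide series.
const counts = long
  .partitionBy('host')
  .aggregate(Sequence.every('1m'), { cpu: 'count', host: 'last' })
  .collect();
const wideCounts = counts.pivotByGroup('host', 'cpu');
```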
Composes with every other transform — every wide column is a regular numeric column:
// Per-host rolling smoothing on the wide series.
const smoothed = long.pivotByGroup('host', 'cpu').rolling('5m', {
'api-1_cpu': 'avg',
'api-2_cpu': 'avg',
});
// Per-host carry-forward for missing cells.
const filled = long.pivotByGroup('host', 'cpu').fill({ 'api-2_cpu': 'hold' });
Requires a time-keyed input. See Charting → Per-group wide rows for the end-to-end Recharts pipeline.
A group column containing both literal "undefined" strings and
actually-undefined values collapses both into a single
"undefined" output column. Edge case — open an issue if you hit
it.
Typed variant via declared groups
When you know the group set up front (which is true for most
dashboards even when they pretend otherwise), pass groups and the
output schema becomes literal-typed — every wide column has a known
name, every downstream transform narrows correctly:
const HOSTS = ['api-1', 'api-2'] as const;
const wide = long.pivotByGroup('host', 'cpu', { groups: HOSTS });
// wide.schema is now:
// readonly [
// { name: 'time', kind: 'time' },
// { name: 'api-1_cpu', kind: 'number', required: false },
// { name: 'api-2_cpu', kind: 'number', required: false },
// ]
// No `as never` needed — 'api-1_cpu' is a known column name:
const banded = wide.baseline('api-1_cpu', { window: '1m', sigma: 2 });
// toPoints rows narrow too:
const point = wide.toPoints()[0];
const value: number | undefined = point['api-1_cpu'];
Behavior changes when groups is supplied:
- Output column order = declaration order, not alphabetical. The declaration is the user's intent; preserving it makes column layouts stable across runs.
- Declared groups with no events still produce a column (with all-undefined cells). The schema is stable regardless of which hosts happen to have data on a given run.
- Runtime values not in `groups` throw upfront. Strict by default; drop the option to discover groups dynamically and accept the loose `TimeSeries<SeriesSchema>` return type.
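A sketch of these rules together — assuming 'api-3' never appears in the data:

```ts
const wide = long.pivotByGroup('host', 'cpu', {
  groups: ['api-2', 'api-1', 'api-3'], // declaration order = column order
});
// 'api-3_cpu' exists in the schema; every cell in it is undefined.
// An event whose host is outside the declared set makes the call throw upfront.
```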
The two forms coexist in one method via overload — the typed
variant is opt-in via the groups option, and the untyped form
stays as the open-set discovery path.
`as const` is load-bearing on `groups`
Without `as const` on the array, TypeScript widens `Groups` to
`readonly string[]`, the recursive schema helper falls through, and
the typed output collapses to just the time column — downstream
`baseline('api-1_cpu', ...)` then fails with no specific
column-name hint. Two safe shapes:
// Inline literal — the const modifier kicks in automatically:
long.pivotByGroup('host', 'cpu', { groups: ['api-1', 'api-2'] });
// Pre-declared variable — explicit `as const`:
const HOSTS = ['api-1', 'api-2'] as const;
long.pivotByGroup('host', 'cpu', { groups: HOSTS });
const HOSTS = ['api-1', 'api-2'] (no as const) gives a widened
string[] and silently degrades the typed output. If your output
schema unexpectedly looks like [time] and nothing else, check
here first.
unpivot
This is a design sketch for a future API, not a shipped method.
Today, build the long form by hand from pivotByGroup's wide
output if you need the round-trip. The sketch lives here so the
shape can be validated against real use cases before
implementation.
The wide-to-long inverse of pivotByGroup. Each
wide column becomes one row per source row, tagged with a
group key derived from the column name.
// Wide → long, the proposed API:
const long = wide.unpivot('host', 'cpu', {
'api-1_cpu': 'api-1',
'api-2_cpu': 'api-2',
});
// schema: [time, host: string, cpu: number]
// rows expand 1 wide → N long, where N = number of mapping entries
The mapping object has explicit `{ wideColumn: groupValue }` pairs —
no convention-based suffix stripping (`/_cpu$/`), so renamed-column
edge cases stay obvious. Wide columns not in the mapping pass through
unchanged into the output schema.
For now, a workaround using existing primitives:
const wideCols = ['api-1_cpu', 'api-2_cpu'] as const;
const groupOf = { 'api-1_cpu': 'api-1', 'api-2_cpu': 'api-2' } as const;
const longRows: Array<[number, string, number | undefined]> = [];
for (const event of wide.events) {
const ts = event.begin();
for (const col of wideCols) {
longRows.push([ts, groupOf[col], event.get(col) as number | undefined]);
}
}
longRows.sort((a, b) => a[0] - b[0] || a[1].localeCompare(b[1]));
const long = new TimeSeries({
name: wide.name,
schema: [
{ name: 'time', kind: 'time' },
{ name: 'host', kind: 'string' },
{ name: 'cpu', kind: 'number', required: false },
] as const,
rows: longRows,
});
If you need this and it's getting in the way, open an issue with your case and the sketch becomes a real implementation.
groupBy
Partition by a column value. Returns Map<string, TimeSeries<S>>,
or (with a transform callback) Map<string, T>.
const perHost = series.groupBy('host');
// Map<'api-1' | 'api-2' | ..., TimeSeries<S>>
The transform callback is the common form — avoids building
intermediate maps of TimeSeries:
const perHostAvg = series.groupBy('host', (group) =>
group.reduce({ cpu: 'avg', requests: 'sum' }),
);
// Map { 'api-1' => { cpu: 0.42, requests: 12543 }, ... }
Callback receives the group key as a second argument:
series.groupBy('host', (group, host) => ({
host,
avgCpu: group.reduce('cpu', 'avg'),
}));
Composes with every other method:
// Per-host hourly aggregation.
const hourly = series.groupBy('host', (group) =>
group.aggregate(Sequence.every('1h'), { cpu: 'avg' }),
);
// Per-host outliers.
const anomalies = series.groupBy('host', (group) =>
group.outliers('cpu', { window: '1m', sigma: 2 }),
);
Reach for groupBy when each group spawns multiple derived
columns (e.g. per-host baseline producing cpu/avg/upper/lower
per host). Reach for pivotByGroup when you want
one wide series with one column per group.
partitionBy
series.partitionBy(col) returns a PartitionedTimeSeries<S> view
that scopes stateful transforms to within each partition. Most
pond-ts stateful operators (fill, align, rolling, smooth,
baseline, outliers, diff, rate, pctChange, cumulative,
shift, aggregate) read from neighboring events when computing
each output. When a series interleaves multiple entities (host,
region, device, …), those neighbors silently cross entity boundaries:
// Multi-host series — events for several hosts interleaved by time:
// ts=0, cpu=0.5, host='a'
// ts=60, cpu=??, host='a' ← missing!
// ts=60, cpu=0.9, host='b'
// ts=120, cpu=0.7, host='a'
// Without partitioning: linear interp picks host 'b' as a neighbor.
// host 'a' at t=60 gets filled from host 'b' — wrong.
series.fill({ cpu: 'linear' });
// With partitioning: each host fills against its own events only.
series.partitionBy('host').fill({ cpu: 'linear' }).collect();
Persistent partition + .collect()
The view is persistent across chains — each sugar method returns
another PartitionedTimeSeries, so multi-step per-partition
workflows compose cleanly. Call .collect() at the end to
materialize back to a regular TimeSeries:
const cleaned = ts
.partitionBy('host')
.dedupe({ keep: 'last' }) // (when shipped — per-host)
.fill({ cpu: 'linear' }) // per-host
.rolling('5m', { cpu: 'avg', host: 'last' }) // per-host
.collect(); // back to TimeSeries<S>
Without .collect() you stay in partition view. Single-shot ops
need it too:
// Single per-host fill — explicit collect
const filled = ts.partitionBy('host').fill({ cpu: 'linear' }).collect();
The ceremony is consistent — every chain ends with .collect().
Sugar methods
// Per-host fill (linear interp uses neighbors within each host)
ts.partitionBy('host').fill({ cpu: 'linear' }).collect();
// Per-host rolling avg (window contains only same-host events)
ts.partitionBy('host').rolling('5m', { cpu: 'avg' }).collect();
// Per-host bucketed aggregation
ts.partitionBy('host')
.aggregate(Sequence.every('1m'), {
cpu: 'avg',
host: 'last', // keep host in the output for downstream filtering
})
.collect();
// Composite partitioning — by host AND region
ts.partitionBy(['host', 'region']).fill({ cpu: 'linear' }).collect();
Escape hatch: apply(fn)
For ops not exposed as sugar (or for transforms that need full
control of the per-partition pipeline), apply(fn) runs fn on
each partition and reassembles directly to a TimeSeries — no
.collect() needed. It's the terminal escape hatch.
const out = ts
.partitionBy('host')
.apply((g) => g.fill({ cpu: 'linear' }).rolling('5m', { cpu: 'avg' }));
// `out` is TimeSeries<...> — no .collect()
Use the sugar chain when you want partition state to persist;
apply when you've got a one-off composition you'd rather express
inline.
groupBy vs partitionBy
Both partition by column value. groupBy(col) returns a
Map<groupKey, TimeSeries<S>> for callers that want the per-group
sub-series explicitly (e.g. render N separate charts).
partitionBy(col).<method>(...).collect() runs the operation
per-group and reassembles back to one TimeSeries<S>. Reach for
groupBy when you want N outputs; for partitionBy when you want
one output with per-partition computation correctly scoped.
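Side by side — a minimal sketch reusing the fill example:

```ts
// N outputs: one sub-series per host, e.g. for N separate charts.
const perHost = series.groupBy('host'); // Map<string, TimeSeries<S>>

// One output: the same per-host scoping, reassembled into a single series.
const filled = series.partitionBy('host').fill({ cpu: 'linear' }).collect();
```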
The same cross-entity hazard applies to live transforms
(LiveRollingAggregation, LiveAggregation, LiveView.window(),
live diff/rate/fill/cumulative). Per-partition support on
the live side is queued — see PLAN.md "Queued: live partitioning".
For now, snapshot to a TimeSeries and use batch partitionBy.
concat
`TimeSeries.concat([s1, s2, s3])` — concatenates the events of N
same-schema `TimeSeries` instances and returns one longer series
with all events sorted by key. The row-append / vertical-stack
counterpart to `joinMany` (column-merge by key).
const combined = TimeSeries.concat([cpuLastWeek, cpuThisWeek]);
// same schema as the inputs; events from both weeks concatenated and sorted.
The fan-in primitive. groupBy(col, fn) is the
fan-out (one series → many); concat is the fan-in (many same-schema
series → one). Together they close the round-trip:
const filledByHost = series.groupBy('host', (g) =>
g.fill({ cpu: 'linear' }, { limit: 2 }),
);
// Map<string, TimeSeries<S>> — each host has its own filled subseries.
const combined = TimeSeries.concat([...filledByHost.values()]);
// Back to one TimeSeries<S>. Schema flows through unchanged; events
// re-sorted across all hosts.
Without concat, closing this loop required unwrapping events back
to row tuples and reconstructing — verbose and easy to break when
the schema has many columns. concat keeps you inside the typed
contract.
Schema match is required. All inputs must have identical
schema (column-by-column on name and kind only — required: false is intentionally not part of the structural check). Mismatches
throw upfront. For combining series with different schemas —
joining CPU metrics with memory metrics on time keys — use
joinMany instead.
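The two paths in code — a sketch with assumed series names (`cpuEast`/`cpuWest` share a schema; `cpuSeries`/`memSeries` do not):

```ts
// Same schema — row append:
const allCpu = TimeSeries.concat([cpuEast, cpuWest]);

// Different schemas — concat would throw; join on time keys instead:
const wide = TimeSeries.joinMany([cpuSeries, memSeries], { type: 'outer' });
```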
Event identity is preserved. concat([a]).at(0) is the same
Event instance as a.at(0) — no clones. Tied keys preserve input
order via stable sort.
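A quick check of that invariant:

```ts
const a1 = TimeSeries.concat([a]);
console.assert(a1.at(0) === a.at(0)); // same Event instance, no clone
```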
Naming. The concatenated series's name is taken from the first
input. Override with .rename(...) afterward if needed.
pondjs lineage. pondjs's TimeSeries.timeSeriesListMerge(...)
combined two cases under one call: same-schema concatenation (which
maps to TimeSeries.concat([...]) here) and different-schema
column-union (which maps to joinMany). pond-ts splits
these because the intent is meaningfully different — choose the
right one based on whether you want more rows (concat) or wider rows
(joinMany).
When events share a key. Events with the same key from
different inputs are both kept; this is row-append, not
key-deduplication. If you need same-key collapse, see the queued
dedupe work in PLAN.md or do it yourself before / after concat.
Event.merge
`Event.merge(patch)` is a per-event payload merge (column-wise:
patch some fields on one event, return a new one). `TimeSeries.concat`
is a per-series row append (concatenate events from N series).
Different axes, different receivers.
fromEvents
TimeSeries.fromEvents(events, { schema, name }) — builds a typed
series from an array of Event instances. Companion to concat for
the rare case where you have raw events (not series) to assemble.
import { TimeSeries, type EventForSchema } from 'pond-ts';
// You have an EventForSchema<S>[] from somewhere — perhaps after
// flattening Map<host, TimeSeries<S>> values, or from a custom
// transform pipeline.
const events: EventForSchema<typeof schema>[] = [...];
const series = TimeSeries.fromEvents(events, {
schema,
name: 'reconstructed',
});
Events are sorted by key before construction — caller doesn't need to pre-sort. The schema is taken on trust; pass the same schema the events were originally produced under.
Trust contract. Unlike new TimeSeries({ schema, rows }) and
fromJSON(...), fromEvents does not validate event payloads
against the declared schema. If you pass events from a different
schema, the series builds successfully but downstream
event.get('col') will produce undefined or fail with confusing
column-not-found errors. Most callers come from groupBy(...).values()
or other pond-ts transforms and can't hit this; if you're constructing
events by hand, prefer the validating constructors.
Most callers should reach for concat first — it does the
event-spread for you. Use fromEvents when you've already got a
flat events array and don't have a list of series.
join
Exact-key pairwise join of two series with the same key kind.
const joined = thisWeek.join(lastWeek, {
type: 'outer',
onConflict: 'prefix',
prefixes: ['this_', 'last_'],
});
// joined.schema gains both sources' columns under prefixes:
// [time, this_cpu, this_requests, last_cpu, last_requests, ...]
Join types:
| type | Keeps |
|---|---|
| 'outer' | every key from either side (default) |
| 'left' | every key from the left series |
| 'right' | every key from the right series |
| 'inner' | only keys present on both sides |
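For example, keeping only the timestamps both weeks share:

```ts
const both = thisWeek.join(lastWeek, {
  type: 'inner', // drop keys that exist on only one side
  onConflict: 'prefix',
  prefixes: ['this_', 'last_'],
});
```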
Conflict handling. If both series have a column with the same name, the join is ambiguous. Two ways to resolve:
// 1. error (default) — fail at compile/runtime if columns collide.
left.join(right); // throws if both have a 'cpu' column
// 2. prefix — rename colliding columns with one prefix per side.
left.join(right, {
onConflict: 'prefix',
prefixes: ['l_', 'r_'],
});
Or rename one side before joining (cleanest when the prefix convention doesn't fit):
left.join(right.rename({ cpu: 'right_cpu' }));
What "exact-key" means. A join row is emitted when the two
sides have events with identical keys (same Time /
Interval / TimeRange). For time-keyed inputs that means
identical timestamps; nothing is bridged across nearby-but-not-
identical timestamps. If your sources tick at slightly different
intervals, align both onto a common
sequence before joining:
const seq = Sequence.every('1m');
left.align(seq).join(right.align(seq));
Output value columns are always optional (required: false)
because non-'inner' joins emit undefined cells where one side
has no event for that key.
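Downstream reads should therefore treat joined cells as possibly missing — a small sketch against the prefixed schema from above:

```ts
const point = joined.toPoints()[0];
// Optional even though 'cpu' may be required on both inputs:
const thisCpu: number | undefined = point['this_cpu'];
const lastCpu: number | undefined = point['last_cpu'];
```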
joinMany
`TimeSeries.joinMany([a, b, c, ...], options?)` — N-ary join.
Static method on `TimeSeries`. Same semantics as the binary `join`,
applied across N inputs in one pass; avoids the awkward
`a.join(b).join(c)` chain that builds intermediate series.
const wide = TimeSeries.joinMany(
[cpu.align(seq), memory.align(seq), errors.align(seq)],
{ type: 'outer' },
);
// schema combines all three sources' value columns.
With prefixes to disambiguate same-named columns across sources:
TimeSeries.joinMany([cpuA, cpuB, cpuC], {
type: 'outer',
onConflict: 'prefix',
prefixes: ['a_', 'b_', 'c_'],
});
// schema: [time, a_cpu, b_cpu, c_cpu]
The prefix array length must equal the input series count. Same
exact-key contract as join — align first if your sources tick
on different cadences. Common use cases: feature-building (joining
many aligned metric series), reporting tables, and dashboard
pipelines that need one wide source for charting.
joinMany is also the right primitive when you start with
groupBy and want to merge per-group outputs into one
wide series:
const perHost = series.groupBy('host', (g, host) =>
g.baseline('cpu', { window: '5m', sigma: 2 }).rename({
cpu: `${host}_cpu`,
upper: `${host}_upper`,
lower: `${host}_lower`,
}),
);
const wide = TimeSeries.joinMany([...perHost.values()], { type: 'outer' });
This is the heavier sibling of pivotByGroup for cases where
each group spawns multiple derived columns.
See also
| Op | Page | Shape |
|---|---|---|
| select(...cols) | Eventwise | narrow columns; row count unchanged |
| rename(mapping) | Eventwise | rename columns; row count unchanged |
| collapse(cols, out) | Eventwise | merge several columns into one; row count unchanged |
| map(schema, fn) | Eventwise | per-event schema transform |
| aggregate(seq, m) | Aggregation | collapse to fewer events on a grid |
| reduce(mapping) | Aggregation | collapse whole series to one record |