Batching
Transforms can support batching data for efficiency and organization.
Several transforms support batching as a primary or secondary function, including aggregate and send transforms. Substation is event-driven and batches data in memory; each batch is emitted when either:
- The batch is full and cannot accept more data.
- A control message is received.
Configuring Batching Behavior
Batches are directly configurable using these options:
- count: the maximum number of items that can fit within each batch.
- size: the maximum sum of bytes that can fit within each batch.
- duration: the maximum amount of time to collect data within each batch.
When any of these thresholds is met, the batch is emitted and a new one is created.
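The emit rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not Substation's implementation: the class name `Batcher`, its method names, the default values, and the injectable `clock` parameter are all assumptions made for the example.

```python
import time


class Batcher:
    """Hypothetical in-memory batch with count, size, and duration limits."""

    def __init__(self, count=1000, size=1_000_000, duration=60.0,
                 clock=time.monotonic):
        self.count = count        # maximum number of items per batch
        self.size = size          # maximum sum of bytes per batch
        self.duration = duration  # maximum seconds to collect data per batch
        self.clock = clock        # injectable clock (assumption, eases testing)
        self._reset()

    def _reset(self):
        self.items = []
        self.bytes = 0
        self.started = self.clock()

    def _full(self, data):
        # The batch cannot accept `data` if any threshold would be crossed.
        return (
            len(self.items) + 1 > self.count
            or self.bytes + len(data) > self.size
            or self.clock() - self.started >= self.duration
        )

    def add(self, data):
        """Add bytes to the batch; return the emitted batch, or None."""
        emitted = None
        if self.items and self._full(data):
            emitted = self.flush()
        if not self.items:
            # Start timing when the first item enters a new batch.
            self.started = self.clock()
        self.items.append(data)
        self.bytes += len(data)
        return emitted

    def flush(self):
        """Emit the current batch, e.g. when a control message arrives."""
        emitted = self.items
        self._reset()
        return emitted
```

In this sketch, `add` returns the previous batch when the new item does not fit, and `flush` plays the role of a control message by emitting whatever has been collected so far.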
Organizing Data in Batches
Batches can optionally organize data by referencing a JSON value. For example, these events:
{"a":"b","group":1}
{"c":"d","group":2}
{"e":"f","group":1}
are organized into these batches:
[{"a":"b","group":1},{"e":"f","group":1}]
[{"c":"d","group":2}]
For any transform that supports batching, the value is accessed by configuring the setting object.batch_key. In the example above, the batch key is group. It's possible to use multiple JSON values to organize batches by referencing multiple keys (e.g., batch_key: '[foo,bar]') or by combining multiple values into a single value using transforms.
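As a rough illustration of organizing data by a batch key, the Python sketch below groups JSON events by one key or by several. The function name `group_by_batch_key` is hypothetical, and the sketch assumes the referenced values are scalars (strings, numbers, booleans, or null); it does not reproduce how Substation evaluates batch_key internally.

```python
import json
from collections import defaultdict


def group_by_batch_key(lines, batch_key):
    """Group JSON events by one key (str) or several keys (list of str)."""
    batches = defaultdict(list)
    for line in lines:
        event = json.loads(line)
        if isinstance(batch_key, (list, tuple)):
            # Multiple keys: events share a batch only if all values match.
            key = tuple(event.get(k) for k in batch_key)
        else:
            key = event.get(batch_key)
        batches[key].append(event)
    return dict(batches)


events = [
    '{"a":"b","group":1}',
    '{"c":"d","group":2}',
    '{"e":"f","group":1}',
]
for key, batch in group_by_batch_key(events, "group").items():
    print(key, json.dumps(batch, separators=(",", ":")))
```

Passing a list such as `["foo", "bar"]` groups events by the combination of both values, which mirrors the multiple-key case described above.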