Batching

Some transforms batch data, both to improve efficiency and to organize related data.

Several transforms support batching as a primary or secondary function, including aggregate and send transforms. Substation is event-driven and batches data in memory; each batch is emitted when either:

  • The batch is full and cannot accept more data.
  • A control message is received.
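The two emission triggers can be sketched in Go. This is a minimal illustration, not Substation's implementation: Message, Batcher, and the count-only notion of "full" are hypothetical simplifications, and the sketch batches plain strings instead of real messages.

```go
package main

import "fmt"

// Message is a hypothetical stand-in for a Substation message: it carries
// either data or a control signal, never both.
type Message struct {
	Data    string
	Control bool
}

// Batcher buffers data in memory and emits batches through emit.
type Batcher struct {
	max   int            // maximum number of items per batch
	items []string       // the batch currently being filled
	emit  func([]string) // called with each emitted batch
}

// Handle processes one message in the order it arrives.
func (b *Batcher) Handle(msg Message) {
	if msg.Control {
		// A control message emits the current batch, even if it is not full.
		b.flush()
		return
	}
	if len(b.items) == b.max {
		// The batch cannot accept more data: emit it and start a new one.
		b.flush()
	}
	b.items = append(b.items, msg.Data)
}

// flush emits the current batch (if it holds any data) and resets it.
func (b *Batcher) flush() {
	if len(b.items) == 0 {
		return
	}
	b.emit(b.items)
	b.items = nil
}

func main() {
	var emitted [][]string
	b := &Batcher{max: 2, emit: func(batch []string) { emitted = append(emitted, batch) }}
	for _, m := range []Message{{Data: "a"}, {Data: "b"}, {Data: "c"}, {Control: true}} {
		b.Handle(m)
	}
	fmt.Println(emitted) // [[a b] [c]]
}
```

In this sketch, the first batch is emitted because it is full when "c" arrives, and the control message is what emits the trailing partial batch [c]; without it, that data would stay buffered.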

Configuring Batching Behavior

Batches are directly configurable using these options:

  • count: the maximum number of items in each batch.
  • size: the maximum total size, in bytes, of each batch.
  • duration: the maximum amount of time spent collecting data into each batch.

When any of these thresholds is met, the batch is emitted and a new one is created.

Organizing Data in Batches

Batches can optionally organize data by referencing a JSON value. For example, these input objects:

{"a":"b","group":1}
{"c":"d","group":2}
{"e":"f","group":1}

are organized into two batches, one for each value of group:

[{"a":"b","group":1},{"e":"f","group":1}]
[{"c":"d","group":2}]

For any transform that supports batching, the value is selected with the object.batch_key setting. In the example above, the batch key is group. Batches can also be organized by multiple JSON values, either by referencing multiple keys (e.g., batch_key: '[foo,bar]') or by using transforms to combine multiple values into a single value.