Substation

All applications that implement event-driven ingest, transform, and load are named substation. They differ only in how and where they run.

These applications are designed for tight and loose coupling:

  • ingest: tightly coupled -- configurable, specific to each application
  • transform: loosely coupled -- configurable, identical across all applications
  • load: loosely coupled -- configurable, identical across all applications
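The split between an application-specific ingest stage and shared transform and load stages can be sketched in Go. Everything here is hypothetical: the Ingester interface, lineIngester, and the toy transform and load functions are illustrative names, not Substation's API. The point is only that any ingest implementation plugs into the same downstream stages.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Ingester is a hypothetical interface for the tightly coupled stage: each
// application would supply its own implementation (file, queue, HTTP, ...).
type Ingester interface {
	Ingest(out chan<- []byte) error
}

// lineIngester is one illustrative ingest implementation that reads lines from a string.
type lineIngester struct{ data string }

func (l lineIngester) Ingest(out chan<- []byte) error {
	defer close(out)
	s := bufio.NewScanner(strings.NewReader(l.data))
	for s.Scan() {
		// copy the token because the scanner may reuse its buffer
		b := make([]byte, len(s.Bytes()))
		copy(b, s.Bytes())
		out <- b
	}
	return s.Err()
}

// transform and load stand in for the shared, loosely coupled stages; every
// application calls the same functions regardless of how it ingests data.
func transform(in []byte) []byte { return []byte(strings.ToUpper(string(in))) }

func load(sink *[]string, data []byte) { *sink = append(*sink, string(data)) }

// run wires any Ingester to the shared transform and load stages.
func run(ing Ingester) ([]string, error) {
	ch := make(chan []byte)
	errc := make(chan error, 1)
	go func() { errc <- ing.Ingest(ch) }()

	var sink []string
	for data := range ch {
		load(&sink, transform(data))
	}
	return sink, <-errc
}

func main() {
	out, err := run(lineIngester{data: "a\nb"})
	fmt.Println(out, err) // [A B] <nil>
}
```

Swapping lineIngester for another Ingester changes only the ingest stage; transform and load stay identical, which mirrors the coupling described above.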

Configuration

Transform and load (sink) share this configuration schema:

local sink = import 'sink.libsonnet';

{
  // choose the sink pattern
  sink: sink.stdout,
  // choose the transform pattern
  transform: {
    type: 'batch',
    settings: {
      // combine processor lists from imported libraries, for example:
      // processors: foo.processors + bar.processors + baz.processors,
      processors: [],
    },
  },
}
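Because Jsonnet compiles to JSON, one way to picture the schema above is as a Go struct that the compiled JSON decodes into. This is a sketch under assumptions: the Config type, parseConfig, its "transform.type is required" check, and the placeholder sink object are all hypothetical, and the real sink and processor shapes depend on the patterns imported from the libraries, so they are kept as raw JSON here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is a hypothetical Go mirror of the schema, assuming the Jsonnet file
// has already been compiled to JSON. Sink and processor contents stay raw
// because their exact shapes depend on the chosen patterns.
type Config struct {
	Sink      json.RawMessage `json:"sink"`
	Transform struct {
		Type     string `json:"type"`
		Settings struct {
			Processors []json.RawMessage `json:"processors"`
		} `json:"settings"`
	} `json:"transform"`
}

// parseConfig decodes compiled JSON; this sketch treats transform.type as required.
func parseConfig(b []byte) (Config, error) {
	var c Config
	if err := json.Unmarshal(b, &c); err != nil {
		return c, fmt.Errorf("parse config: %w", err)
	}
	if c.Transform.Type == "" {
		return c, fmt.Errorf("parse config: transform.type is required")
	}
	return c, nil
}

func main() {
	// illustrative compiled output; the sink object is a placeholder
	raw := []byte(`{"sink":{"type":"stdout"},"transform":{"type":"batch","settings":{"processors":[]}}}`)
	c, err := parseConfig(raw)
	fmt.Println(c.Transform.Type, len(c.Transform.Settings.Processors), err) // batch 0 <nil>
}
```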

Use these recipes as a guide for building these configurations:

Environment Variables

Environment variables control runtime settings for each application:

  • SUBSTATION_CONFIG (string): location of a file containing a Substation configuration. Must be one of:
    - path on local disk
    - HTTP(S) URL
    - AWS S3 URL
  • SUBSTATION_CONCURRENCY (integer): number of concurrent transform processes. Defaults to the number of logical CPUs on the host.
  • SUBSTATION_DEBUG (any): determines whether debug logging is enabled. Defaults to no debug logging.
  • SUBSTATION_SCAN_CAPACITY (string): maximum size of each token (e.g., a line in a text file) that is buffered (read) by the bufio scanners used throughout the system to read files. Read more about capacity in the bufio package's documentation. Defaults to the bufio package's limit, bufio.MaxScanTokenSize (65,536 bytes, approximately 65.5 KB).
  • SUBSTATION_SCAN_METHOD (string): read behavior of the bufio scanners used throughout the system to read files. Must be one of:
    - bytes
    - text (default)
  • SUBSTATION_METRICS (string): destination for internal application metrics. Must be one of:
    - AWS_CLOUDWATCH_EMBEDDED_METRICS: writes the CloudWatch Embedded Metrics Format to standard output
    Defaults to none (no metrics are generated).

Cloud Service Configuration & Credentials

Access to services hosted by each cloud service provider follows that provider's best practices: