Substation
All applications that implement event-driven ingest, transform, and load are named Substation.
The only differences between them are how and where they run.
These applications are designed for tight and loose coupling:
- ingest: tightly coupled -- configurable, specific to each application
- transform: loosely coupled -- configurable, identical across all applications
- load: loosely coupled -- configurable, identical across all applications
Configuration
Transform and load (sink) share this configuration schema:
local sink = import 'sink.libsonnet';

{
  // choose the sink pattern
  sink: sink.stdout,
  // choose the transform pattern
  transform: {
    type: 'batch',
    settings: {
      processors: []
        // + foo.processors
        // + bar.processors
        // + baz.processors
    },
  },
}
Use these recipes as a guide for building these configurations:
- Building Data Transfer Configurations
- Building Data Transform Configurations
Environment Variables
Environment variables control runtime settings for each application:
| Environment Variable | Type | Description |
|---|---|---|
| SUBSTATION_CONFIG | string | Location of the file containing the Substation configuration. Must be one of: a path on local disk, an HTTP(S) URL, or an AWS S3 URL. |
| SUBSTATION_CONCURRENCY | integer | Number of concurrent transform processes. Defaults to the number of logical CPUs on the host. |
| SUBSTATION_DEBUG | any | Enables debug logging when set (the value can be anything). Debug logging is disabled by default. |
| SUBSTATION_SCAN_CAPACITY | string | Maximum size of each token (e.g., a line in a text file) that is buffered (read) by the bufio scanners used throughout the system to read files. See the bufio package documentation for details on capacity. Defaults to the bufio package's default, bufio.MaxScanTokenSize (65,536 bytes, or 64 KiB). |
| SUBSTATION_SCAN_METHOD | string | Read behavior of the bufio scanners used throughout the system to read files. Must be one of: bytes, or text (default). |
| SUBSTATION_METRICS | string | Destination for internal application metrics. Must be one of: AWS_CLOUDWATCH_EMBEDDED_METRICS, which writes metrics in the CloudWatch Embedded Metric Format to standard output. Defaults to none (no metrics are generated). |
Cloud Service Configuration & Credentials
Access to services hosted by each cloud service provider follows that provider's best practices: