Caching Strategies
Substation supports multiple caching deployment patterns by combining conditions, external enrichment processors, and sinks.
Application-Level Caching
All of the caching strategies described on this page can be implemented in any Substation application by using the AWS DynamoDB key-value store.
Distributed Cache
The distributed cache is implemented by connecting one or more Substation applications, usually spread across multiple pipelines in a single deployment, to a database. In this example, processors enrich data from a DynamoDB context database that is fed by an external service:
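The read path described above can be sketched in a few lines. This is a minimal illustration, not Substation's API: a plain dictionary stands in for the DynamoDB context table, and `context_db`, `external_feed`, `enrich`, and the event field names are all hypothetical.

```python
# A plain dict stands in for the shared DynamoDB context table (assumption).
context_db = {}


def external_feed(key, value):
    """Simulates the external service that writes context into the database."""
    context_db[key] = value


def enrich(event, lookup_field, target_field):
    """Returns a copy of the event, enriched from the shared store on a match."""
    enriched = dict(event)
    value = context_db.get(event.get(lookup_field))
    if value is not None:
        enriched[target_field] = value
    return enriched


# The external service populates the cache; the pipelines only read from it.
external_feed("10.0.0.5", {"hostname": "web-01"})

# Two independent pipelines enrich their own events from the same store.
pipeline_a = enrich({"src_ip": "10.0.0.5", "action": "login"}, "src_ip", "src_context")
pipeline_b = enrich({"client_ip": "10.0.0.5", "bytes": 512}, "client_ip", "client_context")
```

Because the pipelines never write to the store in this pattern, the external service alone determines how fresh the context is.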
Self-Updating Cache
The self-updating cache is similar to a distributed cache, except the pipeline both reads and feeds the cache using a cache-aside pattern. In this example, a processor enriches data from a DynamoDB context database that is fed by a sink from the pipeline:
With this design, it is important to avoid unnecessary writes to the cache: the processor should store the result of the cache lookup (hit or miss) in each object, and the pipeline should sink data to the cache only if the lookup was a miss.
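The hit-or-miss bookkeeping can be sketched as follows. This is a minimal illustration under stated assumptions: a dictionary stands in for the DynamoDB table, `resolve` is a hypothetical stand-in for whatever produces the value on a miss, and the `cache_hit` field name is invented for the example.

```python
# A plain dict stands in for the DynamoDB cache table (assumption).
cache = {}
stats = {"writes": 0}  # counts writes so the effect of the condition is visible


def lookup(event, key_field, target_field):
    """Reads from the cache and records the outcome (hit or miss) in the object."""
    enriched = dict(event)
    value = cache.get(event[key_field])
    enriched["cache_hit"] = value is not None
    if value is not None:
        enriched[target_field] = value
    return enriched


def resolve(key):
    """Hypothetical source of truth consulted only on a cache miss."""
    return {"owner": "team-" + key}


def sink_to_cache(event, key_field, target_field):
    """Writes to the cache only when the recorded lookup was a miss."""
    if event["cache_hit"]:
        return False
    cache[event[key_field]] = event[target_field]
    stats["writes"] += 1
    return True


def process(event):
    """Cache-aside flow: read, fill on miss, then conditionally write back."""
    event = lookup(event, "host", "context")
    if not event["cache_hit"]:
        event["context"] = resolve(event["host"])
    sink_to_cache(event, "host", "context")
    return event


first = process({"host": "db-1"})   # miss: value is resolved and written
second = process({"host": "db-1"})  # hit: value is read, no write occurs
```

Carrying the lookup outcome inside the object lets the write decision be made later in the pipeline without a second read of the cache.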
Cross-Pipeline & Cross-Dataset Enrichment
Cross-pipeline and cross-dataset enrichment is possible by combining the distributed cache and self-updating cache patterns:
With this pattern, any pipeline can enrich its data with information generated in other pipelines in near real time.
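As a rough sketch of the combined pattern, one pipeline can sink values into a shared store while another reads them during enrichment. The dictionary below stands in for the shared DynamoDB table, and the pipeline functions and field names are hypothetical.

```python
# A plain dict stands in for the DynamoDB table shared by both pipelines (assumption).
shared_cache = {}


def producer_pipeline(events):
    """Pipeline A: sinks IP-to-user mappings derived from its own dataset."""
    for event in events:
        shared_cache[event["ip"]] = event["user"]


def consumer_pipeline(events):
    """Pipeline B: enriches a different dataset with values written by pipeline A."""
    results = []
    for event in events:
        enriched = dict(event)
        user = shared_cache.get(event["src_ip"])
        if user is not None:
            enriched["user"] = user
        results.append(enriched)
    return results


# Authentication events feed the cache; DNS events are enriched from it.
producer_pipeline([{"ip": "10.0.0.5", "user": "alice"}])
results = consumer_pipeline([
    {"src_ip": "10.0.0.5", "query": "example.com"},
    {"src_ip": "10.0.0.9", "query": "example.org"},
])
```

Events with no matching entry pass through unchanged, so the consuming pipeline does not depend on the producing pipeline having seen every key.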