Key-Value Stores

Key-Value Stores provide functions that retrieve and store data within and outside of Substation.

Key-Value Stores (KV) support several use cases, including:

  • Local or remote data caching with time-to-live
  • Cross- and intra-dataset field correlation
  • External enrichment from CSV, JSON, MMDB or text files
  • Indicator matching

KV are accessed using the enrich_kv_store_get and enrich_kv_store_set transforms.

aws.dynamodb

AWS DynamoDB is a read-write KV that is backed by an AWS DynamoDB table.

Settings

Field | Type | Description | Required
table_name | string | The DynamoDB table that items are retrieved from and written to. | Yes
attributes.partition_key | string | The table's partition key attribute. | Yes
attributes.value | string | The table attribute where values are stored. | Yes
attributes.sort_key | string | The table's sort (range) key attribute. This is required if the table uses a composite primary key schema (partition key and sort key). Only string types are supported. | No
attributes.ttl | string | The table's time-to-live attribute. | No
consistent_read | boolean | Specifies whether or not to use strongly consistent reads. Defaults to false (uses eventually consistent reads). | No

csv_file

CSV File is a read-only KV that is derived from a CSV file and stored in memory.

Each row of the CSV is stored in a JSON object under the value found in the configured column: that value becomes the key, and the row's remaining values, keyed by their column headers, become the value.

For example, given the file content below and with column set to "bar", the data is mapped to this structure:

foo,bar,baz
qux,quux,corge
grault,garply,waldo
fred,plugh,xyzzy

{"garply":{"baz":"waldo","foo":"grault"},"plugh":{"baz":"xyzzy","foo":"fred"},"quux":{"baz":"corge","foo":"qux"}}

Settings

Field | Type | Description | Required
file | string | The location of the CSV file. This can be a path on local disk, an HTTP(S) URL, or an AWS S3 URL. | Yes
column | string | The column whose values are used as keys in the store. | Yes
delimiter | string | The delimiting character (e.g., comma, tab) that separates values in each row of the CSV file. Defaults to comma (,). | No
header | string | Overrides the header in the CSV file. No default (the first line of the CSV file is used as the header). | No

json_file

JSON File is a read-only KV that is derived from a file containing a JSON object and stored in memory.

Settings

Field | Type | Description | Required
file | string | The location of the JSON file. This can be a path on local disk, an HTTP(S) URL, or an AWS S3 URL. | Yes
is_lines | boolean | Indicates that the file is a JSON Lines file. The first non-null value is returned when a key is found. | No
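The is_lines behavior, returning the first non-null value for a key, can be sketched as a scan over the file's lines. This is an illustration, not Substation's code. The firstNonNull function is hypothetical, and it matches only top-level keys.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// firstNonNull scans JSON Lines input and returns the first non-null value
// stored under a top-level key. Blank lines are skipped.
func firstNonNull(data, key string) (any, bool, error) {
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var obj map[string]any
		if err := json.Unmarshal([]byte(line), &obj); err != nil {
			return nil, false, err
		}
		// A JSON null decodes to a present key with a nil value.
		if v, ok := obj[key]; ok && v != nil {
			return v, true, nil
		}
	}
	return nil, false, sc.Err()
}

func main() {
	data := `{"a":null}
{"b":1}
{"a":"first"}
{"a":"second"}`
	v, ok, err := firstNonNull(data, "a")
	fmt.Println(v, ok, err) // first true <nil>
}
```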

mmdb

MMDB is a read-only KV that is derived from any MaxMind database format file.

MMDB is an open source database file format that maps IPv4 and IPv6 addresses to data records, and is most commonly used by MaxMind GeoIP databases.

Settings

Field | Type | Description | Required
file | string | The location of the MMDB file. This can be a path on local disk, an HTTP(S) URL, or an AWS S3 URL. | Yes

memory

Memory is a read-write KV that is stored in memory, uses least recently used (LRU) eviction, and optionally supports per-value time-to-live.

Settings

Field | Type | Description | Required
capacity | integer | Limits the maximum capacity of the store. Defaults to 1024. | No

text_file

Text File is a read-only KV that is derived from a newline-delimited text file and stored in memory.

Each line of the text file becomes a key in a JSON object, and each key's value is the boolean true.

For example, given the file content below, the data is mapped to this structure:

a
b
c

{"a":true,"b":true,"c":true}

Settings

Field | Type | Description | Required
file | string | The location of the text file. This can be a path on local disk, an HTTP(S) URL, or an AWS S3 URL. | Yes

Use Cases

Networked Cache-Aside

Use the aws.dynamodb KV store to implement a networked cache-aside pattern that can significantly reduce the data transformation time caused by high-latency enrichment transforms.

Internal Passive DNS

Use the aws.dynamodb KV store, log sources that contain DNS metadata (e.g., Zeek DNS, Suricata DNS, EDR DNS events), and the DNS enrichment transforms to create an internally curated, enterprise-scale passive DNS database.

Zeek Threat Intelligence

Use the csv_file KV store and Zeek threat intelligence files to load operational threat intelligence into the platform and enable indicator matching for structured data. Critical Path Security offers several open source intel feeds.
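Zeek Intel framework files are tab-separated, with a header line that begins with #fields and names the columns (indicator, indicator_type, meta.source, and so on). The Go sketch below parses that layout into a map keyed by the indicator column, with the other columns as the value. That mirrors the csv_file mapping with tab as the delimiter and indicator as the column. The parseZeekIntel and contains functions are hypothetical, and the sketch handles the #fields prefix itself.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseZeekIntel reads a Zeek Intel file into indicator -> other columns.
func parseZeekIntel(data string) (map[string]map[string]string, error) {
	var header []string
	out := make(map[string]map[string]string)
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "#fields\t") {
			header = strings.Split(strings.TrimPrefix(line, "#fields\t"), "\t")
			if !contains(header, "indicator") {
				return nil, fmt.Errorf("header has no indicator column")
			}
			continue
		}
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and other comment lines
		}
		if header == nil {
			return nil, fmt.Errorf("data line before #fields header")
		}
		vals := strings.Split(line, "\t")
		if len(vals) != len(header) {
			return nil, fmt.Errorf("expected %d fields, got %d", len(header), len(vals))
		}
		rec := make(map[string]string, len(header))
		for i, h := range header {
			rec[h] = vals[i]
		}
		key := rec["indicator"]
		delete(rec, "indicator") // the indicator is the key, not part of the value
		out[key] = rec
	}
	return out, sc.Err()
}

func contains(xs []string, s string) bool {
	for _, x := range xs {
		if x == s {
			return true
		}
	}
	return false
}

func main() {
	data := "#fields\tindicator\tindicator_type\tmeta.source\n" +
		"192.0.2.1\tIntel::ADDR\texample-feed\n" +
		"bad.example\tIntel::DOMAIN\texample-feed\n"
	intel, err := parseZeekIntel(data)
	if err != nil {
		panic(err)
	}
	// Indicator matching: look up a value taken from an event.
	if m, ok := intel["192.0.2.1"]; ok {
		fmt.Println("match:", m["indicator_type"], m["meta.source"]) // match: Intel::ADDR example-feed
	}
}
```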

Emerging Threats Compromised IP Addresses

Use the text_file KV store and Proofpoint's continually updated Emerging Threats list of known compromised IP addresses to enable indicator matching for structured data.

MaxMind GeoLite2 Databases

Use the mmdb KV store and MaxMind's free geolocation databases to enrich public IP addresses. These databases include city, country, and autonomous system (AS) information. Refer to this example in the project repository.
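GeoIP lookups are only meaningful for public addresses. One way to decide which addresses to send to the mmdb store is a pre-check with Go's net/netip package. The isPublic function is hypothetical and applies a common heuristic, not an exhaustive list of special-purpose ranges. For example, it does not exclude 100.64.0.0/10 (CGNAT) or documentation ranges such as 192.0.2.0/24.

```go
package main

import (
	"fmt"
	"net/netip"
)

// isPublic reports whether s parses as an IP address that is global unicast
// and not in private (RFC 1918 / IPv6 ULA) space.
func isPublic(s string) bool {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return false
	}
	addr = addr.Unmap() // treat ::ffff:a.b.c.d as plain IPv4
	// IsGlobalUnicast excludes loopback, link-local, multicast, and
	// unspecified addresses, but still returns true for private ranges.
	return addr.IsGlobalUnicast() && !addr.IsPrivate()
}

func main() {
	for _, s := range []string{"8.8.8.8", "10.0.0.1", "127.0.0.1", "fe80::1", "2001:4860:4860::8888", "not-an-ip"} {
		fmt.Println(s, isPublic(s))
	}
}
```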