Key-Value Stores
Key-Value Stores provide functions that retrieve and store data within and outside of Substation.
Key-Value Stores (KV) support several use cases, including:
- Local or remote data caching with time-to-live
- Cross- and intra-dataset field correlation
- External enrichment from CSV, JSON, MMDB or text files
- Indicator matching
KV stores are accessed using the enrich_kv_store_get and enrich_kv_store_set transforms.
aws.dynamodb
AWS DynamoDB is a read-write KV that is backed by an AWS DynamoDB table.
Settings
Field | Type | Description | Required |
---|---|---|---|
table_name | string | The DynamoDB table that items are retrieved from and written to. | Yes |
attributes.partition_key | string | The table's partition key attribute. | Yes |
attributes.value | string | The table attribute where values are stored. | Yes |
attributes.sort_key | string | The table's sort (range) key attribute. This is required if the table uses a composite primary key schema (partition key and sort key). Only string types are supported. | No |
attributes.ttl | string | The table's time-to-live attribute. | No |
consistent_read | boolean | Specifies whether or not to use strongly consistent reads. Defaults to false (uses eventually consistent reads). | No |
csv_file
CSV File is a read-only KV that is derived from a CSV file and stored in memory.
Each row from the CSV is keyed by the value in the configured column, and the remaining values from the row are stored as a JSON object under that key.
For example, given the file content below and setting the column to "bar", the data is mapped to this structure:
foo,bar,baz
qux,quux,corge
grault,garply,waldo
fred,plugh,xyzzy
{"garply":{"baz":"waldo","foo":"grault"},"plugh":{"baz":"xyzzy","foo":"fred"},"quux":{"baz":"corge","foo":"qux"}}
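The mapping above can be reproduced with a short sketch. This is not Substation's implementation; the function name load_csv_store and its parameters are hypothetical and only illustrate how the column setting selects the key while the remaining columns form the value.

```python
import csv
import io


def load_csv_store(text, column, delimiter=","):
    """Build {row[column]: {remaining columns}} from CSV text (hypothetical helper)."""
    store = {}
    # The first line of the text is used as the header.
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        key = row.pop(column)  # the keyed column is removed from the value
        store[key] = row
    return store


content = "foo,bar,baz\nqux,quux,corge\ngrault,garply,waldo\nfred,plugh,xyzzy\n"
csv_store = load_csv_store(content, "bar")
```

Looking up "garply" in csv_store returns the object containing the foo and baz values from that row.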
Settings
Field | Type | Description | Required |
---|---|---|---|
file | string | The location of the CSV file. This can be either a path on local disk, an HTTP(S) URL, or an AWS S3 URL. | Yes |
column | string | The column name that is used as keys in the store. | Yes |
delimiter | string | The delimiting character (e.g., comma, tab) that separates values in rows in the CSV file. Defaults to comma ( , ). | No |
header | string | Overrides the header in the CSV file. No default (the first line of the CSV file is used as the header). | No |
json_file
JSON File is a read-only KV that is derived from a file containing a JSON object and stored in memory.
Settings
Field | Type | Description | Required |
---|---|---|---|
file | string | The location of the JSON file. This can be either a path on local disk, an HTTP(S) URL, or an AWS S3 URL. | Yes |
is_lines | boolean | Indicates that the file is a JSON Lines file. The first non-null value is returned when a key is found. | No |
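The is_lines lookup behavior can be illustrated with a minimal sketch. It assumes that every line holds one JSON object and that lines are scanned in file order; the function name jsonl_get is hypothetical and is not part of Substation.

```python
import json


def jsonl_get(text, key):
    """Return the first non-null value for key across JSON Lines text (hypothetical helper)."""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        obj = json.loads(line)  # assumes each line is a JSON object
        value = obj.get(key)
        if value is not None:
            return value
    return None


lines = '{"a": null, "b": 1}\n{"a": "x"}\n{"a": "y"}\n'
first_a = jsonl_get(lines, "a")
```

Here the first line holds a null for "a", so the value from the second line is returned.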
mmdb
MMDB is a read-only KV that is derived from any MaxMind database format file.
MMDB is an open source database file format that maps IPv4 and IPv6 addresses to data records, and is most commonly utilized by MaxMind GeoIP databases.
Settings
Field | Type | Description | Required |
---|---|---|---|
file | string | The location of the MMDB file. This can be either a path on local disk, an HTTP(S) URL, or an AWS S3 URL. | Yes |
memory
Memory is a read-write KV that is stored in memory, uses least recently used (LRU) eviction, and optionally supports per-value time-to-live.
Settings
Field | Type | Description | Required |
---|---|---|---|
capacity | integer | The maximum capacity of the store. Defaults to 1024. | No |
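For readers unfamiliar with LRU eviction combined with per-value time-to-live, the sketch below shows one way the two can interact. It is an illustration only, not Substation's implementation: the class name MemoryStore, the ttl argument expressed in relative seconds, and the injectable clock are all assumptions.

```python
import time
from collections import OrderedDict


class MemoryStore:
    """In-memory KV with LRU eviction and optional per-value TTL (illustrative sketch)."""

    def __init__(self, capacity=1024, clock=time.monotonic):
        self.capacity = capacity
        self.clock = clock
        self.items = OrderedDict()  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self.clock() + ttl
        self.items[key] = (value, expires_at)
        self.items.move_to_end(key)  # mark as most recently used
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used item

    def get(self, key):
        if key not in self.items:
            return None
        value, expires_at = self.items[key]
        if expires_at is not None and self.clock() >= expires_at:
            del self.items[key]  # expired values are dropped on read
            return None
        self.items.move_to_end(key)
        return value
```

With a capacity of 2, setting a third key evicts whichever of the first two was read or written least recently.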
text_file
Text File is a read-only KV that is derived from a newline delimited text file and stored in memory.
Each line from the text file becomes a key in a JSON object, and the value of every key is the boolean true.
For example, given the file content below, the data is mapped to this structure:
a
b
c
{"a":true,"b":true,"c":true}
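The same mapping, and the indicator match it enables, can be sketched in a few lines. The function name load_text_store is hypothetical, and skipping blank lines is an assumption rather than documented behavior.

```python
def load_text_store(text):
    """Map each non-empty line to True (hypothetical helper)."""
    return {line: True for line in text.splitlines() if line}


text_store = load_text_store("a\nb\nc\n")

# An indicator match is then a key lookup.
matched = text_store.get("b", False)
not_matched = text_store.get("z", False)
```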
Settings
Field | Type | Description | Required |
---|---|---|---|
file | string | The location of the text file. This can be either a path on local disk, an HTTP(S) URL, or an AWS S3 URL. | Yes |
Use Cases
Networked Cache-Aside
Use the aws.dynamodb KV store to create a networked cache-aside pattern that can significantly reduce the data transformation time caused by high-latency enrichment transforms.
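The cache-aside pattern can be sketched generically as follows. A plain dictionary stands in for the networked KV store, and cache_aside_get and slow_lookup are hypothetical names; in Substation the equivalent steps would be expressed with the enrich_kv_store_get and enrich_kv_store_set transforms around the high-latency enrichment.

```python
def cache_aside_get(cache, key, slow_lookup):
    """Read from the cache first; on a miss, run the slow lookup and store the result."""
    value = cache.get(key)
    if value is not None:
        return value  # cache hit: the slow lookup is skipped
    value = slow_lookup(key)
    if value is not None:
        cache[key] = value  # populate the cache for later reads
    return value


calls = []


def slow_lookup(domain):
    """Stand-in for a high-latency enrichment; records each call."""
    calls.append(domain)
    return {"example.com": "192.0.2.1"}.get(domain)


cache = {}
first = cache_aside_get(cache, "example.com", slow_lookup)
second = cache_aside_get(cache, "example.com", slow_lookup)
```

The second read is served from the cache, so slow_lookup runs only once for the key.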
Internal Passive DNS
Use the aws.dynamodb KV store, log sources that contain DNS metadata (e.g., Zeek DNS, Suricata DNS, EDR DNS events), and the DNS enrichment transforms to create an internally curated, enterprise-scale passive DNS database.
Zeek Threat Intelligence
Use the csv_file KV store and Zeek threat intelligence files to load operational threat intelligence into the platform and enable indicator matching for structured data. Critical Path Security offers several open source intel feeds.
Emerging Threats Compromised IP Addresses
Use the text_file KV store and Proofpoint's continually updated Emerging Threats list of known compromised IP addresses to enable indicator matching for structured data.
MaxMind GeoLite2 Databases
Use the mmdb KV store and MaxMind's free geolocation databases to enrich public IP addresses. These databases include city, country, and autonomous system (AS) information. Refer to this example in the project repository.