Stream Storage Architecture
KalamDB stream storage is the runtime path behind STREAM tables. It is optimized for short-lived,
user-scoped rows such as presence, typing indicators, cursor positions, and transient events.
Unlike USER and SHARED tables, stream rows do not flush into Parquet or participate in the
manifest-driven cold tier. The engine keeps stream data in append-only log files sized for fast TTL
eviction.
For the SQL creation model and table-type rules, see /docs/server/architecture/table-types.
Stream Storage File Layout
With the default data directory, each stream table gets a table-scoped base path under:
data/streams/<namespace>/<table>Inside that base path, rows are written into TTL-sized window directories:
data/streams/<namespace>/<table>/w<window_start_ms>-<duration_ms>/<shard>/<user_id>.logExample:
data/streams/chat/typing_events/├── w1748343000000-60000/│ ├── shard-0/alice.log│ └── shard-2/bob.log└── w1748343060000-60000/ └── shard-0/alice.logThe layout is intentionally simple:
- one immutable time window directory per retention bucket
- one shard subdirectory to distribute active writers
- one
<user_id>.logfile per user inside that shard/window
KalamDB uses <user_id>.log directly instead of <user_id>/records.log because the validated
user_id alphabet is already filename-safe, so the extra directory level would add metadata churn
without adding useful structure.
Why the Window Name Encodes Time
Each window directory name encodes two values:
window_start_msduration_ms
That lets cleanup decide whether a whole window is expired without opening any log file.
If window_end <= ttl_cutoff, the entire directory can be removed in one operation. This avoids a
row scan during normal retention and keeps cleanup proportional to the number of expired windows,
not the number of rows.
Bucket Sizing From TTL_SECONDS
The bucket size is derived from the table’s TTL_SECONDS value:
TTL_SECONDS | Window size |
|---|---|
<= 15 minutes | minute windows |
<= 1 day | hour windows |
<= 1 week | day windows |
<= 30 days | week windows |
> 30 days | month windows |
Short-lived streams use minute windows so eviction stays close to the actual retention period. Longer-lived streams use coarser windows to avoid creating too many tiny files.
Write Path
When a row is inserted or deleted in a STREAM table:
- the effective
user_idselects the per-user stream scope - the row timestamp selects the current time window
- the user is routed into a shard directory
- the record is appended to
<user_id>.logas a length-prefixed flexbuffers frame
The file-backed stream store keeps a bounded writer cache:
- 64 KiB buffered writes per open segment
- up to 256 cached segment writers per store
- cold writers flushed and closed when the cache exceeds the cap
The stream log does not call fsync on every append. Stream tables are intended for ephemeral
state, so the design favors low write latency and relies on the OS page cache for writeback.
Stream Storage Cleanup
TTL eviction is window-based, not row-by-row.
The stream eviction job computes a cutoff timestamp and then:
- scans window directory names under the table’s stream base path
- identifies windows whose computed
window_endis older than the cutoff - closes any cached writers under those directories
- removes the expired window directories
This is why the time-window layout matters: cleanup is mostly directory unlink work instead of file parsing or per-row filtering.
The pre-validation path also benefits from this layout. has_logs_before() can stop as soon as it
finds one expired window directory, and the job executor runs that filesystem work on Tokio’s
blocking pool so async workers are not pinned by directory traversal or removal.
Read Path
Reads remain user-scoped.
To serve a range or latest query, the stream store:
- computes the shard for the target
user_id - lists only that user’s candidate window files
- filters windows by overlap with the requested time range
- reads only those candidate log files
- applies delete records so latest state wins for each stream row ID
This keeps reads narrow even when many users share the same stream table.
What Stream Storage Is Not
Stream storage is deliberately separate from the Parquet cold tier used by USER and SHARED
tables.
- no Parquet flush path
- no manifest-driven segment lifecycle
- no long-term analytic storage guarantees
- no file-column support
If you need durable historical records or analytics, use a USER or SHARED table and the cold
storage flow documented in /docs/server/architecture/storage-tiers.
If you need short-lived realtime state that should expire automatically, STREAM storage is the
intended path.