Stream Storage Architecture

KalamDB stream storage is the runtime path behind STREAM tables. It is optimized for short-lived, user-scoped rows such as presence, typing indicators, cursor positions, and transient events.

Unlike USER and SHARED tables, stream rows do not flush into Parquet or participate in the manifest-driven cold tier. The engine keeps stream data in append-only log files sized for fast TTL eviction.

For the SQL creation model and table-type rules, see /docs/server/architecture/table-types.

Stream Storage File Layout

With the default data directory, each stream table gets a table-scoped base path under:

TEXT

1data/streams/<namespace>/<table>

Inside that base path, rows are written into TTL-sized window directories:

TEXT

1data/streams/<namespace>/<table>/w<window_start_ms>-<duration_ms>/<shard>/<user_id>.log

Example:

TEXT

1data/streams/chat/typing_events/2├── w1748343000000-60000/3│   ├── shard-0/alice.log4│   └── shard-2/bob.log5└── w1748343060000-60000/6    └── shard-0/alice.log

The layout is intentionally simple:

one immutable time window directory per retention bucket
one shard subdirectory to distribute active writers
one <user_id>.log file per user inside that shard/window

KalamDB uses <user_id>.log directly instead of <user_id>/records.log because the validated user_id alphabet is already filename-safe, so the extra directory level would add metadata churn without adding useful structure.

Why the Window Name Encodes Time

Each window directory name encodes two values:

window_start_ms
duration_ms

That lets cleanup decide whether a whole window is expired without opening any log file.

If window_end <= ttl_cutoff, the entire directory can be removed in one operation. This avoids a row scan during normal retention and keeps cleanup proportional to the number of expired windows, not the number of rows.

Bucket Sizing From `TTL_SECONDS`

The bucket size is derived from the table’s TTL_SECONDS value:

`TTL_SECONDS`	Window size
`<= 15 minutes`	minute windows
`<= 1 day`	hour windows
`<= 1 week`	day windows
`<= 30 days`	week windows
`> 30 days`	month windows

Short-lived streams use minute windows so eviction stays close to the actual retention period. Longer-lived streams use coarser windows to avoid creating too many tiny files.

Write Path

When a row is inserted or deleted in a STREAM table:

the effective user_id selects the per-user stream scope
the row timestamp selects the current time window
the user is routed into a shard directory
the record is appended to <user_id>.log as a length-prefixed flexbuffers frame

The file-backed stream store keeps a bounded writer cache:

64 KiB buffered writes per open segment
up to 256 cached segment writers per store
cold writers flushed and closed when the cache exceeds the cap

The stream log does not call fsync on every append. Stream tables are intended for ephemeral state, so the design favors low write latency and relies on the OS page cache for writeback.

Stream Storage Cleanup

TTL eviction is window-based, not row-by-row.

The stream eviction job computes a cutoff timestamp and then:

scans window directory names under the table’s stream base path
identifies windows whose computed window_end is older than the cutoff
closes any cached writers under those directories
removes the expired window directories

This is why the time-window layout matters: cleanup is mostly directory unlink work instead of file parsing or per-row filtering.

The pre-validation path also benefits from this layout. has_logs_before() can stop as soon as it finds one expired window directory, and the job executor runs that filesystem work on Tokio’s blocking pool so async workers are not pinned by directory traversal or removal.

Read Path

Reads remain user-scoped.

To serve a range or latest query, the stream store:

computes the shard for the target user_id
lists only that user’s candidate window files
filters windows by overlap with the requested time range
reads only those candidate log files
applies delete records so latest state wins for each stream row ID

This keeps reads narrow even when many users share the same stream table.

What Stream Storage Is Not

Stream storage is deliberately separate from the Parquet cold tier used by USER and SHARED tables.

no Parquet flush path
no manifest-driven segment lifecycle
no long-term analytic storage guarantees
no file-column support

If you need durable historical records or analytics, use a USER or SHARED table and the cold storage flow documented in /docs/server/architecture/storage-tiers.

If you need short-lived realtime state that should expire automatically, STREAM storage is the intended path.

Stream Storage Architecture

Stream Storage File Layout

Why the Window Name Encodes Time

Bucket Sizing From TTL_SECONDS

Write Path

Stream Storage Cleanup

Read Path

What Stream Storage Is Not

Related Docs

Bucket Sizing From `TTL_SECONDS`