Storage Tiers
KalamDB uses a dual-tier storage architecture that balances write speed with query efficiency and long-term retention.
Hot Tier (RocksDB)
The hot tier handles all incoming writes with sub-millisecond latency using RocksDB column families.
Characteristics:
- ⚡ Sub-millisecond write latency
- Organized as column families per table
- Optimized for point lookups and recent data
- Data is buffered here before flushing to cold tier
Cold Tier (Parquet)
Flushed data is written to Apache Parquet files for efficient analytical queries and long-term storage.
Characteristics:
- 📊 Columnar format for efficient analytics
- High compression ratios
- Each segment tracked in manifest.json
- Supports multiple storage backends (local, S3, Azure, GCS)
End-to-End Tier Flow
Flush Policy
Tables are configured with a flush policy that determines when data moves from hot to cold tier:
CREATE TABLE app.messages (
id BIGINT PRIMARY KEY DEFAULT SNOWFLAKE_ID(),
content TEXT NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
) WITH (
TYPE = 'USER',
FLUSH_POLICY = 'rows:1000,interval:60'
);

| Policy | Description |
|---|---|
| rows:N | Flush after N rows accumulate |
| interval:N | Flush every N seconds |
| rows:N,interval:N | Flush on whichever threshold is hit first |
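The combined policy above can be sketched as a simple threshold check. This is an illustrative reconstruction, not KalamDB's actual flush code; the function names `parse_policy` and `should_flush` are hypothetical.

```python
# Hypothetical sketch of evaluating a 'rows:N,interval:N' flush policy.
# Names are illustrative, not KalamDB APIs.

def parse_policy(policy: str) -> dict:
    """Parse 'rows:1000,interval:60' into {'rows': 1000, 'interval': 60}."""
    out = {}
    for part in policy.split(","):
        key, value = part.split(":")
        out[key.strip()] = int(value)
    return out

def should_flush(policy: dict, pending_rows: int, seconds_since_flush: float) -> bool:
    """Flush when ANY configured threshold is reached (whichever hits first)."""
    if "rows" in policy and pending_rows >= policy["rows"]:
        return True
    if "interval" in policy and seconds_since_flush >= policy["interval"]:
        return True
    return False
```

With `rows:1000,interval:60`, a table with 1,000 pending rows flushes immediately, and a mostly idle table still flushes once 60 seconds have elapsed.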
How Flushing Works (Engine Path)
The flush flow is job-driven and designed to be crash-safe:
- DML writes land in RocksDB first (hot tier).
- Table providers mark the manifest cache entry as pending_write.
- STORAGE FLUSH TABLE or STORAGE FLUSH ALL creates background flush jobs (system.jobs).
- The flush executor performs the actual migration in the leader phase (cluster mode).
- For USER and SHARED tables, the flush job scans hot rows, keeps the latest versions, and filters tombstones from the cold output.
- Parquet is written to a temp object (batch-N.parquet.tmp) and then atomically renamed to batch-N.parquet.
- Manifest metadata is updated (segment stats, sequence range, schema version), then persisted.
- Flushed hot rows are removed from RocksDB and partition compaction runs to reclaim space.
Notes:
- STREAM tables are not part of this flush path.
- STORAGE FLUSH is asynchronous; monitor status via system.jobs.
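The write-temp-then-rename step described above is a standard crash-safety pattern; readers never observe a half-written segment. A minimal local-filesystem sketch using only the Python standard library (object stores achieve the same effect with an upload-then-commit rename):

```python
# Sketch of the temp-write + atomic-rename pattern; paths are illustrative.
import os

def write_segment_atomically(directory: str, name: str, data: bytes) -> str:
    """Write data to '<name>.tmp', then atomically rename it to '<name>'.

    os.replace is atomic on POSIX filesystems, so a crash mid-write leaves
    only an ignorable .tmp file and never a truncated segment."""
    final_path = os.path.join(directory, name)
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())          # ensure bytes hit disk before the rename
    os.replace(tmp_path, final_path)  # atomic within the same filesystem
    return final_path
```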
Flush State Machine
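The state diagram is not reproduced here. As a rough illustration only, the enum below reconstructs plausible states from terms that appear on this page (the pending_write cache marker, background job execution, and the "committed" segment status); the actual engine states and transitions may differ.

```python
# Illustrative flush-state sketch; NOT the engine's actual type.
from enum import Enum

class FlushState(Enum):
    PENDING_WRITE = "pending_write"  # manifest cache entry marked dirty
    FLUSHING = "flushing"            # background job migrating hot rows
    COMMITTED = "committed"          # segment renamed, manifest persisted
    FAILED = "failed"                # job errored; hot data remains intact

# Assumed legal transitions, including retry after failure.
ALLOWED = {
    FlushState.PENDING_WRITE: {FlushState.FLUSHING},
    FlushState.FLUSHING: {FlushState.COMMITTED, FlushState.FAILED},
    FlushState.FAILED: {FlushState.FLUSHING},
    FlushState.COMMITTED: set(),
}

def transition(cur: FlushState, nxt: FlushState) -> FlushState:
    if nxt not in ALLOWED[cur]:
        raise ValueError(f"illegal transition {cur} -> {nxt}")
    return nxt
```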
Manual Flush & Compaction
-- Flush a specific table
STORAGE FLUSH TABLE myapp.messages;
-- Flush all tables in a namespace
STORAGE FLUSH ALL IN myapp;
-- Compact cold storage
STORAGE COMPACT TABLE myapp.messages;
-- Check storage health
STORAGE CHECK local EXTENDED;

S3 User-Data Example
Create an S3 storage and bind a USER table to it:
For MinIO-compatible local S3 setup, see MinIO (S3-Compatible).
CREATE STORAGE s3_prod
TYPE 's3'
BUCKET 'my-kalamdb-prod'
REGION 'us-east-1'
USER_TABLES_TEMPLATE 'users/{namespace}/{tableName}/{userId}'
SHARED_TABLES_TEMPLATE 'shared/{namespace}/{tableName}';
CREATE TABLE chat.messages (
id BIGINT PRIMARY KEY DEFAULT SNOWFLAKE_ID(),
conversation_id BIGINT NOT NULL,
content TEXT NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
) WITH (
TYPE = 'USER',
STORAGE_ID = 's3_prod',
FLUSH_POLICY = 'rows:1000,interval:60'
);

For user u_42, cold-tier objects are stored under the user template:
s3://my-kalamdb-prod/users/chat/messages/u_42/manifest.json
s3://my-kalamdb-prod/users/chat/messages/u_42/batch-0.parquet
s3://my-kalamdb-prod/users/chat/messages/u_42/batch-1.parquet

Illustrative manifest excerpt:
{
"segments": [
{
"id": "batch-1.parquet",
"path": "batch-1.parquet",
"row_count": 1000,
"size_bytes": 184320,
"schema_version": 1,
"status": "committed"
}
],
"last_sequence_number": 1
}

Per-User Storage Isolation
data/storage/
├── <namespace>/<tableName>/manifest.json # shared table path template
├── <namespace>/<tableName>/batch-<index>.parquet
└── <namespace>/<tableName>/<userId>/manifest.json # user table path template

Path layout is controlled by:
- storage.shared_tables_template (default: {namespace}/{tableName})
- storage.user_tables_template (default: {namespace}/{tableName}/{userId})
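Template expansion is plain placeholder substitution over the variables shown ({namespace}, {tableName}, {userId}). A minimal sketch; `expand_template` is a hypothetical helper, not a KalamDB function:

```python
# Hypothetical sketch of path-template expansion.

def expand_template(template: str, **variables: str) -> str:
    """Substitute {placeholder} tokens in a storage path template."""
    path = template
    for key, value in variables.items():
        path = path.replace("{" + key + "}", value)
    return path

user_template = "users/{namespace}/{tableName}/{userId}"
print(expand_template(user_template, namespace="chat",
                      tableName="messages", userId="u_42"))
# users/chat/messages/u_42
```

This matches the s3_prod example above: the expanded prefix is where that user's manifest.json and batch-N.parquet objects land.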
With the user template, each user’s cold-tier data lives in its own directory. This enables:
- Trivial data export — just copy the user’s directory
- Instant deletion — remove the directory for GDPR compliance
- Independent scaling — no cross-user interference
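The export and deletion benefits above reduce to ordinary directory operations. A sketch under the default user_tables_template layout; the helper names are illustrative, and in practice you would pause or fence writes for the user first:

```python
# Illustrative per-user export/delete over the default layout
# <root>/<namespace>/<tableName>/<userId>/ ; not KalamDB tooling.
import shutil
from pathlib import Path

def export_user(root: str, namespace: str, table: str,
                user_id: str, dest: str):
    """Export = copy the user's directory (manifest + Parquet segments)."""
    src = Path(root) / namespace / table / user_id
    return shutil.copytree(src, Path(dest) / user_id)

def delete_user(root: str, namespace: str, table: str, user_id: str) -> None:
    """GDPR-style deletion = remove the user's directory outright."""
    shutil.rmtree(Path(root) / namespace / table / user_id)
```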