Clustering & High Availability
KalamDB uses a Multi-Raft architecture to scale reads, writes, and subscriptions across nodes while keeping data strongly replicated per shard.
How the Cluster Is Structured
KalamDB does not run a single Raft group for everything. It runs multiple groups:
- `Meta`: cluster/system metadata
- `DataUserShard(N)`: user-data shards
- `DataSharedShard(N)`: shared-data shards
Each group has its own leader and followers. This means leadership is distributed across the cluster, not centralized to a single node for all workloads.
Sharding is handled by a router that hashes user/table identifiers to shard groups, so users are naturally distributed across shards. This is how KalamDB scales tenant load and isolates shard-level hotspots.
A single node participates in multiple Raft groups simultaneously: it can be a leader for some groups and a follower for others at the same time.
Multi-Raft Cluster Diagram
How to read this diagram
- The cluster has multiple Raft groups, not one.
- Each group such as `Meta`, `DataUserShard(N)`, or `DataSharedShard(N)` runs its own Raft consensus.
- Each group has one leader that handles writes and followers that replicate data and can serve follower-side access patterns.
- A single node participates in many groups at once, which is why Node 1, Node 2, and Node 3 appear repeatedly across the diagram.
Example:
`DataUserShard(7)` has its leader on Node 2. `DataUserShard(12)` has its leader on Node 3.
So:
- load is spread across nodes
- there is no single leader bottleneck
- multiple writes can happen in parallel across different groups
Why this matters
Instead of one cluster having one leader and turning that leader into a bottleneck, one cluster has many leaders, one per active Raft group.
That is the core scaling property of Multi-Raft in KalamDB.
Tiny mental model
```
user-1 -> shard 7  -> leader on Node 2
user-2 -> shard 12 -> leader on Node 3
user-3 -> shard 7  -> leader on Node 2
```
So:
- the same user stays on the same shard for consistent writes
- different users can land on different leaders for parallel scaling
High Concurrent Connection Model
KalamDB accepts WebSocket clients on any node and manages them in a lock-free connection registry:
- `ConnectionsManager` uses `DashMap` indexes for fast concurrent connection/subscription access.
- Connection lifecycle, authentication timeout, heartbeat timeout, and subscription routing are all centralized there.
- The default concurrent connection guard is high (`10,000`), with configurable limits.
This allows high fan-in of realtime clients while keeping routing overhead low.
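As an illustrative model only (the real registry is Rust built on `DashMap`; the `ConnectionRegistry` name, fields, and lock are assumptions of this sketch), the registry's behavior can be sketched like this:

```python
# Illustrative model of a concurrent connection registry with a capacity guard.
# KalamDB's ConnectionsManager is Rust + DashMap (lock-free); this Python sketch
# only mirrors the observable behavior: indexed connections, per-connection
# subscriptions, and a configurable connection limit.
import threading

class ConnectionRegistry:
    def __init__(self, max_connections=10_000):
        self.max_connections = max_connections
        self._lock = threading.Lock()   # stand-in for DashMap's sharded locking
        self.connections = {}           # conn_id -> client address
        self.subscriptions = {}         # conn_id -> set of subscribed tables

    def register(self, conn_id, addr):
        with self._lock:
            if len(self.connections) >= self.max_connections:
                return False            # connection guard tripped
            self.connections[conn_id] = addr
            self.subscriptions[conn_id] = set()
            return True

    def subscribe(self, conn_id, table):
        with self._lock:
            self.subscriptions[conn_id].add(table)

    def drop(self, conn_id):
        with self._lock:
            self.connections.pop(conn_id, None)
            self.subscriptions.pop(conn_id, None)

reg = ConnectionRegistry(max_connections=2)
assert reg.register("c1", "10.0.0.1")
assert reg.register("c2", "10.0.0.2")
assert not reg.register("c3", "10.0.0.3")  # guard rejects the third connection
```

The guard check and registration happen under one lock in the sketch; the real lock-free structure avoids that serialization, which is the point of using `DashMap`.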
Leader/Follower Access Pattern
Clients can connect to either leader or follower nodes.
- Reads/subscription bootstrap can be served on the node the client is connected to.
- Writes are always leader-authoritative per Raft group.
- If a request lands on a follower for a write path, KalamDB forwards it to the current leader.
- If leadership changed mid-request, typed `NotLeader` handling retries through the known leader.
This gives you flexible client routing without sacrificing consistency for writes.
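A minimal sketch of the retry pattern, assuming a `NotLeader` error that carries a leader hint (the class names and hop budget here are illustrative, not KalamDB's actual types):

```python
# Sketch of leader-aware write routing with typed NotLeader retries.
class NotLeader(Exception):
    def __init__(self, leader):
        self.leader = leader            # hint: the node currently believed to lead

class Node:
    def __init__(self, name, group_view):
        self.name = name
        self.group_view = group_view    # shared view of who leads the group

    def write(self, payload):
        if self.group_view["name"] != self.name:
            raise NotLeader(self.group_view["node"])
        return f"committed:{payload}"

def write_with_retry(node, payload, max_hops=3):
    """Follow NotLeader hints until the write lands on the current leader."""
    for _ in range(max_hops):
        try:
            return node.write(payload)
        except NotLeader as e:
            node = e.leader             # retry through the known leader
    raise RuntimeError("leader not found within hop budget")

view = {"name": "node-2"}
n1, n2 = Node("node-1", view), Node("node-2", view)
view["node"] = n2
assert write_with_retry(n1, "row") == "committed:row"  # follower hop -> leader
```

A bounded hop budget matters because leadership can move again between the hint and the retry; the loop converges as soon as the hint is current.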
Notification Forwarding Across Nodes
Live subscribers do not need to be connected to the leader node for their shard.
When data changes:
- The shard leader processes the change notification.
- The leader broadcasts forwarded notifications to follower nodes over cluster RPC.
- Each follower dispatches to subscriptions connected locally on that follower.
This is the key reason you can attach users to any node and still receive correct realtime updates for their shard data.
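The three steps above can be modeled as a small fan-out sketch (names like `NodeHub` and the direct-call "RPC" are assumptions; the real forwarding runs over cluster RPC):

```python
# Sketch of leader -> follower notification fan-out: the leader pushes a change
# event to every peer, and each node delivers only to subscribers that are
# connected locally to it.
class NodeHub:
    def __init__(self, name):
        self.name = name
        self.local_subs = {}    # table -> list of locally connected client inboxes

    def subscribe(self, table, inbox):
        self.local_subs.setdefault(table, []).append(inbox)

    def dispatch_local(self, table, event):
        for inbox in self.local_subs.get(table, []):
            inbox.append(event)

def leader_broadcast(leader, followers, table, event):
    leader.dispatch_local(table, event)     # leader's own local clients
    for f in followers:                     # forwarded over cluster RPC
        f.dispatch_local(table, event)

node1, node2, node3 = NodeHub("n1"), NodeHub("n2"), NodeHub("n3")
inbox_on_follower = []
node3.subscribe("app.messages", inbox_on_follower)  # client attached to a follower

leader_broadcast(node2, [node1, node3], "app.messages", {"op": "INSERT", "id": 101})
assert inbox_on_follower == [{"op": "INSERT", "id": 101}]
```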
Sharding + HA in Practice
KalamDB shards users across multiple Raft data groups. Each shard group has its own:
- leader election
- replication log
- follower set
- failover boundary
Operationally, this gives you:
- better parallelism (many active leaders across groups)
- shard-level fault isolation
- horizontal scaling by increasing cluster nodes and shard count
- high availability from follower replicas per shard
When a node fails, Raft elects new leaders for affected groups and clients can reconnect to any healthy node.
Supported Cluster Commands
Cluster operations are surfaced through CLI meta-commands. See CLI cluster meta-commands for the operator command list.
For raw inspection without CLI formatting, query `system.cluster` and `system.cluster_groups` directly. Old `CLUSTER LIST` / `STATUS` / `LS` SQL forms are replaced by the CLI and system views.
`CLUSTER LEAVE` remains unsupported. `CLUSTER TRANSFER LEADER` is still routed by the backend, but some builds can report it as unsupported at runtime depending on the OpenRaft version in use.
How user data is divided across groups
KalamDB uses Multi-Raft rather than one Raft group for the whole cluster.
- One group handles metadata.
- Multiple user data groups handle user and stream data.
- The target user group is chosen by hashing `user_id` into one of `cluster.user_shards` groups.
That means all writes for a specific user are coordinated by the same user-data group leader at a given time. The data is still replicated to that group’s followers, but the active write path for one user’s workload is not scattered across many leaders. This improves locality and reduces cross-group coordination, which is one of the reasons the cluster can stay fast as concurrency grows.
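The routing property can be sketched with a stable hash (KalamDB's actual hash function is not specified here; `crc32` mod `cluster.user_shards` is just a stand-in that captures the behavior):

```python
# Sketch of deterministic user_id -> shard-group routing: the same user always
# maps to the same group, while different users spread across groups.
import zlib

def user_shard(user_id: str, user_shards: int) -> int:
    # crc32 is stable across runs (unlike Python's built-in hash() of str)
    return zlib.crc32(user_id.encode()) % user_shards

USER_SHARDS = 16
assert user_shard("user-42", USER_SHARDS) == user_shard("user-42", USER_SHARDS)
groups = {user_shard(f"user-{i}", USER_SHARDS) for i in range(200)}
assert len(groups) > 1   # different users land on multiple groups
```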
Follower insert example
Suppose user-42 is mapped to `DataUserShard(7)`, and the client sends an insert to node 2:
```
kalam --url http://node-2:8080 --command "INSERT INTO app.messages (id, body) VALUES (101, 'hello')"
```
If node 2 is a follower for `DataUserShard(7)`, KalamDB:
- prepares and classifies the SQL once,
- resolves the target group from the table type and effective `user_id`,
- forwards the original SQL, params, auth header, and request id to the leader of `DataUserShard(7)`,
- lets that leader append and replicate the write through Raft,
- returns the committed result back through the follower to the client.
So clients can write through followers, but the actual commit still happens on the correct group leader.
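The follower-side pipeline above can be sketched end to end; the field names, the `route_group` helper, and the stand-in `leader_commit` are illustrative assumptions, not KalamDB internals:

```python
# Sketch of the follower-side forwarding pipeline: prepare once, resolve the
# target group, forward the original request to the group leader, return the
# committed result through the follower.
import zlib

def route_group(user_id: str, user_shards: int) -> str:
    return f"DataUserShard({zlib.crc32(user_id.encode()) % user_shards})"

def leader_commit(group, request):
    # Stand-in for the group leader appending + replicating through Raft.
    return {"group": group, "request_id": request["request_id"], "status": "committed"}

def follower_execute(sql, params, auth, request_id, user_id, user_shards=16):
    request = {                                  # 1. prepare/classify once
        "sql": sql, "params": params,
        "auth": auth, "request_id": request_id,  # original inputs travel intact
    }
    group = route_group(user_id, user_shards)    # 2. resolve target group
    result = leader_commit(group, request)       # 3.-4. forward; leader replicates
    return result                                # 5. committed result to client

res = follower_execute(
    "INSERT INTO app.messages (id, body) VALUES (101, 'hello')",
    params=[], auth="Bearer x", request_id="req-1", user_id="user-42",
)
assert res["status"] == "committed"
```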
Shared tables today
Shared tables are currently not truly sharded. Today they route to a single shared data group.
More flexible shared-table sharding is a work in progress. The planned direction is partition-by-key so each shared table can define how rows are partitioned and which group should own them.
Configuration Knobs
In cluster mode, tune shard topology from cluster config:
- `cluster.user_shards`
- `cluster.shared_shards`
These values determine how many independent Raft data groups exist for user and shared workloads. To view all the configuration values for the cluster setup:
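For illustration, a hypothetical config fragment with these knobs might look like the following; the file format and key layout are assumptions, so consult your deployment's actual cluster config:

```toml
[cluster]
user_shards = 8     # number of DataUserShard(N) Raft groups
shared_shards = 1   # shared tables currently route to a single shared group
```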
Deploying a 3-Node Cluster
The quickest way to start a cluster is with Docker:
```
KALAMDB_JWT_SECRET="$(openssl rand -base64 32)" \
curl -sSL https://raw.githubusercontent.com/kalamstack/KalamDB/main/docker/run/cluster/docker-compose.yml | docker-compose -f - up -d
```
Backup & Restore
```sql
-- Backup a namespace
BACKUP DATABASE myapp TO '/backups/myapp-2026-02-18';

-- Restore from backup
RESTORE DATABASE myapp FROM '/backups/myapp-2026-02-18';

-- View backup history
SHOW BACKUPS FOR DATABASE myapp;
```