
Clustering & High Availability

KalamDB uses a Multi-Raft architecture to scale reads, writes, and subscriptions across nodes while keeping data strongly replicated per shard.

How the Cluster Is Structured

KalamDB does not run a single Raft group for everything. It runs multiple groups:

  • Meta: cluster/system metadata
  • DataUserShard(N): user-data shards
  • DataSharedShard(N): shared-data shards

Each group has its own leader and followers. This means leadership is distributed across the cluster, not centralized to a single node for all workloads.

Sharding is handled by a router that hashes user/table identifiers to shard groups, so users are naturally distributed across shards. This is how KalamDB scales tenant load and isolates shard-level hotspots.
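
As a rough sketch of that routing idea (not KalamDB's actual router; the specific hash function and the modulo placement are assumptions made for illustration):

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Raft group identifiers, mirroring the names listed above.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum RaftGroup {
    Meta,
    DataUserShard(u64),
    DataSharedShard(u64),
}

/// Hypothetical router: place a user onto one of `user_shards` user-data groups.
fn route_user(user_id: &str, user_shards: u64) -> RaftGroup {
    let mut hasher = DefaultHasher::new();
    user_id.hash(&mut hasher);
    RaftGroup::DataUserShard(hasher.finish() % user_shards)
}

fn main() {
    // Deterministic placement: the same user always resolves to the same shard group.
    assert_eq!(route_user("user-42", 16), route_user("user-42", 16));
    println!("{:?}", route_user("user-42", 16));
}

Because placement is deterministic, a given user's writes always land on the same shard group, and therefore on whichever node currently leads that group.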

A single node participates in multiple Raft groups simultaneously: it can be the leader for some groups and a follower for others at the same time.

Multi-Raft Cluster Diagram

How to read this diagram

  • The cluster has multiple Raft groups, not one.
  • Each group such as Meta, DataUserShard(N), or DataSharedShard(N) runs its own Raft consensus.
  • Each group has one leader that handles writes and followers that replicate data and can serve follower-side access patterns.
  • A single node participates in many groups at once, which is why Node 1, Node 2, and Node 3 appear repeatedly across the diagram.

Example:

  • DataUserShard(7) has its leader on Node 2.
  • DataUserShard(12) has its leader on Node 3.

So:

  • load is spread across nodes
  • there is no single leader bottleneck
  • multiple writes can happen in parallel across different groups

Why this matters

Instead of a single leader that becomes a bottleneck for the whole cluster, the cluster has many leaders, one per active Raft group.

That is the core scaling property of Multi-Raft in KalamDB.

Tiny mental model

user-1 -> shard 7  -> leader on Node 2
user-2 -> shard 12 -> leader on Node 3
user-3 -> shard 7  -> leader on Node 2

So:

  • the same user stays on the same shard for consistent writes
  • different users can land on different leaders for parallel scaling

High Concurrent Connection Model

KalamDB accepts WebSocket clients on any node and manages them in a lock-free connection registry:

  • ConnectionsManager uses DashMap indexes for fast concurrent connection/subscription access.
  • Connection lifecycle, authentication timeout, heartbeat timeout, and subscription routing are all centralized there.
  • The default concurrent connection guard is 10,000 connections, and the limit is configurable.

This allows high fan-in of realtime clients while keeping routing overhead low.
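
A minimal sketch of that registry shape, assuming the dashmap crate (field and method names here are invented; this is not the actual ConnectionsManager API):

use dashmap::DashMap;

// Hypothetical connection registry. KalamDB's real ConnectionsManager also
// handles auth timeouts, heartbeat timeouts, and subscription routing, per the
// list above; this sketch only shows the two concurrent indexes.
struct ConnectionRegistry {
    // connection id -> a stand-in for the outbound WebSocket handle
    connections: DashMap<u64, String>,
    // subscription id -> owning connection id, for fast notification routing
    subscriptions: DashMap<u64, u64>,
    // the configurable concurrent-connection guard (10,000 by default)
    max_connections: usize,
}

impl ConnectionRegistry {
    fn register(&self, conn_id: u64, handle: String) -> bool {
        // A simple check, not an atomic reservation; this is only a sketch.
        if self.connections.len() >= self.max_connections {
            return false; // past the guard: reject the new connection
        }
        self.connections.insert(conn_id, handle);
        true
    }

    fn subscribe(&self, sub_id: u64, conn_id: u64) {
        self.subscriptions.insert(sub_id, conn_id);
    }

    fn connection_for(&self, sub_id: u64) -> Option<u64> {
        self.subscriptions.get(&sub_id).map(|entry| *entry)
    }
}

fn main() {
    let registry = ConnectionRegistry {
        connections: DashMap::new(),
        subscriptions: DashMap::new(),
        max_connections: 10_000,
    };
    assert!(registry.register(1, "ws-client-1".to_string()));
    registry.subscribe(77, 1);
    assert_eq!(registry.connection_for(77), Some(1));
}

The point of the shape is that registering connections and looking up subscriptions are concurrent map operations rather than work funneled through a single global lock.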

Leader/Follower Access Pattern

Clients can connect to either leader or follower nodes.

  • Reads/subscription bootstrap can be served on the node the client is connected to.
  • Writes are always leader-authoritative per Raft group.
  • If a request lands on a follower for a write path, KalamDB forwards it to the current leader.
  • If leadership changes mid-request, typed NotLeader handling retries the request through the newly known leader.

This gives you flexible client routing without sacrificing consistency for writes.
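
A hedged sketch of the write-path retry, with an invented error type and a closure standing in for the real RPC transport:

// Hypothetical sketch of leader-authoritative writes with typed NotLeader retries.
#[derive(Debug)]
enum WriteError {
    // The contacted node is not the group leader; it may report who is.
    NotLeader { leader_hint: Option<String> },
    Other(String),
}

fn write_with_retry<F>(mut send: F, mut target: String, max_retries: usize) -> Result<u64, WriteError>
where
    F: FnMut(&str) -> Result<u64, WriteError>,
{
    for _ in 0..=max_retries {
        match send(&target) {
            Ok(commit_index) => return Ok(commit_index),
            // Leadership moved (or we hit a follower): retry against the reported leader.
            Err(WriteError::NotLeader { leader_hint: Some(leader) }) => target = leader,
            Err(other) => return Err(other),
        }
    }
    Err(WriteError::Other("no leader found after retries".into()))
}

fn main() {
    // Mock transport: node-1 is a follower that points at node-2, the current leader.
    let result = write_with_retry(
        |node| {
            if node == "node-2" {
                Ok(42) // pretend commit index returned by the leader
            } else {
                Err(WriteError::NotLeader { leader_hint: Some("node-2".into()) })
            }
        },
        "node-1".into(),
        3,
    );
    assert_eq!(result.unwrap(), 42);
}

The important property is that a write is only acknowledged once the correct group leader has committed it, regardless of which node the client first contacted.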

Notification Forwarding Across Nodes

Live subscribers do not need to be connected to the leader node for their shard.

When data changes:

  1. The shard leader processes the change notification.
  2. The leader broadcasts forwarded notifications to follower nodes over cluster RPC.
  3. Each follower dispatches to subscriptions connected locally on that follower.

This is the key reason you can attach users to any node and still receive correct realtime updates for their shard data.
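
To make the three steps concrete, here is a toy sketch (types and function names are invented; the real forwarding happens over KalamDB's cluster RPC):

use std::collections::HashMap;

// Hypothetical change notification for a shard group.
#[derive(Clone, Debug)]
struct Notification {
    group: String,   // e.g. "DataUserShard(7)"
    table: String,
    payload: String,
}

// Step 2: the leader forwards the notification to every follower node.
fn broadcast_from_leader(followers: &[String], note: &Notification) -> Vec<(String, Notification)> {
    followers
        .iter()
        .map(|node| (node.clone(), note.clone())) // stand-in for one RPC per follower
        .collect()
}

// Step 3: each node delivers to the subscriptions connected locally to it.
fn dispatch_locally(local_subs: &HashMap<u64, String>, note: &Notification) {
    for (sub_id, table) in local_subs {
        if table == &note.table {
            println!("[{}] sub {} gets: {}", note.group, sub_id, note.payload);
        }
    }
}

fn main() {
    let note = Notification {
        group: "DataUserShard(7)".into(),
        table: "app.messages".into(),
        payload: "row 101 inserted".into(),
    };
    let followers = vec!["node-1".to_string(), "node-3".to_string()];
    // The leader (say, on node-2) fans the notification out to its followers...
    let forwarded = broadcast_from_leader(&followers, &note);
    // ...and each follower dispatches to whoever is subscribed through it.
    for (_node, forwarded_note) in forwarded {
        let local_subs = HashMap::from([(77u64, "app.messages".to_string())]);
        dispatch_locally(&local_subs, &forwarded_note);
    }
}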

Sharding + HA in Practice

KalamDB shards users across multiple Raft data groups. Each shard group has its own:

  • leader election
  • replication log
  • follower set
  • failover boundary

Operationally, this gives you:

  • better parallelism (many active leaders across groups)
  • shard-level fault isolation
  • horizontal scaling by increasing cluster nodes and shard count
  • high availability from follower replicas per shard

When a node fails, Raft elects new leaders for affected groups and clients can reconnect to any healthy node.
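
For example, a client-side reconnect loop might simply walk the known endpoints until one accepts the connection (node names and the connect() helper below are hypothetical):

// Stand-in for opening a WebSocket/HTTP connection; pretend node-1 is down.
fn connect(node: &str) -> Result<String, String> {
    if node == "node-1" {
        Err(format!("{node} unreachable"))
    } else {
        Ok(format!("connected to {node}"))
    }
}

fn connect_to_any(nodes: &[&str]) -> Option<String> {
    for node in nodes {
        match connect(node) {
            Ok(session) => return Some(session),
            Err(err) => eprintln!("{err}, trying next node"),
        }
    }
    None
}

fn main() {
    // After a node failure, cycle through the remaining endpoints.
    let session = connect_to_any(&["node-1", "node-2", "node-3"]);
    assert_eq!(session.as_deref(), Some("connected to node-2"));
}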

Supported Cluster Commands

Cluster operations are surfaced through CLI meta-commands. See CLI cluster meta-commands for the operator command list.

For raw inspection without CLI formatting, query system.cluster and system.cluster_groups directly. Old CLUSTER LIST / STATUS / LS SQL forms are replaced by the CLI and system views.

CLUSTER LEAVE remains unsupported. CLUSTER TRANSFER LEADER is still routed by the backend, but some builds can report it as unsupported at runtime depending on the OpenRaft version in use.

How user data is divided across groups

KalamDB uses Multi-Raft rather than one Raft group for the whole cluster.

  • One group handles metadata.
  • Multiple user data groups handle user and stream data.
  • The target user group is chosen by hashing user_id into one of cluster.user_shards groups.

That means all writes for a specific user are coordinated by the same user-data group leader at a given time. The data is still replicated to that group’s followers, but the active write path for one user’s workload is not scattered across many leaders. This improves locality and reduces cross-group coordination, which is one of the reasons the cluster can stay fast as concurrency grows.

Follower insert example

Suppose user-42 is mapped to DataUserShard(7), and the client sends an insert to node 2:

kalam --url http://node-2:8080 --command "INSERT INTO app.messages (id, body) VALUES (101, 'hello')"

If node 2 is a follower for DataUserShard(7), KalamDB:

  1. prepares and classifies the SQL once,
  2. resolves the target group from the table type and effective user_id,
  3. forwards the original SQL, params, auth header, and request id to the leader of DataUserShard(7),
  4. lets that leader append and replicate the write through Raft,
  5. returns the committed result back through the follower to the client.

So clients can write through followers, but the actual commit still happens on the correct group leader.
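
A loose illustration of what travels with the forwarded request in steps 2 and 3 (the struct and its field names are hypothetical, not KalamDB's wire format):

// Hypothetical shape of a write forwarded from a follower to the group leader.
#[derive(Debug)]
struct ForwardedWrite {
    group: String,       // resolved target group, e.g. "DataUserShard(7)"
    sql: String,         // the original SQL, already classified on the follower
    params: Vec<String>, // bound parameters, passed through unchanged
    auth_header: String, // caller identity travels with the request
    request_id: String,  // lets the client correlate the committed result
}

fn main() {
    // user-42 maps to DataUserShard(7); node-2 (a follower) forwards to that group's leader.
    let fwd = ForwardedWrite {
        group: "DataUserShard(7)".into(),
        sql: "INSERT INTO app.messages (id, body) VALUES (101, 'hello')".into(),
        params: vec![],
        auth_header: "Bearer <token>".into(),
        request_id: "req-123".into(),
    };
    println!("{:#?}", fwd);
}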

Shared tables today

Shared tables are currently not truly sharded. Today they route to a single shared data group.

More flexible shared-table sharding is a work in progress. The planned direction is partition-by-key so each shared table can define how rows are partitioned and which group should own them.

Configuration Knobs

In cluster mode, tune shard topology from cluster config:

  • cluster.user_shards
  • cluster.shared_shards

These values determine how many independent Raft data groups exist for user and shared workloads.
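
For intuition, the shard counts map one-to-one onto Raft data groups. A toy enumeration (group naming mirrors the list at the top of this page; the function itself is only a sketch):

// Sketch: the shard counts directly determine how many Raft data groups exist.
fn raft_groups(user_shards: u32, shared_shards: u32) -> Vec<String> {
    let mut groups = vec!["Meta".to_string()];
    groups.extend((0..user_shards).map(|i| format!("DataUserShard({i})")));
    groups.extend((0..shared_shards).map(|i| format!("DataSharedShard({i})")));
    groups
}

fn main() {
    // e.g. cluster.user_shards = 4, cluster.shared_shards = 1
    for group in raft_groups(4, 1) {
        println!("{group}");
    }
}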

Deploying a 3-Node Cluster

The quickest way to start a cluster is with Docker:

KALAMDB_JWT_SECRET="$(openssl rand -base64 32)" \
curl -sSL https://raw.githubusercontent.com/kalamstack/KalamDB/main/docker/run/cluster/docker-compose.yml | docker-compose -f - up -d

Backup & Restore

-- Backup a namespace
BACKUP DATABASE myapp TO '/backups/myapp-2026-02-18';

-- Restore from backup
RESTORE DATABASE myapp FROM '/backups/myapp-2026-02-18';

-- View backup history
SHOW BACKUPS FOR DATABASE myapp;