The Hard Part of Realtime SQL: Snapshot First, Then Live Changes

Most realtime systems can tell you when a table changes.

That's useful, but it's usually not what an application actually needs.

Imagine a support dashboard that shows tickets assigned to the current user:

sql snippetSQL

1SELECT id, title, priority, status, updated_at2FROM app.tickets3WHERE assignee_id = $current_user4ORDER BY updated_at DESC;

A common solution is polling. Another is using CDC or a generic change feed.

Both approaches work, but they push a surprising amount of complexity onto the client:

Re-fetching data over and over
Filtering out rows that don't belong to the current user
Handling reconnects
Detecting missed updates
Keeping local state in sync

KalamDB takes a different approach.

Subscribe once, receive a consistent snapshot, then stream everything that happened after it.

Because KalamDB stores data in user-scoped tables, subscriptions don't require expensive filtering on every update. In many cases, the server only needs to answer a simple question:

Is this row newer than the last sequence id the client has seen?

That makes realtime delivery simpler, more predictable, and easier to scale.

The KalamDB approach

A client opens a WebSocket connection, authenticates, and subscribes to a SQL query:

json snippetJSON

1{2  "type": "subscribe",3  "subscription": {4    "id": "tickets",5    "sql": "SELECT id, title, status, _seq FROM app.tickets"6  }7}

The server then:

Captures a snapshot boundary.
Sends the initial result set in batches.
Streams all changes that occur after that boundary.
Allows the client to reconnect from the last sequence id it received.

The client doesn't need to poll or repeatedly execute the same query.

Why sequence ids matter

Every row in KalamDB contains a monotonic sequence id:

_seq

This sequence id is the foundation of synchronization.

In many systems, every update requires evaluating filters, permissions, and subscriptions before deciding whether a client should receive it.

In KalamDB, subscriptions are naturally scoped to user-owned tables and partitions. That means the server can often reduce the problem to:

Is _seq greater than the last sequence id the client received?

That simple comparison provides several benefits:

Less filtering
Lower CPU overhead
Reduced network traffic
Straightforward reconnect logic
Efficient replay
Predictable ordering

Snowflake-based sequence ids

KalamDB's _seq is implemented using a Snowflake-style identifier.

That means it is:

Globally sortable
Monotonically increasing
Fast to compare
Safe to generate across multiple nodes

This becomes especially important in distributed deployments.

When KalamDB runs as a cluster using KalamDB Raft, multiple nodes can generate sequence ids while preserving the ordering guarantees required for synchronization and replay.

The implementation details deserve their own post, but for realtime synchronization the important rule is simple:

Higher _seq = newer change

Clients can safely resume from a sequence id regardless of which node originally generated it.

Protocol overview

The WebSocket endpoint is:

GET /v1/ws

A typical session looks like this:

sequenceDiagram
participant Client
participant Server

Client->>Server: authenticate
Server-->>Client: auth_success

Client->>Server: subscribe
Server-->>Client: snapshot batches

Note over Server,Client: snapshot complete

Server-->>Client: live changes

Note over Client,Server: reconnect if needed

Client->>Server: subscribe(from_seq_id)
Server-->>Client: missed changes

The key idea is that the client always knows the latest sequence id it has processed.

That makes recovery straightforward.

Snapshot first, then live changes

The hardest part of any realtime system isn't opening a WebSocket.

It's maintaining consistency.

Consider this timeline:

T1: client subscribes
T2: snapshot starts
T3: row changes
T4: snapshot finishes
T5: live updates begin

Without careful coordination, the client can:

Receive duplicate data
Miss updates entirely
End up with an inconsistent view of the world

KalamDB solves this by recording a snapshot boundary:

snapshot_end_seq = current sequence

The server sends all rows up to that sequence id.

Any changes that occur after the boundary are buffered while the snapshot is being delivered.

Once the snapshot is complete:

snapshot complete
→ replay buffered changes
→ switch to live mode

The client receives a consistent view of the data without gaps or races.

Lock-free delivery

One advantage of this approach is that writes never need to stop while a snapshot is being generated.

New writes continue normally.

Changes that occur after the snapshot boundary are simply buffered and replayed once the snapshot finishes.

That gives us an important guarantee:

No write blocking
No global locks
No missed changes

The snapshot establishes a stable point in time, while sequence ids ensure every change after that point is delivered in order.

Initial loading

Large result sets are delivered in batches:

json snippetJSON

1{2  "type": "initial_data_batch",3  "rows": [4    {5      "id": 1,6      "_seq": 119907    }8  ],9  "batch_control": {10    "status": "loading",11    "has_more": true12  }13}

The client requests additional batches as needed:

json snippetJSON

1{2  "type": "next_batch"3}

This prevents large snapshots from overwhelming the connection.

Because the snapshot boundary has already been established, batches can be delivered independently without risking missed updates.

Live updates

Once the snapshot is complete:

json snippetJSON

1{2  "batch_control": {3    "status": "ready",4    "has_more": false5  }6}

The server switches to live streaming mode.

Example:

json snippetJSON

1{2  "type": "change",3  "rows": [4    {5      "id": 42,6      "_seq": 130017    }8  ]9}

Since every change includes a sequence id, the client always knows exactly where it is in the stream.

Reconnect and resume

Connections drop.

Browsers refresh.

Mobile networks disappear at the worst possible moment.

Instead of starting over, the client reconnects using the last sequence id it processed:

json snippetJSON

1{2  "type": "subscribe",3  "subscription": {4    "id": "tickets",5    "sql": "SELECT id, title, status, _seq FROM app.tickets",6    "options": {7      "from_seq_id": 130018    }9  }10}

The server sends everything newer than that sequence.

No custom replay logic. No full refresh. No guessing what was missed.

Because replay is built directly into the protocol, reconnect behavior remains deterministic even after long disconnects.

Offline-first applications

Sequence ids become even more useful when clients spend long periods offline.

Consider a mobile application:

Online
→ Receive updates
→ Go offline
→ User continues working
→ Reconnect later

Traditional synchronization systems often rely on:

Conflict tracking
Timestamp comparisons
Custom replay logic
Periodic full refreshes

With KalamDB, the client already knows the last sequence id it processed.

When connectivity returns:

Reconnect
→ Send last sequence id
→ Receive everything newer

That's it.

For offline-first applications, _seq effectively becomes a synchronization cursor.

Persist it locally and resume exactly where you left off without downloading the entire dataset again.

This model works particularly well for:

Mobile applications
Field service software
Messaging systems
Collaborative tools
Local-first experiences

Comparing KalamDB to other solutions

Polling

Polling repeatedly executes the same query:

SELECT ...
SELECT ...
SELECT ...
SELECT ...

The downsides are familiar:

Repeated database work
Higher latency
More bandwidth usage

KalamDB executes the query once and then streams changes as they happen.

PostgreSQL LISTEN/NOTIFY

LISTEN/NOTIFY is great for invalidation:

orders changed

But clients still need to:

Re-query data
Detect missed updates
Maintain local state

KalamDB delivers rows directly and provides sequence-based recovery.

CDC and Change Streams

CDC systems tell you:

Row X changed

The application still has to answer:

Do I care about Row X?

In many architectures that means filtering large streams of events before finding the few that matter.

With KalamDB's user-scoped tables, routing is much simpler. Most updates already belong to a specific user partition, so the server can focus on ordering and delivery rather than evaluating large numbers of filters.

Firebase

Firebase Realtime Database and Firestore provide excellent realtime synchronization and are among the easiest ways to build realtime applications.

Firebase clients subscribe to documents, collections, or query results and automatically receive updates when those results change.

Firebase also includes strong offline support, local caching, conflict resolution, and SDKs that handle synchronization automatically.

However, Firebase is fundamentally document-oriented rather than relational:

Data is stored as documents rather than tables and rows
Complex relational queries often require denormalization
Joins are not a native part of the data model
Query flexibility is intentionally constrained for scalability

Firebase's strength is developer productivity around document data.

KalamDB's advantage is that it keeps the relational SQL model intact:

SQL queries instead of document queries
Relational schemas instead of denormalized document trees
Sequence-based replay and recovery
Snapshot-first consistency guarantees
User-scoped routing optimized for application state

If your application naturally fits a document database, Firebase is often an excellent choice.

If your application already thinks in terms of SQL tables, relationships, and queries, KalamDB avoids the impedance mismatch of reshaping relational data into documents.

Supabase Realtime

Supabase Realtime is built on PostgreSQL logical replication.

Technically, it streams database changes from PostgreSQL's WAL (Write-Ahead Log) and delivers them over WebSockets.

For example, a client can subscribe to:

INSERT on orders
UPDATE on orders
DELETE on orders

Supabase also supports server-side filtering on replication streams and integrates with PostgreSQL Row Level Security (RLS), which significantly reduces the amount of filtering clients need to perform themselves.

That's an important distinction: modern Supabase Realtime is not simply broadcasting every table change to every client.

However, Supabase Realtime is still fundamentally change-stream based.

The primary unit of delivery is:

A database row changed

rather than:

This SQL query result changed

Applications often still need to maintain local query state, apply ordering, merge updates into cached results, and handle how row-level changes affect a particular view.

Another important difference is replay.

KalamDB assigns every change a durable monotonic sequence id (_seq) and allows clients to reconnect from a specific point in the stream:

Give me everything after sequence 13001

This makes replay a first-class protocol feature.

Supabase Realtime does not expose an equivalent durable change identifier that clients can use as a replay cursor across reconnects. Applications typically need to recover state through re-fetching data or rebuilding state from the database rather than replaying from a known change position.

KalamDB focuses on a different abstraction:

Subscribe to state
not
Subscribe to changes

Key advantages of KalamDB include:

Snapshot-first subscriptions built into the protocol
Durable sequence ids for replay and recovery
Consistent snapshot-to-live transitions
User-scoped routing with minimal subscription evaluation
SQL result synchronization rather than raw row-change delivery
Efficient replay using sequence comparisons
No need to filter large shared tables by user before replaying changes

Because KalamDB stores data in user-scoped tables, replay is particularly efficient. The server already knows which user's data is being synchronized and can simply stream changes newer than a given sequence id.

Conceptually:

User
└── Table
└── Rows

Replay becomes:

Find user's table
→ Find rows with _seq > last_seen_seq
→ Stream changes

rather than:

Scan shared table
→ Filter by user
→ Apply permissions
→ Determine missed changes
→ Stream results

In other words:

Supabase Realtime excels at exposing PostgreSQL changes to applications.

KalamDB is designed around keeping application state synchronized from the beginning, with replay and recovery built directly into the protocol.

Why user-scoped tables matter

This is one of the biggest architectural differences in KalamDB.

Many realtime systems start with:

One large table
→ filter per user
→ filter per subscription
→ filter per client

KalamDB organizes data around ownership instead.

Conceptually:

User
└── Table
└── Rows

When a row changes, the server already knows who owns it.

Routing becomes:

Find user
→ Compare sequence id
→ Deliver update

instead of:

Scan subscriptions
→ Evaluate filters
→ Check permissions
→ Route update

The same advantage applies to replay.

When a client reconnects, KalamDB doesn't need to scan a large shared table and filter rows by user ownership before determining what was missed. It can directly replay changes from the user's table using sequence ordering.

As the number of users grows, that difference becomes significant.

Developer experience

A subscription can be as simple as:

typescript snippettypescript

1const unsubscribe = await client.live(`2  SELECT id, body, created_at, _seq3  FROM chat.messages`,4  (event) => {5    console.log(event);6  }7);

The SDK handles:

Authentication
WebSocket lifecycle
Snapshot loading
Reconnects
Sequence tracking
Resume logic

Applications simply react to events.

When to use KalamDB

KalamDB works particularly well for:

Chat applications
Dashboards
Notifications
Collaborative tools
Mobile apps
Offline-first applications
User-specific realtime data

It's designed for application state, not high-volume analytics streams.

Design summary

Goal	KalamDB approach
Initial consistency	Snapshot before live updates
Reconnect safety	Resume from sequence id
Low filtering cost	User-scoped tables
Efficient routing	Route by ownership
Efficient replay	Replay from sequence ids
Reduced bandwidth	Send only relevant rows
Simple clients	SDK manages synchronization
Offline sync	Resume from persisted sequence id
Lock-free delivery	Snapshot boundary plus buffered changes
Distributed ordering	Snowflake-based sortable sequence ids
Scalable realtime	Compare sequence ids instead of re-running queries

Conclusion

Most realtime systems focus on broadcasting events.

KalamDB focuses on synchronizing state.

The core idea is simple:

Snapshot
→ Sequence id
→ Live updates
→ Resume from last sequence

The sequence id is more than a counter.

It's a globally sortable Snowflake identifier that enables reconnects, offline synchronization, distributed deployments, efficient replay, and efficient realtime delivery.

Unlike systems that primarily expose change streams, KalamDB treats replay as a first-class capability. Clients always know exactly where they are in the stream and can resume from a durable sequence id without rebuilding state from scratch.

Because data is organized around ownership, the server doesn't spend its time evaluating complex filters for every update. Instead, it can route changes efficiently and rely primarily on sequence ordering. Replay benefits from the same architecture: the server can directly locate a user's data and stream everything newer than the last acknowledged sequence id.

Combined with the snapshot-first design, this creates a lock-free synchronization model where writes continue uninterrupted while clients receive a complete, ordered stream of changes.

The result is a simpler way to think about realtime data:

Subscribe once.
Receive a snapshot.
Stay up to date.
Reconnect from the last sequence id.

That's the foundation of realtime SQL in KalamDB.

The Hard Part of Realtime SQL: Snapshot First, Then Live Changes

The KalamDB approach

Why sequence ids matter

Snowflake-based sequence ids

Protocol overview

Snapshot first, then live changes

Lock-free delivery

Initial loading

Live updates

Reconnect and resume

Offline-first applications

Comparing KalamDB to other solutions

Polling

PostgreSQL LISTEN/NOTIFY

CDC and Change Streams

Firebase

Supabase Realtime

Why user-scoped tables matter

Developer experience

When to use KalamDB

Design summary

Conclusion

Related

Why Object Storage Is Becoming the New Database Layer

KalamDB v0.5.3-rc.1 Pre-Release

While Building KalamDB, I Started Questioning Why We Still Need So Much Backend Code