PostgreSQL Extension Architecture
The current PostgreSQL extension is remote-only. PostgreSQL hosts the extension entrypoints, the FDW callbacks, and the local relation surface, while a running KalamDB server performs the actual execution over gRPC.
Main Components
- A PostgreSQL backend process: owns SQL parsing, planning, and the visible PostgreSQL-side relation surface.
- The
pg_kalamextension: registers_PG_init, the FDW callbacks, the DDLProcessUtilityhook, and the helper SQL functions. - A foreign server definition: stores remote connection settings such as
host,port, timeout, TLS material, and optional auth metadata. - A per-backend remote state cache: holds a reusable tonic channel, Tokio runtime, and a logical remote session id.
- KalamDB’s shared RPC listener: exposes the
PgServicesurface for scans, mutations, transactions, and direct SQL execution.
Transport and Session Model
pg_kalam talks to KalamDB through the shared RPC listener, not through the HTTP SQL API.
Current server options:
hostporttimeoutauth_headerca_certclient_certclient_key
On first use in a PostgreSQL backend and for a specific remote-server config, the extension:
- parses the foreign-server options,
- creates a current-thread Tokio runtime,
- opens a tonic channel,
- computes a logical session id in the form
pg-<backend-pid>-<config-hash>, - sends
OpenSession, and - caches that state for reuse across later statements in the same backend process.
on_proc_exit then attempts to send CloseSession for every cached remote state owned by that PostgreSQL backend.
Two important consequences follow from the current code:
- The transport is reused per PostgreSQL backend and per remote config instead of reconnecting for every statement.
- PostgreSQL current schema or
search_pathis not currently pushed into the remote session automatically, so directkalam_exec(...)statements should be schema-qualified.
DDL Path
The extension registers a ProcessUtility hook from _PG_init(). That hook is what powers the current DDL surface.
The hook intercepts:
CREATE TABLE ... USING kalamdbCREATE FOREIGN TABLEALTER FOREIGN TABLEDROP FOREIGN TABLE- top-level explicit transaction statements so the extension can coordinate remote transaction lifecycle
There are two DDL entry points:
-
CREATE TABLE ... USING kalamdbThis is the shorthand authoring path. It is currently bound to the default foreign server namekalam_server. -
CREATE FOREIGN TABLE ... SERVER <name>This is the explicit path for non-default server names and manual mirroring.
In both paths, the current relation identity is what matters:
- the PostgreSQL schema becomes the remote KalamDB namespace,
- the PostgreSQL relation name becomes the remote KalamDB table name.
The runtime scan and modify paths do not treat OPTIONS (namespace, table) as the canonical mapping source.
What the DDL hook does
For CREATE TABLE ... USING kalamdb, the extension:
- validates that
_seq,_userid, and_deletedwere not declared explicitly, - strips the PostgreSQL
USING kalamdbaccess-method marker before forwarding SQL to KalamDB, - normalizes
SERIAL,BIGSERIAL,SMALLSERIAL, and identity columns into explicit integer columns withDEFAULT SNOWFLAKE_ID(), - forwards the user-column DDL and
WITH (...)table options to KalamDB, - creates a local foreign table, and
- injects
_seq BIGINTon all tables plus_userid TEXTon user tables.
The local object becomes a foreign table after interception, which is why follow-up DDL should use ALTER FOREIGN TABLE and DROP FOREIGN TABLE.
Read Path
For SELECT statements, the FDW scan flow is:
- PostgreSQL plans the extension-managed relation scan.
pg_kalamresolves the relation identity from the local relation metadata and parses the foreign-server options.- The extension reads the current
kalam.user_idvalue, if present. - Any pending buffered writes for that table are flushed first so reads see local writes.
- The FDW builds a
ScanRequestwith physical-column projection and simple equality filters that are safe to push down. - A remote scan request is sent to KalamDB over
PgService. - The returned Arrow IPC batches are converted into PostgreSQL tuples.
Current pushdown behavior is intentionally partial:
- projection pushdown is used for physical columns,
- simple
column = constantorcolumn = $paramfilters can be pushed, - PostgreSQL still reapplies all quals locally,
ORDER BYandLIMITare not pushed down today.
_userid is synthesized locally from the current session user id. _seq comes back from the remote Arrow schema.
Write Path
For INSERT, UPDATE, and DELETE statements:
- PostgreSQL routes the operation through FDW modify callbacks.
pg_kalamresolves the target table and tenant context.- The extension converts PostgreSQL datums into DataFusion
ScalarValuepayloads. - Single-row inserts are buffered for better batching behavior inside explicit transactions.
- Multi-row inserts go through the batch insert path directly.
- Updates and deletes flush pending inserts first, then send remote mutation requests.
Single-row inserts are buffered so PostgreSQL transactions can batch work more efficiently. Pending writes are flushed before scans and before transaction completion so read-your-writes behavior stays predictable inside the PostgreSQL session.
Current user-id behavior:
_useridis only treated as an explicit tenant override forINSERT.UPDATEandDELETEuse the sessionkalam.user_idscope.- In a multi-row insert, all rows must use the same
_useridvalue.
Current row-identity behavior:
UPDATEandDELETEuse the first non-system column as the row-identity heuristic.- In practice, keep the primary key as the first user column in the table definition.
Explicit Transaction Flow
When a PostgreSQL session opens an explicit transaction with BEGIN or START TRANSACTION, the extension lazily opens a remote KalamDB transaction on the first FDW operation in that block.
The extension uses PostgreSQL xact callbacks to coordinate remote BeginTransaction, CommitTransaction, and RollbackTransaction calls.
That transaction state is observable through two server views:
system.sessionsfor the live pg bridge session and its current remote transaction metadata.system.transactionsfor the active explicit transaction itself alongside other active transaction origins.
This makes PostgreSQL transaction behavior visible without materializing extra persisted bookkeeping on the server.
Current transaction limits come from the backend coordinator:
- DDL is rejected inside explicit transactions.
- Stream-table writes are rejected inside explicit transactions.
- The documented surface is top-level
BEGIN,COMMIT, andROLLBACKblocks.
Direct Statement Passthrough
The extension also exposes a direct SQL bridge:
SELECT kalam_exec('SELECT * FROM system.tables');Use kalam_exec(...) when you want to send a statement directly to KalamDB without wrapping it in PostgreSQL relation syntax first.
Current behavior:
- it uses the default foreign server
kalam_server, - it reuses the same per-backend remote session cache as the FDW path,
- it returns a JSON array string for query results,
- it returns a JSON string status message for DDL and DML,
- it is revoked from
PUBLIC.
Tenant Context
kalam.user_id is a PostgreSQL session setting registered by the extension.
SET kalam.user_id = 'user-alice';
SELECT kalam_user_id();The session value is attached to remote requests and is the main way user-scoped reads, updates, and deletes are isolated.
kalam.user_id is registered as a PostgreSQL SUSET GUC, so your PostgreSQL role needs permission to set SUSET parameters.