Skip to Content
Architecture

PostgreSQL Extension Architecture

The current PostgreSQL extension is remote-only. PostgreSQL hosts the extension entrypoints, the FDW callbacks, and the local relation surface, while a running KalamDB server performs the actual execution over gRPC.

Main Components

  1. A PostgreSQL backend process: owns SQL parsing, planning, and the visible PostgreSQL-side relation surface.
  2. The pg_kalam extension: registers _PG_init, the FDW callbacks, the DDL ProcessUtility hook, and the helper SQL functions.
  3. A foreign server definition: stores remote connection settings such as host, port, timeout, TLS material, and optional auth metadata.
  4. A per-backend remote state cache: holds a reusable tonic channel, Tokio runtime, and a logical remote session id.
  5. KalamDB’s shared RPC listener: exposes the PgService surface for scans, mutations, transactions, and direct SQL execution.

Transport and Session Model

pg_kalam talks to KalamDB through the shared RPC listener, not through the HTTP SQL API.

Current server options:

  • host
  • port
  • timeout
  • auth_header
  • ca_cert
  • client_cert
  • client_key

On first use in a PostgreSQL backend and for a specific remote-server config, the extension:

  1. parses the foreign-server options,
  2. creates a current-thread Tokio runtime,
  3. opens a tonic channel,
  4. computes a logical session id in the form pg-<backend-pid>-<config-hash>,
  5. sends OpenSession, and
  6. caches that state for reuse across later statements in the same backend process.

on_proc_exit then attempts to send CloseSession for every cached remote state owned by that PostgreSQL backend.

Two important consequences follow from the current code:

  • The transport is reused per PostgreSQL backend and per remote config instead of reconnecting for every statement.
  • PostgreSQL current schema or search_path is not currently pushed into the remote session automatically, so direct kalam_exec(...) statements should be schema-qualified.

DDL Path

The extension registers a ProcessUtility hook from _PG_init(). That hook is what powers the current DDL surface.

The hook intercepts:

  • CREATE TABLE ... USING kalamdb
  • CREATE FOREIGN TABLE
  • ALTER FOREIGN TABLE
  • DROP FOREIGN TABLE
  • top-level explicit transaction statements so the extension can coordinate remote transaction lifecycle

There are two DDL entry points:

  1. CREATE TABLE ... USING kalamdb This is the shorthand authoring path. It is currently bound to the default foreign server name kalam_server.

  2. CREATE FOREIGN TABLE ... SERVER <name> This is the explicit path for non-default server names and manual mirroring.

In both paths, the current relation identity is what matters:

  • the PostgreSQL schema becomes the remote KalamDB namespace,
  • the PostgreSQL relation name becomes the remote KalamDB table name.

The runtime scan and modify paths do not treat OPTIONS (namespace, table) as the canonical mapping source.

What the DDL hook does

For CREATE TABLE ... USING kalamdb, the extension:

  1. validates that _seq, _userid, and _deleted were not declared explicitly,
  2. strips the PostgreSQL USING kalamdb access-method marker before forwarding SQL to KalamDB,
  3. normalizes SERIAL, BIGSERIAL, SMALLSERIAL, and identity columns into explicit integer columns with DEFAULT SNOWFLAKE_ID(),
  4. forwards the user-column DDL and WITH (...) table options to KalamDB,
  5. creates a local foreign table, and
  6. injects _seq BIGINT on all tables plus _userid TEXT on user tables.

The local object becomes a foreign table after interception, which is why follow-up DDL should use ALTER FOREIGN TABLE and DROP FOREIGN TABLE.

Read Path

For SELECT statements, the FDW scan flow is:

  1. PostgreSQL plans the extension-managed relation scan.
  2. pg_kalam resolves the relation identity from the local relation metadata and parses the foreign-server options.
  3. The extension reads the current kalam.user_id value, if present.
  4. Any pending buffered writes for that table are flushed first so reads see local writes.
  5. The FDW builds a ScanRequest with physical-column projection and simple equality filters that are safe to push down.
  6. A remote scan request is sent to KalamDB over PgService.
  7. The returned Arrow IPC batches are converted into PostgreSQL tuples.

Current pushdown behavior is intentionally partial:

  • projection pushdown is used for physical columns,
  • simple column = constant or column = $param filters can be pushed,
  • PostgreSQL still reapplies all quals locally,
  • ORDER BY and LIMIT are not pushed down today.

_userid is synthesized locally from the current session user id. _seq comes back from the remote Arrow schema.

Write Path

For INSERT, UPDATE, and DELETE statements:

  1. PostgreSQL routes the operation through FDW modify callbacks.
  2. pg_kalam resolves the target table and tenant context.
  3. The extension converts PostgreSQL datums into DataFusion ScalarValue payloads.
  4. Single-row inserts are buffered for better batching behavior inside explicit transactions.
  5. Multi-row inserts go through the batch insert path directly.
  6. Updates and deletes flush pending inserts first, then send remote mutation requests.

Single-row inserts are buffered so PostgreSQL transactions can batch work more efficiently. Pending writes are flushed before scans and before transaction completion so read-your-writes behavior stays predictable inside the PostgreSQL session.

Current user-id behavior:

  • _userid is only treated as an explicit tenant override for INSERT.
  • UPDATE and DELETE use the session kalam.user_id scope.
  • In a multi-row insert, all rows must use the same _userid value.

Current row-identity behavior:

  • UPDATE and DELETE use the first non-system column as the row-identity heuristic.
  • In practice, keep the primary key as the first user column in the table definition.

Explicit Transaction Flow

When a PostgreSQL session opens an explicit transaction with BEGIN or START TRANSACTION, the extension lazily opens a remote KalamDB transaction on the first FDW operation in that block.

The extension uses PostgreSQL xact callbacks to coordinate remote BeginTransaction, CommitTransaction, and RollbackTransaction calls.

That transaction state is observable through two server views:

  • system.sessions for the live pg bridge session and its current remote transaction metadata.
  • system.transactions for the active explicit transaction itself alongside other active transaction origins.

This makes PostgreSQL transaction behavior visible without materializing extra persisted bookkeeping on the server.

Current transaction limits come from the backend coordinator:

  • DDL is rejected inside explicit transactions.
  • Stream-table writes are rejected inside explicit transactions.
  • The documented surface is top-level BEGIN, COMMIT, and ROLLBACK blocks.

Direct Statement Passthrough

The extension also exposes a direct SQL bridge:

SELECT kalam_exec('SELECT * FROM system.tables');

Use kalam_exec(...) when you want to send a statement directly to KalamDB without wrapping it in PostgreSQL relation syntax first.

Current behavior:

  • it uses the default foreign server kalam_server,
  • it reuses the same per-backend remote session cache as the FDW path,
  • it returns a JSON array string for query results,
  • it returns a JSON string status message for DDL and DML,
  • it is revoked from PUBLIC.

Tenant Context

kalam.user_id is a PostgreSQL session setting registered by the extension.

SET kalam.user_id = 'user-alice'; SELECT kalam_user_id();

The session value is attached to remote requests and is the main way user-scoped reads, updates, and deletes are isolated.

kalam.user_id is registered as a PostgreSQL SUSET GUC, so your PostgreSQL role needs permission to set SUSET parameters.

Last updated on