PostgreSQL Extension Data Type Conversions

This page documents the current conversion behavior in pg_kalam.

There are three distinct layers to keep straight:

PostgreSQL-side DDL authoring and logical schema mapping.
PostgreSQL datum conversion on writes.
Arrow batch materialization on reads.

The important constraint is that the canonical KalamDB logical type system is broader than the extension’s dedicated PostgreSQL runtime branches. The logical mapping exists for more types than the extension can currently materialize natively on the PostgreSQL side.

If this page and the code diverge, the code wins.

1. PostgreSQL Authoring to KalamDB Logical Types

These are the common PostgreSQL-facing type names that map into KalamDB’s logical schema model today.

PostgreSQL-side authoring type	KalamDB logical type	Arrow or storage representation	Notes
`BOOLEAN`, `BOOL`	`Boolean`	`Boolean`	Dedicated native read and write branches exist
`SMALLINT`, `INT2`	`SmallInt`	`Int16`	Dedicated native read and write branches exist
`INTEGER`, `INT`, `INT4`, `MEDIUMINT`	`Int`	`Int32`	Dedicated native read and write branches exist
`BIGINT`, `INT8`	`BigInt`	`Int64`	Dedicated native read and write branches exist
`REAL`, `FLOAT`, `FLOAT4`	`Float`	`Float32`	Dedicated native read and write branches exist
`DOUBLE PRECISION`, `DOUBLE`, `FLOAT8`, `FLOAT64`	`Double`	`Float64`	Dedicated native read and write branches exist
`TEXT`, `VARCHAR`, `CHAR`, `STRING`, `NVARCHAR`	`Text`	`Utf8`	`TEXT` and `VARCHAR` have dedicated write fast paths; `CHAR` currently uses the generic text-output fallback on writes
`BYTEA`, `BYTES`, `BINARY`, `VARBINARY`, `BLOB`	`Bytes`	`Binary`	Read path has a dedicated `BYTEA` materializer; write path currently falls back to PostgreSQL text output
`JSON`, `JSONB`	`Json`	`Utf8` containing JSON text	JSON and JSONB are first-class in the extension on both read and write
`DATE`	`Date`	`Date32`	Read path has a dedicated `DATE` materializer; write path currently falls back to PostgreSQL text output
`TIME`	`Time`	`Time64(Microsecond)`	Logical mapping exists, but read and write still rely on generic conversion fallback
`TIMESTAMP`	`Timestamp`	`Timestamp(Microsecond, None)`	Read path has a dedicated timestamp materializer; write path currently falls back to PostgreSQL text output
`NUMERIC(p,s)`, `DECIMAL(p,s)`	`Decimal { precision, scale }`	`Decimal128(p, s)`	Logical mapping exists; PostgreSQL numeric materialization is not yet a dedicated branch
`UUID`	`Uuid`	`FixedSizeBinary(16)`	Logical mapping exists; PostgreSQL UUID materialization is not yet a dedicated branch
`FILE`	`File`	`Utf8` containing serialized `FileRef` JSON	The remote KalamDB column is `FILE`; the local PostgreSQL column is rewritten to `JSONB` on the `USING kalamdb` path

Two special cases need separate callouts:

KalamDataType::DateTime is the timezone-aware logical type and stores as Timestamp(Microsecond, Some("UTC")). Its reverse PostgreSQL type name is TIMESTAMPTZ, but PostgreSQL-side timezone-aware authoring spellings should be validated against your exact build.
KalamDataType::Embedding(n) is the logical embedding type. Its reverse PostgreSQL type name is VECTOR(n), but the KalamDB logical DDL spelling is still EMBEDDING(n), so PostgreSQL-side authoring should be validated carefully instead of assuming full shorthand parity.

2. KalamDB Logical Types Back to PostgreSQL

When the extension needs to generate or mirror PostgreSQL column type names from a KalamDB logical schema, it uses the following canonical mapping.

KalamDB logical type	Generated or mirrored PostgreSQL type name
`Boolean`	`BOOLEAN`
`SmallInt`	`SMALLINT`
`Int`	`INTEGER`
`BigInt`	`BIGINT`
`Float`	`REAL`
`Double`	`DOUBLE PRECISION`
`Text`	`TEXT`
`Bytes`	`BYTEA`
`Date`	`DATE`
`Time`	`TIME`
`Timestamp`	`TIMESTAMP`
`DateTime`	`TIMESTAMPTZ`
`Uuid`	`UUID`
`Json`	`JSONB`
`File`	`JSONB`
`Decimal { precision, scale }`	`NUMERIC(p, s)`
`Embedding(n)`	`VECTOR(n)`

Current local-surface rules modify that canonical reverse mapping in a few places:

On CREATE TABLE ... USING kalamdb, most local PostgreSQL columns keep the type you declared.
JSON stays JSON locally if you declared JSON.
JSONB stays JSONB locally if you declared JSONB.
FILE is the special case: it is always rewritten locally to JSONB.

If you mirror a remote FILE column manually with CREATE FOREIGN TABLE, declare the local PostgreSQL column as JSONB.

3. DDL Normalizations Before Remote Create

The DDL hook performs a small set of PostgreSQL-specific normalizations before sending the user-column definition to KalamDB.

PostgreSQL authoring form	Remote KalamDB DDL result
`SERIAL`	Integer column plus `DEFAULT SNOWFLAKE_ID()`
`BIGSERIAL`	Bigint column plus `DEFAULT SNOWFLAKE_ID()`
`SMALLSERIAL`	Smallint column plus `DEFAULT SNOWFLAKE_ID()`
`GENERATED ... AS IDENTITY`	Explicit integer type plus `DEFAULT SNOWFLAKE_ID()`

This normalization happens before the remote CREATE TABLE is sent.

4. Runtime Read and Write Behavior Today

The table above describes the logical schema contract. The next table describes the actual extension implementation status at runtime.

Type family	PostgreSQL -> KalamDB write path	KalamDB -> PostgreSQL read path	Current status
`BOOLEAN`, `SMALLINT`, `INTEGER`, `BIGINT`, `REAL`, `DOUBLE PRECISION`	Dedicated native `Datum -> ScalarValue` branches	Dedicated native Arrow array branches	Best-covered native round-trip surface
`TEXT`, `VARCHAR`	Dedicated text fast path	Dedicated `Utf8` and `LargeUtf8` branches	Best-covered string surface
`CHAR`, `CHARACTER`	Generic PostgreSQL type-output fallback to `ScalarValue::Utf8`	Reads through the normal `Utf8` text path once the backend returns text	Logical mapping exists, but writes are not a dedicated branch
`JSON`, `JSONB`	Dedicated JSON and JSONB normalization to JSON text	Dedicated PostgreSQL type-input path for JSON and JSONB	Best-covered document surface
`FILE`	PostgreSQL sees the `FileRef` JSON representation, usually as `JSONB`; binary upload itself typically happens through KalamLink or another file-aware client	Dedicated JSONB materialization from UTF-8 `FileRef` JSON	Stable local representation, special-case local `JSONB` surface
`BYTEA`	Generic PostgreSQL type-output fallback to `ScalarValue::Utf8`	Dedicated Arrow `Binary -> bytea` branch	Read path is native; write path relies on generic coercion
`DATE`	Generic PostgreSQL type-output fallback to `ScalarValue::Utf8`	Dedicated Arrow `Date32 -> DATE` branch	Read path is native; write path relies on generic coercion
`TIMESTAMP` and generated `TIMESTAMPTZ` surface	Generic PostgreSQL type-output fallback to `ScalarValue::Utf8`	Dedicated Arrow timestamp branches reuse PostgreSQL’s internal timestamp representation	Read path is native; write path relies on generic coercion
`TIME`	Generic PostgreSQL type-output fallback to `ScalarValue::Utf8`	Generic fallback string rendering	Not yet a dedicated PostgreSQL round-trip branch
`UUID`	Generic PostgreSQL type-output fallback to `ScalarValue::Utf8`	No dedicated UUID materialization branch in `arrow_to_pg`	Logical mapping exists; validate against your build before treating UUID as a stable PG round-trip surface
`NUMERIC(p,s)`	Generic PostgreSQL type-output fallback to `ScalarValue::Utf8`	No dedicated numeric materialization branch in `arrow_to_pg`	Logical mapping exists; native PostgreSQL numeric round-trip is not implemented yet
`VECTOR(n)` or `Embedding(n)`	PostgreSQL-side authoring depends on local type availability and current DDL spelling	No dedicated vector materialization branch in `arrow_to_pg`	Reverse mapping exists, but PostgreSQL vector round-trip is not yet a native branch

5. Under the Hood

Write path

For INSERT, UPDATE, and DELETE, the extension does the following:

PostgreSQL exposes a TupleTableSlot to the FDW modify callbacks.
slot_to_row() walks the local columns, skipping _seq and _deleted.
_userid is captured separately and only used as an explicit tenant override for INSERT.
pg_to_kalam::datum_to_scalar() converts each remaining datum into a DataFusion ScalarValue.
Supported primitive OIDs use dedicated branches.
Everything else currently falls back to PostgreSQL’s type output function and becomes ScalarValue::Utf8.
The client serializes mutations as JSON row payloads in the gRPC request.

That is why the write path is broader than the dedicated fast-path branches but still depends on server-side coercion for several PostgreSQL type families.

Read path

For SELECT, the extension does the following:

Builds a ScanRequest with relation identity, table type, tenant context, and simple equality filters.
Sends the request over gRPC.
Receives Arrow IPC batches from KalamDB.
arrow_to_pg::arrow_value_to_datum() materializes supported Arrow types directly into PostgreSQL datums.
JSON and JSONB use PostgreSQL type-input functions so PostgreSQL sees real document datums, not plain text.
Unsupported Arrow types fall back to ScalarValue::to_string().

That final fallback is why logical schema support is currently broader than the stable PostgreSQL materialization surface.

6. What Is Safest Today

If you need the most predictable PostgreSQL behavior today, prefer these surfaces first:

BOOLEAN, SMALLINT, INTEGER, BIGINT, REAL, DOUBLE PRECISION
TEXT and VARCHAR
JSON, JSONB, and FILE as local JSONB
DATE, TIMESTAMP, and the timestamp-like generated TIMESTAMPTZ read path

Use extra care and validate against your target build for:

TIME
UUID
NUMERIC(p,s)
VECTOR(n) or other embedding-heavy PostgreSQL surfaces

The strongest current automated coverage in the extension repo is around JSON, JSONB, and FILE behavior. The rest of the mapping surface is primarily code-backed rather than broad end-to-end type-matrix coverage.

PostgreSQL Extension Data Type Conversions

1. PostgreSQL Authoring to KalamDB Logical Types

2. KalamDB Logical Types Back to PostgreSQL

3. DDL Normalizations Before Remote Create

4. Runtime Read and Write Behavior Today

5. Under the Hood

Write path

Read path

6. What Is Safest Today

Getting Started

Documentation

Resources

Community

PostgreSQL Extension Data Type Conversions

1. PostgreSQL Authoring to KalamDB Logical Types

2. KalamDB Logical Types Back to PostgreSQL

3. DDL Normalizations Before Remote Create

4. Runtime Read and Write Behavior Today

5. Under the Hood

Write path

Read path

6. What Is Safest Today

Related

Getting Started

Documentation

Resources

Community