PostgreSQL Extension Data Type Conversions
This page documents the current conversion behavior in pg_kalam.
There are three distinct layers to keep straight:
- PostgreSQL-side DDL authoring and logical schema mapping.
- PostgreSQL datum conversion on writes.
- Arrow batch materialization on reads.
The important constraint is that the canonical KalamDB logical type system is broader than the extension’s dedicated PostgreSQL runtime branches. The logical mapping exists for more types than the extension can currently materialize natively on the PostgreSQL side.
If this page and the code diverge, the code wins.
1. PostgreSQL Authoring to KalamDB Logical Types
These are the common PostgreSQL-facing type names that map into KalamDB’s logical schema model today.
| PostgreSQL-side authoring type | KalamDB logical type | Arrow or storage representation | Notes |
|---|---|---|---|
BOOLEAN, BOOL | Boolean | Boolean | Dedicated native read and write branches exist |
SMALLINT, INT2 | SmallInt | Int16 | Dedicated native read and write branches exist |
INTEGER, INT, INT4, MEDIUMINT | Int | Int32 | Dedicated native read and write branches exist |
BIGINT, INT8 | BigInt | Int64 | Dedicated native read and write branches exist |
REAL, FLOAT, FLOAT4 | Float | Float32 | Dedicated native read and write branches exist |
DOUBLE PRECISION, DOUBLE, FLOAT8, FLOAT64 | Double | Float64 | Dedicated native read and write branches exist |
TEXT, VARCHAR, CHAR, STRING, NVARCHAR | Text | Utf8 | TEXT and VARCHAR have dedicated write fast paths; CHAR currently uses the generic text-output fallback on writes |
BYTEA, BYTES, BINARY, VARBINARY, BLOB | Bytes | Binary | Read path has a dedicated BYTEA materializer; write path currently falls back to PostgreSQL text output |
JSON, JSONB | Json | Utf8 containing JSON text | JSON and JSONB are first-class in the extension on both read and write |
DATE | Date | Date32 | Read path has a dedicated DATE materializer; write path currently falls back to PostgreSQL text output |
TIME | Time | Time64(Microsecond) | Logical mapping exists, but read and write still rely on generic conversion fallback |
TIMESTAMP | Timestamp | Timestamp(Microsecond, None) | Read path has a dedicated timestamp materializer; write path currently falls back to PostgreSQL text output |
NUMERIC(p,s), DECIMAL(p,s) | Decimal { precision, scale } | Decimal128(p, s) | Logical mapping exists; PostgreSQL numeric materialization is not yet a dedicated branch |
UUID | Uuid | FixedSizeBinary(16) | Logical mapping exists; PostgreSQL UUID materialization is not yet a dedicated branch |
FILE | File | Utf8 containing serialized FileRef JSON | The remote KalamDB column is FILE; the local PostgreSQL column is rewritten to JSONB on the USING kalamdb path |
Two special cases need separate callouts:
KalamDataType::DateTimeis the timezone-aware logical type and stores asTimestamp(Microsecond, Some("UTC")). Its reverse PostgreSQL type name isTIMESTAMPTZ, but PostgreSQL-side timezone-aware authoring spellings should be validated against your exact build.KalamDataType::Embedding(n)is the logical embedding type. Its reverse PostgreSQL type name isVECTOR(n), but the KalamDB logical DDL spelling is stillEMBEDDING(n), so PostgreSQL-side authoring should be validated carefully instead of assuming full shorthand parity.
2. KalamDB Logical Types Back to PostgreSQL
When the extension needs to generate or mirror PostgreSQL column type names from a KalamDB logical schema, it uses the following canonical mapping.
| KalamDB logical type | Generated or mirrored PostgreSQL type name |
|---|---|
Boolean | BOOLEAN |
SmallInt | SMALLINT |
Int | INTEGER |
BigInt | BIGINT |
Float | REAL |
Double | DOUBLE PRECISION |
Text | TEXT |
Bytes | BYTEA |
Date | DATE |
Time | TIME |
Timestamp | TIMESTAMP |
DateTime | TIMESTAMPTZ |
Uuid | UUID |
Json | JSONB |
File | JSONB |
Decimal { precision, scale } | NUMERIC(p, s) |
Embedding(n) | VECTOR(n) |
Current local-surface rules modify that canonical reverse mapping in a few places:
- On
CREATE TABLE ... USING kalamdb, most local PostgreSQL columns keep the type you declared. JSONstaysJSONlocally if you declaredJSON.JSONBstaysJSONBlocally if you declaredJSONB.FILEis the special case: it is always rewritten locally toJSONB.
If you mirror a remote FILE column manually with CREATE FOREIGN TABLE, declare the local PostgreSQL column as JSONB.
3. DDL Normalizations Before Remote Create
The DDL hook performs a small set of PostgreSQL-specific normalizations before sending the user-column definition to KalamDB.
| PostgreSQL authoring form | Remote KalamDB DDL result |
|---|---|
SERIAL | Integer column plus DEFAULT SNOWFLAKE_ID() |
BIGSERIAL | Bigint column plus DEFAULT SNOWFLAKE_ID() |
SMALLSERIAL | Smallint column plus DEFAULT SNOWFLAKE_ID() |
GENERATED ... AS IDENTITY | Explicit integer type plus DEFAULT SNOWFLAKE_ID() |
This normalization happens before the remote CREATE TABLE is sent.
4. Runtime Read and Write Behavior Today
The table above describes the logical schema contract. The next table describes the actual extension implementation status at runtime.
| Type family | PostgreSQL -> KalamDB write path | KalamDB -> PostgreSQL read path | Current status |
|---|---|---|---|
BOOLEAN, SMALLINT, INTEGER, BIGINT, REAL, DOUBLE PRECISION | Dedicated native Datum -> ScalarValue branches | Dedicated native Arrow array branches | Best-covered native round-trip surface |
TEXT, VARCHAR | Dedicated text fast path | Dedicated Utf8 and LargeUtf8 branches | Best-covered string surface |
CHAR, CHARACTER | Generic PostgreSQL type-output fallback to ScalarValue::Utf8 | Reads through the normal Utf8 text path once the backend returns text | Logical mapping exists, but writes are not a dedicated branch |
JSON, JSONB | Dedicated JSON and JSONB normalization to JSON text | Dedicated PostgreSQL type-input path for JSON and JSONB | Best-covered document surface |
FILE | PostgreSQL sees the FileRef JSON representation, usually as JSONB; binary upload itself typically happens through KalamLink or another file-aware client | Dedicated JSONB materialization from UTF-8 FileRef JSON | Stable local representation, special-case local JSONB surface |
BYTEA | Generic PostgreSQL type-output fallback to ScalarValue::Utf8 | Dedicated Arrow Binary -> bytea branch | Read path is native; write path relies on generic coercion |
DATE | Generic PostgreSQL type-output fallback to ScalarValue::Utf8 | Dedicated Arrow Date32 -> DATE branch | Read path is native; write path relies on generic coercion |
TIMESTAMP and generated TIMESTAMPTZ surface | Generic PostgreSQL type-output fallback to ScalarValue::Utf8 | Dedicated Arrow timestamp branches reuse PostgreSQL’s internal timestamp representation | Read path is native; write path relies on generic coercion |
TIME | Generic PostgreSQL type-output fallback to ScalarValue::Utf8 | Generic fallback string rendering | Not yet a dedicated PostgreSQL round-trip branch |
UUID | Generic PostgreSQL type-output fallback to ScalarValue::Utf8 | No dedicated UUID materialization branch in arrow_to_pg | Logical mapping exists; validate against your build before treating UUID as a stable PG round-trip surface |
NUMERIC(p,s) | Generic PostgreSQL type-output fallback to ScalarValue::Utf8 | No dedicated numeric materialization branch in arrow_to_pg | Logical mapping exists; native PostgreSQL numeric round-trip is not implemented yet |
VECTOR(n) or Embedding(n) | PostgreSQL-side authoring depends on local type availability and current DDL spelling | No dedicated vector materialization branch in arrow_to_pg | Reverse mapping exists, but PostgreSQL vector round-trip is not yet a native branch |
5. Under the Hood
Write path
For INSERT, UPDATE, and DELETE, the extension does the following:
- PostgreSQL exposes a
TupleTableSlotto the FDW modify callbacks. slot_to_row()walks the local columns, skipping_seqand_deleted._useridis captured separately and only used as an explicit tenant override forINSERT.pg_to_kalam::datum_to_scalar()converts each remaining datum into a DataFusionScalarValue.- Supported primitive OIDs use dedicated branches.
- Everything else currently falls back to PostgreSQL’s type output function and becomes
ScalarValue::Utf8. - The client serializes mutations as JSON row payloads in the gRPC request.
That is why the write path is broader than the dedicated fast-path branches but still depends on server-side coercion for several PostgreSQL type families.
Read path
For SELECT, the extension does the following:
- Builds a
ScanRequestwith relation identity, table type, tenant context, and simple equality filters. - Sends the request over gRPC.
- Receives Arrow IPC batches from KalamDB.
arrow_to_pg::arrow_value_to_datum()materializes supported Arrow types directly into PostgreSQL datums.- JSON and JSONB use PostgreSQL type-input functions so PostgreSQL sees real document datums, not plain text.
- Unsupported Arrow types fall back to
ScalarValue::to_string().
That final fallback is why logical schema support is currently broader than the stable PostgreSQL materialization surface.
6. What Is Safest Today
If you need the most predictable PostgreSQL behavior today, prefer these surfaces first:
BOOLEAN,SMALLINT,INTEGER,BIGINT,REAL,DOUBLE PRECISIONTEXTandVARCHARJSON,JSONB, andFILEas localJSONBDATE,TIMESTAMP, and the timestamp-like generatedTIMESTAMPTZread path
Use extra care and validate against your target build for:
TIMEUUIDNUMERIC(p,s)VECTOR(n)or other embedding-heavy PostgreSQL surfaces
The strongest current automated coverage in the extension repo is around JSON, JSONB, and FILE behavior. The rest of the mapping surface is primarily code-backed rather than broad end-to-end type-matrix coverage.