Running KalamDB in Production
This guide walks you through every setting, secret, and network boundary you need to lock down before exposing a KalamDB server to the public internet or an untrusted network.
Work through the checklist top-to-bottom. Each section is independently actionable — you can turn a single item on and restart the server between steps.
Threat model assumption: attackers can reach your HTTP and cluster RPC ports. Your defenses must not rely on network isolation alone.
Quick Checklist
Copy this into your deploy runbook. Every box must be ticked before you open the port.
Network & TLS
- server.host = "127.0.0.1" (or explicitly a private interface); never 0.0.0.0 unless you need remote access
- Terminate TLS at an edge proxy (nginx, Caddy, ALB, Cloudflare); KalamDB itself serves plain HTTP
- HTTP server sits behind a firewall/security-group that only allows the edge proxy
- Cluster RPC port (cluster.rpc_addr) is not reachable from the public internet
- For multi-node clusters: rpc_tls.enabled = true with require_client_cert = true
Secrets
- auth.jwt_secret is at least 32 bytes, random, unique per environment
- JWT secret is injected via env var or secret manager, never committed to source
- Root/admin password is changed from any default and stored in a secret manager
- KALAMDB_ROOT_PASSWORD env var is cleared from the shell history after seeding
- OAuth/OIDC client secrets are in env vars, not the config file
Auth & RBAC
- auth.allow_remote_setup = false after first-run bootstrap
- Every human user has a named account, not the built-in root
- Service accounts use the service role, never dba/system
- Account lockout is enabled (auth.max_failed_attempts, auth.lockout_duration_minutes)
- Access token expiry ≤ 1 hour; refresh token expiry ≤ 7 days
Origins & Cookies
- security.cors.allowed_origins is an explicit allowlist, with no "*"
- security.cors.allow_credentials = true only when origins are an explicit allowlist
- security.strict_ws_origin_check = true
- Auth cookies served over HTTPS: auth.cookie_secure = true, SameSite=Strict, HttpOnly
Abuse Controls
- rate_limit.enable_connection_protection = true
- rate_limit.max_auth_requests_per_ip_per_sec tuned (start at 10–20)
- rate_limit.max_connections_per_ip tuned (start at 100)
- security.max_request_body_size tuned to your real upload size
- security.max_ws_message_size tuned to your real message size
Observability
- Audit logs shipped to a separate write-only sink (e.g., SIEM, S3 with object lock)
- Metrics endpoint exposed only to the internal monitoring network
- Alerts configured on: failed logins/min, 401/403 rate, new user creation, role changes
1. Bind Addresses & TLS
KalamDB does not terminate TLS itself. Run it behind a reverse proxy that provides HTTPS and forwards to KalamDB on localhost.
Recommended topology
[ client ] ──HTTPS──► [ nginx/Caddy/ALB ] ──HTTP(loopback)──► [ kalamdb :8080 ]
                                                              [ kalamdb :9090 cluster mTLS ]
Config
[server]
host = "127.0.0.1" # loopback only; proxy handles remote clients
port = 8080
[cluster]
# Bind cluster RPC to the private cluster network interface only.
rpc_addr = "10.0.1.15:9090"
[rpc_tls]
enabled = true
require_client_cert = true
ca_cert = "/etc/kalamdb/tls/cluster-ca.pem"
server_cert = "/etc/kalamdb/tls/node.pem"
server_key = "/etc/kalamdb/tls/node.key"
If the cluster includes any node that is not on loopback, KalamDB refuses to start unless rpc_tls.enabled = true. Do not work around this check.
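One way to confirm that the bind settings took effect is to inspect the listening sockets on the host. The sketch below assumes a Linux host with the iproute2 ss tool; the function name non_loopback_listeners is hypothetical and is not part of KalamDB. It reads ss -ltnH output on stdin and prints every listener on the given port that is not bound to loopback.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of KalamDB): reads `ss -ltnH` output on stdin
# and prints listeners on PORT whose local address is not loopback.
# Returns 0 when every listener on PORT is loopback-only, 1 otherwise.
non_loopback_listeners() {
  local port="$1"
  awk -v port="$port" '
    {
      addr = $4                          # column 4 of ss -ltn is Local Address:Port
      n = split(addr, parts, ":")
      if (parts[n] != port) next         # listener on another port: ignore
      if (addr ~ /^127\./ || addr ~ /^\[::1\]/) next   # loopback: fine
      print addr                         # anything else is reachable off-host
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  '
}

# Usage on a live host (8080 is the HTTP port from the config above):
#   ss -ltnH | non_loopback_listeners 8080 || echo "port 8080 is exposed"
```

The cluster RPC port is expected to show up as non-loopback on multi-node clusters, so for 9090 the useful check is that the printed address is the private interface and nothing wider.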
2. JWT Secrets & Token Hygiene
Secret requirements
- Length: minimum 32 bytes (64+ recommended)
- Source: cryptographic RNG (openssl rand -base64 48), not a passphrase
- Rotation: rotate on compromise, personnel change, or at least annually
- Scope: unique per environment (dev, staging, prod never share)
Generate and inject via environment:
export KALAMDB_AUTH_JWT_SECRET="$(openssl rand -base64 48)"
[auth]
# Leave unset in the file when you set KALAMDB_AUTH_JWT_SECRET in the env.
# jwt_secret = "..."
access_token_expiry_hours = 1
refresh_token_expiry_hours = 168 # 7 days
KalamDB refuses to start on non-loopback binds if the secret is empty, a known-weak placeholder, or shorter than 32 bytes.
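A deploy script can catch the same problem before the server is ever started. The following is a minimal sketch of such a pre-flight check; the function name check_jwt_secret is hypothetical, and it covers only the empty and too-short cases, since the list of placeholders KalamDB treats as weak is not given here.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not KalamDB's own validation): fail the deploy
# early if the JWT secret is unset or shorter than 32 bytes.
check_jwt_secret() {
  local secret="$1" bytes
  # wc -c counts bytes rather than characters, whatever the locale.
  bytes=$(( $(printf '%s' "$secret" | wc -c) ))
  if [ "$bytes" -eq 0 ]; then
    echo "jwt secret is empty" >&2
    return 1
  fi
  if [ "$bytes" -lt 32 ]; then
    echo "jwt secret is $bytes bytes; need at least 32" >&2
    return 1
  fi
  return 0
}

# Usage in a deploy script:
#   check_jwt_secret "${KALAMDB_AUTH_JWT_SECRET:-}" || exit 1
```

A secret produced by openssl rand -base64 48 is 64 characters long, so it passes with room to spare.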
Rotation procedure
- Generate a new secret.
- Deploy it to every node simultaneously.
- Bounce the service.
- All outstanding tokens will fail verification — clients must re-login.
Do not re-use an old secret. There is no “grace period” key ring; rotation is abrupt by design.
Where tokens are exposed
| Surface | Carries token? | Notes |
|---|---|---|
| Authorization: Bearer … header | Yes | Preferred for server-to-server |
| Auth cookies | Yes | Browser flows; always set Secure, HttpOnly, SameSite=Strict |
| WebSocket subprotocol | Yes | Sent once during upgrade; never in URL query |
| Logs | No | Log redaction is enabled by default; keep it enabled |
3. Admin Bootstrap & Setup
On first boot, KalamDB seeds a root user. What happens next depends on your configuration.
Option A — Seeded password (recommended for remote deployments)
export KALAMDB_ROOT_PASSWORD="$(openssl rand -base64 24)"
# run the server once so the password is hashed and persisted
# then unset the env var and store the password in your secret manager
unset KALAMDB_ROOT_PASSWORD
Option B — Remote setup endpoint (one-time)
If you cannot set the env var, allow the setup endpoint for the first call only:
[auth]
allow_remote_setup = true # flip back to false immediately after setup
Set allow_remote_setup = false and restart after you complete setup. Leaving it enabled lets anyone who can reach the endpoint attempt to seed credentials.
Option C — Localhost-only root
If you only administer via an SSH tunnel, leave the password empty. KalamDB will refuse remote logins for root and only accept localhost calls. Create named admin users via CREATE USER for remote access.
4. Password Policy
Minimum controls enforced by KalamDB:
- Minimum length (configurable via auth.password_min_length)
- Max length 72 bytes (bcrypt limit)
- Bcrypt hash with cost 12
- Rejected common-password list
- Constant-time verification
- Generic error messages (no user enumeration)
- Account lockout after N failed attempts
Recommended runtime:
[auth]
password_min_length = 12
max_failed_attempts = 5
lockout_duration_minutes = 15
Push complexity requirements (e.g., character classes) up to your identity provider. Beyond the minimum length and the common-password list, KalamDB deliberately applies no further strength rules, to avoid falsely rejecting passphrases.
5. CORS & WebSocket Origins
CORS for the browser admin UI
[security.cors]
allowed_origins = [
"https://admin.example.com",
"https://app.example.com",
]
allow_credentials = true
allowed_methods = ["GET", "POST", "OPTIONS"]
allowed_headers = ["Authorization", "Content-Type"]
max_age = 600
Hard rules:
- Never combine allowed_origins = ["*"] with allow_credentials = true. Browsers reject it, and KalamDB will refuse to start with that combination.
- Use scheme + host + port; origin matching is exact.
- Never include localhost origins in production config.
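The hard rules above lend themselves to a small lint step in CI. The sketch below is one possible shape for it, not something KalamDB ships: the function name is_production_origin is hypothetical, and it assumes production origins are always https because TLS is terminated at the edge proxy.

```shell
#!/usr/bin/env bash
# Hypothetical lint (not part of KalamDB): accept only exact https origins of
# the form scheme://host[:port]; reject wildcards, paths, and loopback names.
is_production_origin() {
  local origin="$1"
  # https scheme, hostname, optional numeric port, and nothing after it.
  local re='^https://[A-Za-z0-9.-]+(:[0-9]+)?$'
  [[ "$origin" =~ $re ]] || return 1
  # Loopback names do not belong in a production allowlist.
  case "$origin" in
    https://localhost|https://localhost:*|https://127.*) return 1 ;;
  esac
  return 0
}

# Usage: check every origin you are about to put in allowed_origins.
#   for o in "https://admin.example.com" "https://app.example.com"; do
#     is_production_origin "$o" || { echo "bad origin: $o"; exit 1; }
#   done
```

A trailing slash fails the check on purpose: an origin has no path, and exact matching means "https://app.example.com/" would never match what the browser sends.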
WebSocket origins
[security]
strict_ws_origin_check = true
allowed_ws_origins = ["https://app.example.com"]
strict_ws_origin_check = true rejects connections that omit the Origin header (non-browser tooling must set it explicitly). Leave allowed_ws_origins empty only on loopback dev installs.
6. Rate Limiting & Connection Protection
Even with auth in place, rate limits protect against credential stuffing, SQL abuse, and connection exhaustion.
[rate_limit]
enable_connection_protection = true
# Auth endpoints (login, refresh, setup, WS auth)
max_auth_requests_per_ip_per_sec = 10
# Per-IP concurrent connection cap
max_connections_per_ip = 100
# Pre-auth request flood protection
max_requests_per_ip_per_sec = 200
# Per-user query rate (SQL endpoint)
max_queries_per_sec = 100
# WebSocket message flood protection
max_messages_per_sec = 50
# Temporary ban window for abusive IPs
ban_duration_seconds = 300
If you run behind a reverse proxy, configure security.trusted_proxy_ranges so that rate limits attribute requests to the real client IP via X-Forwarded-For. Do not set this to 0.0.0.0/0, because that lets any caller spoof any IP and bypass the limits.
[security]
trusted_proxy_ranges = ["10.0.0.0/8", "172.16.0.0/12"]
7. RBAC & Least Privilege
KalamDB has four roles. Use the lowest one that works.
| Role | Typical use | Can do |
|---|---|---|
| user | App end-users, SDK clients | Read/write their own namespaces, run DML |
| service | Backend services, pub/sub consumers | DML + topic consume/ack |
| dba | Database administrators | DDL, manage users except system |
| system | Reserved for the server itself | Everything (do not create new ones) |
Rules of thumb:
- Application tokens should be user or service, never dba.
- Create one dba account per human administrator; audit their usage.
- EXECUTE AS USER follows a strict hierarchy (System → DBA → Service → User). A role can impersonate only roles below it, never a peer or a higher role.
- System tables (system.*) are read/write restricted to dba and system. This is enforced inside the query planner, including through subqueries, CTEs, and views.
On role demotion
When you demote or lock a user, rotate their tokens. KalamDB re-validates the DB role on each token refresh and invalidates tokens whose token_generation is older than the DB row — but already-issued access tokens remain valid until access_token_expiry_hours. Keep access-token TTLs short.
8. Request Size Guards & File Uploads
[security]
max_request_body_size = 10485760 # 10 MiB; tune to real payload size
max_ws_message_size = 1048576 # 1 MiB
For file-accepting endpoints (multipart FILE("name") placeholders, exports), KalamDB:
- Validates path components against an allowlist (alphanumeric, -, _, .)
- Canonicalises paths and rejects symlink escape
- Returns 403 to non-owners trying to download another user’s files or exports
Operators should additionally:
- Put an object-storage lifecycle on the exports bucket (e.g., delete after N days)
- Scan uploaded files with your antivirus pipeline before re-serving them to users
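If your own tooling handles upload or export names before they reach KalamDB, the same character allowlist can be applied there too. The following is an illustration only: KalamDB's real check lives in the server and may differ, the function name is_safe_path_component is hypothetical, and the explicit rejection of "." and ".." is an extra rule assumed here because both pass a pure character allowlist.

```shell
#!/usr/bin/env bash
# Illustration only (not KalamDB's implementation): allow a single path
# component made of ASCII letters, digits, '-', '_' and '.'.
is_safe_path_component() {
  local name="$1"
  local re='^[A-Za-z0-9._-]+$'
  [[ "$name" =~ $re ]] || return 1
  # "." and ".." consist only of allowed characters but still walk the tree.
  case "$name" in
    .|..) return 1 ;;
  esac
  return 0
}

# Usage:
#   is_safe_path_component "$upload_name" || { echo "rejected: $upload_name"; exit 1; }
```

The check works per component, so a name containing a slash is rejected outright instead of being split and re-joined.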
9. Cluster RPC Hardening
For anything beyond a single-node loopback cluster:
[rpc_tls]
enabled = true
require_client_cert = true
ca_cert = "/etc/kalamdb/tls/cluster-ca.pem"
server_cert = "/etc/kalamdb/tls/node-1.pem"
server_key = "/etc/kalamdb/tls/node-1.key"
[cluster]
rpc_addr = "10.0.1.15:9090" # private interface only
Notes:
- The cluster RPC port carries Raft consensus, ForwardSql, Ping, and node-info RPCs. Treat it as just as sensitive as the data port.
- ForwardSql additionally re-validates the caller’s Bearer JWT on the receiving node, so mTLS is defense-in-depth, not the only layer.
- Use a private CA unique to the cluster. Do not reuse your edge-TLS certificate chain for cluster mTLS.
- Rotate node certificates independently of JWT secrets.
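For teams without an internal PKI, a private cluster CA can be created with plain openssl. The sketch below is one way to do it, under stated assumptions: the function name make_cluster_certs, the subject names, the validity periods, and the IP subjectAltName are all illustrative, and this guide does not say which certificate fields KalamDB actually verifies.

```shell
#!/usr/bin/env bash
# Sketch: create a private CA and one node certificate with openssl.
# Subject names, lifetimes, and the SAN value are illustrative assumptions.
make_cluster_certs() {
  local dir="$1" node_ip="$2"
  mkdir -p "$dir" || return 1

  # 1. Self-signed CA used only for this cluster.
  openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -subj "/CN=kalamdb-cluster-ca" \
    -keyout "$dir/cluster-ca.key" -out "$dir/cluster-ca.pem" 2>/dev/null || return 1

  # 2. Node private key and certificate signing request.
  openssl req -newkey rsa:2048 -nodes \
    -subj "/CN=kalamdb-node" \
    -keyout "$dir/node.key" -out "$dir/node.csr" 2>/dev/null || return 1

  # 3. Sign the CSR with the cluster CA. The extension file adds the node
  #    address as a SAN and allows both server and client authentication.
  printf 'subjectAltName=IP:%s\nextendedKeyUsage=serverAuth,clientAuth\n' \
    "$node_ip" > "$dir/node.ext"
  openssl x509 -req -in "$dir/node.csr" -days 90 \
    -CA "$dir/cluster-ca.pem" -CAkey "$dir/cluster-ca.key" -CAcreateserial \
    -extfile "$dir/node.ext" -out "$dir/node.pem" 2>/dev/null || return 1

  chmod 600 "$dir/cluster-ca.key" "$dir/node.key"
}

# Usage:
#   make_cluster_certs /etc/kalamdb/tls 10.0.1.15
#   openssl verify -CAfile /etc/kalamdb/tls/cluster-ca.pem /etc/kalamdb/tls/node.pem
```

The output file names line up with ca_cert, server_cert, and server_key in the config above. In a real deployment the CA key should be kept off the nodes and each node should get its own key pair.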
10. Logging, Audit & Observability
- KalamDB emits structured JSON logs. SQL statements are redacted before logging; do not re-enable raw SQL in log formatters.
- Enable the audit stream to capture user logins, user creation/modification, role changes, impersonation events, and admin export downloads.
- Ship audit logs to an append-only sink (S3 with object lock, SIEM).
- Expose Prometheus metrics on the internal network only — never on the public API port.
Suggested alerts
- auth_failed_logins_per_minute > 20
- auth_403_forbidden_per_minute > 50
- user_role_change_events > 0 (human review on every change)
- impersonation_events grouped by actor
- Cluster RPC connection count deviation > 3σ
11. Incident Response Playbook
When credentials, a token secret, or a node are suspected compromised:
- Contain: rotate auth.jwt_secret. All outstanding tokens become invalid immediately.
- Lock down: set auth.allow_remote_setup = false, revoke compromised user accounts (ALTER USER … LOCK).
- Rotate: regenerate cluster mTLS material, redeploy, force clients to re-authenticate.
- Narrow: temporarily lower max_auth_requests_per_ip_per_sec and max_connections_per_ip.
- Preserve: snapshot logs, audit stream, and RocksDB backup before further changes.
- Review: audit system.users for unexpected role changes, system.jobs for unexpected admin jobs, exports directory for unexpected downloads.
- Post-mortem: document the breach path, fix the configuration gap, add a regression check.
12. High-Risk Misconfigurations To Avoid
KalamDB will refuse to start on several of these. The rest are human errors we see most often.
- server.host = "0.0.0.0" with no edge proxy and default JWT secret
- security.cors.allowed_origins = ["*"] combined with allow_credentials = true
- security.strict_ws_origin_check = false on a public WebSocket endpoint
- auth.allow_remote_setup = true left on after bootstrap
- auth.jwt_secret shared across environments, or shorter than 32 bytes
- security.trusted_proxy_ranges containing 0.0.0.0/0 or ::/0
- Running a multi-node cluster with rpc_tls.enabled = false
- Handing out dba or system role tokens to applications
- Storing secrets in server.toml committed to Git
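Several of the items above can be caught mechanically before a config reaches production. The following is a rough sketch of such a check, not a tool shipped with KalamDB: the function name lint_server_toml is hypothetical, and because it matches text line by line it ignores TOML table scoping and multi-line arrays. Treat a clean result as a hint, not proof.

```shell
#!/usr/bin/env bash
# Hypothetical lint (not shipped with KalamDB): grep a server.toml for a few
# of the risky settings listed above. Returns 1 if any of them is found.
lint_server_toml() {
  local file="$1" bad=0 i
  local patterns=(
    '^[[:space:]]*host[[:space:]]*=[[:space:]]*"0\.0\.0\.0"'
    '^[[:space:]]*allow_remote_setup[[:space:]]*=[[:space:]]*true'
    '^[[:space:]]*strict_ws_origin_check[[:space:]]*=[[:space:]]*false'
    '^[[:space:]]*trusted_proxy_ranges.*"(0\.0\.0\.0/0|::/0)"'
    '^[[:space:]]*allowed_origins.*"\*"'
    '^[[:space:]]*jwt_secret[[:space:]]*='
  )
  local messages=(
    'server binds to 0.0.0.0'
    'allow_remote_setup is still true'
    'strict_ws_origin_check is false'
    'trusted_proxy_ranges trusts every address'
    'CORS allowed_origins contains "*"'
    'jwt_secret is stored in the config file'
  )
  for i in "${!patterns[@]}"; do
    # Commented-out lines start with "#", so the anchored patterns skip them.
    if grep -Eq -- "${patterns[$i]}" "$file"; then
      echo "RISK: ${messages[$i]}"
      bad=1
    fi
  done
  return "$bad"
}

# Usage in CI:
#   lint_server_toml server.toml || exit 1
```

Pairing this with a secret scanner on the repository covers the last item in the list, which a config lint alone cannot see.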
Baseline Production server.toml
Drop-in starting point. Fill in real values, inject secrets via env.
[server]
host = "127.0.0.1"
port = 8080
enable_http2 = true
[auth]
# jwt_secret intentionally omitted — supplied via KALAMDB_AUTH_JWT_SECRET
allow_remote_setup = false
cookie_secure = true
access_token_expiry_hours = 1
refresh_token_expiry_hours = 168
password_min_length = 12
max_failed_attempts = 5
lockout_duration_minutes = 15
[rate_limit]
enable_connection_protection = true
max_auth_requests_per_ip_per_sec = 10
max_connections_per_ip = 100
max_requests_per_ip_per_sec = 200
max_queries_per_sec = 100
max_messages_per_sec = 50
ban_duration_seconds = 300
[security]
max_request_body_size = 10485760
max_ws_message_size = 1048576
strict_ws_origin_check = true
allowed_ws_origins = ["https://app.example.com"]
trusted_proxy_ranges = ["10.0.0.0/8"]
[security.cors]
allowed_origins = ["https://admin.example.com", "https://app.example.com"]
allow_credentials = true
allowed_methods = ["GET", "POST", "OPTIONS"]
allowed_headers = ["Authorization", "Content-Type"]
max_age = 600
[rpc_tls]
enabled = true
require_client_cert = true
ca_cert = "/etc/kalamdb/tls/cluster-ca.pem"
server_cert = "/etc/kalamdb/tls/node.pem"
server_key = "/etc/kalamdb/tls/node.key"
[cluster]
rpc_addr = "10.0.1.15:9090"