Backup & Restore

KalamDB provides built-in SQL commands for backing up and restoring the entire database. A backup captures everything in one compressed archive — RocksDB data, Parquet storage files, Raft snapshots, and the server configuration — so you always have a consistent, self-contained snapshot you can restore from.

Role requirement: BACKUP DATABASE and RESTORE DATABASE require the DBA or System role.


BACKUP DATABASE

Create a full database backup. The entire data directory and server.toml are compressed into a single .tar.gz archive at the specified path.

BACKUP DATABASE TO '<backup_path>';

The command enqueues a background backup job and returns immediately with a Job ID you can use to monitor progress.

Parameters

Parameter | Description
backup_path | Absolute path for the output archive. Must end in .tar.gz. Quotes (single or double) are required.

Archive contents

Path inside archive | Description
data/rocksdb/ | RocksDB write-path data
data/storage/ | Flushed Parquet segment files
data/snapshots/ | Raft snapshots
data/streams/ | Stream commit log data
server.toml | Server configuration (if present)

Example

-- Back up the entire database to a timestamped archive
BACKUP DATABASE TO '/backups/kalamdb_20260224.tar.gz';

-- Store to a date-named path
BACKUP DATABASE TO '/var/backups/kalamdb/2026-02-24.tar.gz';

Response

Database backup started to '/backups/kalamdb_20260224.tar.gz'. Job ID: BK-0001
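If you submit the command from a script, you will usually want to pull the Job ID out of the response text so you can look it up in system.jobs later. The following is a minimal Python sketch. It assumes the response always contains "Job ID: " followed by a letters-dash-digits identifier, which is inferred only from the sample responses on this page; `extract_job_id` is a hypothetical helper, not part of KalamDB.

```python
import re
from typing import Optional

# Assumed format, inferred from the sample responses: "Job ID: BK-0001"
JOB_ID_PATTERN = re.compile(r"Job ID:\s*([A-Za-z]+-\d+)")


def extract_job_id(response: str) -> Optional[str]:
    """Return the job ID found in a BACKUP/RESTORE response, or None."""
    match = JOB_ID_PATTERN.search(response)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = "Database backup started to '/backups/kalamdb_20260224.tar.gz'. Job ID: BK-0001"
    print(extract_job_id(sample))
```

Returning None instead of raising lets the caller decide how to handle a response whose wording differs from the samples.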

RESTORE DATABASE

Restore the entire database from a previously created .tar.gz backup archive. The command stages the restored data under <db_path>_restore_pending/ without touching the live database. A server restart is required to complete the restore — on startup, KalamDB detects the pending directory and swaps it in.

RESTORE DATABASE FROM '<backup_path>';

Warning: Once the server restarts after staging, all data written since the backup was taken will be lost. Take a fresh backup of the current state before running a restore if you need a rollback option.

Parameters

Parameter | Description
backup_path | Absolute path to the .tar.gz backup archive to restore from. Quotes (single or double) are required.

Restore lifecycle

  1. Issue RESTORE DATABASE FROM '<path>'; — returns a Job ID.
  2. The job extracts the archive to <db_path>_restore_pending/.
  3. Job status transitions to Completed with the message "Restore staged from '…'. RocksDB restore is pending server restart."
  4. Restart the KalamDB server — it detects the pending directory, swaps it in, and starts fresh.

Example

-- Stage a restore from a specific timestamped backup
RESTORE DATABASE FROM '/backups/kalamdb_20260224.tar.gz';

Response

Database restore started from '/backups/kalamdb_20260224.tar.gz'. Job ID: RS-0001

After the job completes:

-- Confirm the restore job reached Completed before restarting
SELECT job_id, status, message FROM system.jobs WHERE job_id = 'RS-0001';

Then restart the server process to activate the restored data.


Path security rules

KalamDB validates the backup path on both BACKUP and RESTORE to prevent path traversal and sensitive file access:

  • .. sequences are not allowed (blocks path traversal)
  • Null bytes (\0) are not allowed
  • Writing to /etc/, /root/, /var/log/, or C:\Windows\ is blocked
  • The path must be quoted (bare paths are rejected)

-- These will be rejected:
BACKUP DATABASE TO '../../../tmp/evil.tar.gz';  -- path traversal
BACKUP DATABASE TO '/etc/shadow';               -- sensitive directory
BACKUP DATABASE TO /backups/app.tar.gz;         -- unquoted path
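A client-side pre-check can catch obviously bad paths before a statement is ever sent. The Python sketch below mirrors the rules listed above, plus the .tar.gz suffix requirement from the parameter description. It is an illustration only and not KalamDB's actual validator: the function name, the messages, and the decision to treat any ".." substring as a violation are assumptions, and the server remains the authority on what is accepted. The quoting rule concerns the SQL statement rather than the path itself, so it is not checked here.

```python
from typing import List

# Directories named in the rules above; matching details are assumptions.
UNIX_BLOCKED_PREFIXES = ("/etc/", "/root/", "/var/log/")
WINDOWS_BLOCKED_PREFIX = "c:\\windows\\"


def precheck_backup_path(path: str) -> List[str]:
    """Return a list of rule violations; an empty list means no problem found."""
    problems: List[str] = []
    # Conservative: reject any '..' substring, not only '..' path components.
    if ".." in path:
        problems.append("contains '..'")
    if "\0" in path:
        problems.append("contains a null byte")
    # Windows paths are compared case-insensitively, Unix paths as written.
    if path.startswith(UNIX_BLOCKED_PREFIXES) or path.lower().startswith(
        WINDOWS_BLOCKED_PREFIX
    ):
        problems.append("targets a blocked system directory")
    if not path.endswith(".tar.gz"):
        problems.append("does not end in .tar.gz")
    return problems


if __name__ == "__main__":
    for candidate in ("/backups/kalamdb_20260224.tar.gz", "/etc/shadow"):
        print(candidate, precheck_backup_path(candidate))
```

Collecting every violation, rather than stopping at the first, gives an operator the full picture in one pass.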

Async job execution

Backup and restore operations run as background jobs managed by the UnifiedJobManager. The SQL command returns immediately with a job ID; the work happens asynchronously so the server remains responsive.

You can monitor job status via the system.jobs table:

SELECT job_id, status, message FROM system.jobs WHERE job_id = 'BK-0001';

status value | Meaning
Queued | Job is waiting to start
Running | Backup/restore is in progress
Completed | Finished successfully
Failed | An error occurred; see message for details
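Because the commands return before the work is done, automation typically polls system.jobs until the job reaches Completed or Failed. The Python sketch below shows one way to structure that loop. It assumes you supply `fetch_status`, a hypothetical callable that runs the SELECT shown above through whatever client you use and returns a (status, message) pair; no KalamDB client API is implied.

```python
import time
from typing import Callable, Tuple

# Terminal states, taken from the status values documented on this page.
TERMINAL_STATUSES = {"Completed", "Failed"}


def wait_for_job(
    fetch_status: Callable[[str], Tuple[str, str]],
    job_id: str,
    poll_seconds: float = 2.0,
    timeout_seconds: float = 600.0,
) -> Tuple[str, str]:
    """Poll until the job is Completed or Failed, then return (status, message)."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status, message = fetch_status(job_id)
        if status in TERMINAL_STATUSES:
            return status, message
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status} after {timeout_seconds}s")
        time.sleep(poll_seconds)


if __name__ == "__main__":
    # Stand-in for a real query against system.jobs.
    fake = iter([("Queued", ""), ("Running", ""), ("Completed", "ok")])
    print(wait_for_job(lambda _job_id: next(fake), "BK-0001", poll_seconds=0))
```

For a restore, the same loop can gate the server restart: restart only once the returned status is Completed.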

Backup strategy

For production deployments, consider:

  1. Scheduled backups — Run BACKUP DATABASE on a cron or application-level schedule (daily, hourly, etc.)
  2. Timestamped archives — Include a date/time in the path so each backup is uniquely named and old ones are not silently overwritten
  3. Remote storage — Copy the .tar.gz archive to S3, GCS, or Azure Blob Storage after creation for off-site durability
  4. Snapshot + backup — Combine CLUSTER SNAPSHOT with BACKUP DATABASE for a fully consistent cluster-wide state capture
  5. Pre-restore snapshot — Take a backup of the current state before running RESTORE DATABASE so you can roll back if needed
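For the scheduled, timestamped backups described above, a scheduler needs a fresh, unique path on every run. The Python sketch below builds the BACKUP DATABASE statement with a UTC timestamp in the file name. The `kalamdb_` prefix, the timestamp layout, and the `backup_statement` helper are illustrative assumptions; how the statement is then sent to the server depends on your client and is left out.

```python
from datetime import datetime, timezone
from typing import Optional


def backup_statement(backup_dir: str, now: Optional[datetime] = None) -> str:
    """Build a BACKUP DATABASE statement with a UTC-timestamped archive name."""
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = moment.strftime("%Y%m%dT%H%M%SZ")
    path = f"{backup_dir.rstrip('/')}/kalamdb_{stamp}.tar.gz"
    # A quote inside the path would break out of the SQL string literal.
    if "'" in path or '"' in path:
        raise ValueError("backup path must not contain quote characters")
    return f"BACKUP DATABASE TO '{path}';"


if __name__ == "__main__":
    print(backup_statement("/var/backups/kalamdb"))
```

Second-level resolution keeps hourly or ad hoc runs from colliding on the same file name, which a date-only name would not.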