Migration Guide: v0.3.7 to v0.4.0

This guide covers everything you need to know when upgrading from Stoolap v0.3.7 (snapshot-based storage) to v0.4.0 (volume-based storage). The migration is automatic and requires no manual steps, but understanding the changes will help you operate your database effectively.

Architecture Change Summary

v0.4.0 replaces the snapshot-based persistence engine with an immutable volume-based design inspired by Apache Iceberg and Delta Lake. This is the most significant architectural change in the project’s history.

Aspect	v0.3.7 (Snapshots)	v0.4.0 (Volumes)
Storage model	All rows in hot buffer, periodic snapshot to `.bin`	Hot/cold split: mutable hot buffer + immutable cold `.vol` files
Persistence format	Monolithic row-serialized `.bin` per table	Columnar `.vol` files with zone maps, bloom filters, dictionary encoding, LZ4 compression
Persistence location	`snapshots/<table>/`	`volumes/<table>/`
Metadata	`snapshot_meta.bin`	`manifest.bin` per table (versioned, V6 format)
Startup speed	Proportional to data size (deserialize all rows into memory)	Loads compressed blocks + metadata into RAM, WAL replay for recent writes
Memory usage	All data in memory at all times	Compressed blocks in RAM, columns decompressed from RAM on first access
Query on persisted data	Must be in memory first	Zone map pruning, bloom filters, row-group skipping, lazy column decompression from RAM
Snapshot isolation	Hot buffer only	Full support including cold rows via versioned tombstones
Concurrent writes to persisted rows	N/A (all rows in memory)	Per-table seal fence + row-level claim prevents lost updates
Compaction	Not applicable	Bounded volume compaction: merges sub-target, oversized, and tombstoned volumes into target-sized outputs
DELETE performance	Row removed from memory, snapshot rewrites all	Tombstone (no rewrite until compaction)
Compression	WAL-only (optional)	LZ4 for both WAL and cold volumes (independent `wal_compression` and `volume_compression` flags)

What Happens Automatically

On First Open

When you open a v0.3.7 database with v0.4.0 for the first time, the engine performs a one-time migration:

Detects snapshots/ directory exists and volumes/ does not
Loads all snapshot .bin files into the hot buffer
Replays WAL entries written after the last snapshot
Seals all hot data into immutable .vol files under volumes/<table>/
Creates manifest.bin for each table
Removes the snapshots/ directory entirely

Data Safety

Zero data loss: All rows from snapshots and WAL are preserved
Atomic: If migration fails partway, the original snapshots/ directory is untouched
Idempotent: If interrupted, the next open retries from scratch
Indexes: All indexes (BTree, Hash, Bitmap, HNSW) are rebuilt during recovery
Schema: All column types, constraints, NOT NULL, PRIMARY KEY, UNIQUE survive
All data types: INTEGER, FLOAT, TEXT, BOOLEAN, TIMESTAMP, JSON, VECTOR all migrate correctly

Migration Time

Migration time depends on data size:

Data size	Approximate time
< 1 MB	< 1 second
1-100 MB	1-5 seconds
100 MB - 1 GB	5-30 seconds
> 1 GB	Plan for minutes

The migration is I/O bound. SSD storage significantly reduces migration time.

Configuration Changes

Renamed Parameters

v0.3.7 Parameter	v0.4.0 Parameter	Notes
`snapshot_interval`	`checkpoint_interval`	Backward compatible: old name still accepted in DSN
`snapshot_compression`	`compression`	Backward compatible: old name still accepted. Now sets both `wal_compression` and `volume_compression`
`keep_snapshots`	`keep_snapshots`	Unchanged. Now controls backup snapshot retention

New Parameters

Parameter	Default	Description
`sync_mode`	`normal`	`none` (no fsync, data durable at checkpoint), `normal` (fsync every 1 second), `full` (fsync every write)
`checkpoint_interval`	60	Seconds between checkpoint cycles (seal + compact + WAL truncate)
`compact_threshold`	4	Sub-target volumes per table before merging
`target_volume_rows`	1048576	Target rows per cold volume. Controls compaction split boundary.
`checkpoint_on_close`	on	Seal all hot rows on clean shutdown for fast startup
`wal_compression`	on	LZ4 compression for WAL entries
`volume_compression`	on	LZ4 compression for cold volume files
`compression`	on	Shorthand: sets both `wal_compression` and `volume_compression`

DSN Example

v0.3.7:

file:///data/mydb?sync=normal&snapshot_interval=300&keep_snapshots=5

v0.4.0:

file:///data/mydb?sync_mode=normal&checkpoint_interval=60&compact_threshold=4

The old DSN still works (backward compatible), but the new parameter names are recommended.

Sync Mode Behavior

The sync_mode parameter controls durability guarantees:

Mode	Behavior	Use case
`none`	No fsync calls. Data is durable only after checkpoint writes volumes to disk.	Maximum throughput, acceptable data loss window
`normal`	Fsync every 1 second (batched). DDL operations fsync immediately.	Balanced (comparable to SQLite WAL + synchronous=NORMAL)
`full`	Fsync on every write operation.	Maximum durability, lower throughput

CLI Flag Changes

v0.3.7 Flag	v0.4.0 Flag
`--snapshot-interval`	`--checkpoint-interval`
`--keep-snapshots`	`--keep-snapshots` (unchanged)
N/A	`--compact-threshold` (new)
N/A	`--no-checkpoint-on-close` (new)
N/A	`--snapshot` (new, create backup and exit)
N/A	`--restore [TIMESTAMP]` (new, requires database to open; use with `--reset-volumes` for corruption)
N/A	`--reset-volumes` (new, delete volumes/ for recovery)

PRAGMA Changes

v0.3.7 PRAGMA	v0.4.0 PRAGMA	Notes
`PRAGMA snapshot_interval`	`PRAGMA checkpoint_interval`	Old name still accepted
`PRAGMA SNAPSHOT`	`PRAGMA SNAPSHOT`	Now creates backup `.bin` files (not primary storage)
N/A	`PRAGMA CHECKPOINT`	Runs checkpoint cycle: seal + compact + WAL truncate
N/A	`PRAGMA RESTORE`	Restore from a backup snapshot (latest or by timestamp)
N/A	`PRAGMA compact_threshold`	Set/read compaction threshold
N/A	`PRAGMA target_volume_rows`	Set/read target rows per volume

Important: PRAGMA SNAPSHOT changed meaning. In v0.3.7 it was the primary persistence mechanism. In v0.4.0 it creates optional backup files for disaster recovery. It saves table data as .bin files and index/view definitions as per-timestamp ddl-{timestamp}.bin (with CRC32 integrity check). Primary persistence is handled automatically by the checkpoint cycle. Use PRAGMA RESTORE = '{timestamp}' for consistent point-in-time restore.

-- Create a backup snapshot
PRAGMA SNAPSHOT;

-- Restore from the latest backup
PRAGMA RESTORE;

-- Restore from a specific backup (by timestamp)
PRAGMA RESTORE = '20260315-100000.000';

On-Disk Layout Changes

v0.3.7 Layout

mydb/
  db.lock
  wal/
    wal-0001.bin
    checkpoint.meta
  snapshots/
    snapshot_meta.bin
    users/
      snapshot-20240101-120000.000.bin
    orders/
      snapshot-20240101-120000.000.bin

v0.4.0 Layout (After Migration)

mydb/
  db.lock
  wal/
    wal-0001.bin
    checkpoint.meta
  volumes/
    users/
      manifest.bin
      vol-00001.vol          (columnar, LZ4 compressed)
    orders/
      manifest.bin
      vol-00001.vol
  snapshots/                 (only if PRAGMA SNAPSHOT used, optional)
    snapshot_meta.bin
    ddl-20240315-100000.000.bin        (index + view definitions, per-timestamp)
    manifest-20240315-100000.000.json  (table list for this snapshot batch)
    users/
      snapshot-20240315-100000.000.bin

Key differences:

volumes/ replaces snapshots/ as primary storage
Each table has a manifest.bin tracking its volumes and tombstones
.vol files are columnar (not row-serialized .bin), with per-column zone maps, bloom filters, and dictionary encoding
.vol files use V4 format (STV4 magic) with per-column per-row-group LZ4 blocks
Row-group zone maps (64K-row groups) enable sub-volume pruning
snapshots/ only appears if you explicitly run PRAGMA SNAPSHOT for backups

Behavioral Changes

Checkpoint Cycle (New)

v0.3.7 had a snapshot cycle that periodically serialized the entire hot buffer into .bin files. v0.4.0 replaces this with a checkpoint cycle:

Seal: Move hot buffer rows into a new immutable .vol file (per-table seal fence ensures DML consistency)
Persist manifests: Write manifest.bin atomically (fsync tmp, rename, fsync directory)
WAL Truncate: Remove WAL entries before the checkpoint LSN
Compact (background thread): Merge sub-target, oversized, and tombstoned volumes into target-sized outputs

The checkpoint cycle runs automatically every checkpoint_interval seconds (default: 60) and on clean shutdown.

Seal Thresholds

Not every checkpoint creates a new volume. Rows are only sealed when:

First seal: 100,000+ rows in hot buffer
Incremental seals: 10,000+ new rows since last seal
On close: ALL remaining hot rows are sealed (regardless of count)

This prevents creating many tiny volume files.

Seal Fence (DML Safety)

v0.4.0 uses a per-table RwLock (seal fence) to coordinate between DML operations and the seal process:

DML (INSERT, UPDATE, DELETE) acquires a shared read lock
Seal acquires an exclusive write lock

This ensures no DML can see a partially-sealed state. Additionally, at commit time, a generation counter detects whether a seal happened during the transaction’s lifetime. If so, pending rows are revalidated against cold segments to catch constraint violations that the original INSERT-time check may have missed.

Bounded Volume Compaction

v0.4.0 uses content-driven compaction with a target volume size (target_volume_rows, default 1M rows):

Sub-target volumes (smaller than target): merged together when count exceeds compact_threshold
Oversized volumes (larger than 150% of target): split into target-sized volumes
Dirty volumes (at-target but have tombstoned rows): rewritten to physically remove dead rows
Clean at-target volumes: frozen, never rewritten

Output is split at row-group aligned boundaries (multiples of 64K rows). This keeps volumes balanced and minimizes disk I/O.

Snapshot Isolation on Cold Data

v0.3.7 kept all data in the hot buffer, so snapshot isolation only applied to in-memory rows. v0.4.0 extends snapshot isolation to cold (volume-backed) rows using versioned tombstones:

BEGIN ISOLATION LEVEL SNAPSHOT;
-- This transaction sees a frozen point-in-time view,
-- even for rows stored in cold volumes.
-- Concurrent DELETEs and UPDATEs are invisible to this snapshot.
COMMIT;

Each tombstone stores a commit_seq. A snapshot transaction at begin_seq=N only sees tombstones with commit_seq <= N. This means concurrent modifications to cold rows are invisible to running snapshot transactions.

Cold-Row Write Conflict Detection

v0.3.7 had no cold storage, so all writes targeted the hot buffer which already had MVCC conflict detection. v0.4.0 introduces cold rows that live in volumes. When two transactions concurrently modify the same cold row, try_claim_row() uses the hot buffer’s uncommitted_writes map to prevent lost updates:

Txn A: UPDATE WHERE id=5 (cold row) -> claims row 5
Txn B: UPDATE WHERE id=5 (cold row) -> conflict error (row already claimed)

Only one transaction can modify a given cold row at a time. The loser receives a write conflict error and must retry.

Query Performance During Checkpoint

v0.3.7 snapshot creation serialized the entire hot buffer under a lock, which could block reads. v0.4.0’s checkpoint cycle (seal + compact) is designed to minimize query impact:

COUNT(*) uses O(1) formula during seal overlap (no scanning needed)
Aggregation pushdowns (SUM, MIN, MAX) use pre-computed cold volume statistics
Cold data scanning is lazy via MergingScanner (streams rows from volumes, columns decompressed from RAM on first access)
Column pruning: only columns referenced by filters and projections are decompressed from compressed in-memory blocks

Aggregation Pushdown

v0.4.0 pushes COUNT(*), SUM(), MIN(), MAX() directly to cold volume statistics when possible. This makes aggregations on large cold tables near-instant. The optimization is disabled for snapshot isolation transactions (which need tombstone filtering).

Pre-Migration Checklist

Before upgrading:

Backup your database directory including snapshots/ and wal/
Ensure clean shutdown of v0.3.7 (wait for any in-progress snapshots to complete)
Check disk space: Migration temporarily needs ~2x the snapshot data size (old snapshots + new volumes exist briefly during migration)
Plan for downtime: The database is unavailable during migration (proportional to data size)
Test first: Copy your database to a test location and open with v0.4.0 to verify

Post-Migration Verification

After migration, verify your data:

-- Check all tables exist
SHOW TABLES;

-- Verify row counts
SELECT COUNT(*) FROM your_table;

-- Spot-check values
SELECT * FROM your_table WHERE id = 1;

-- Verify indexes work
SELECT * FROM your_table WHERE indexed_column = 'value';

-- Test writes
INSERT INTO your_table VALUES (...);
UPDATE your_table SET col = 'new' WHERE id = 1;
DELETE FROM your_table WHERE id = 2;

Rollback Plan

If you need to revert to v0.3.7:

Before migration: Simply use the v0.3.7 binary with your original database
After migration: The snapshots/ directory has been removed. You must restore from your pre-migration backup

There is no automatic downgrade path from v0.4.0 to v0.3.7. Always backup before upgrading.

Disaster Recovery

If your v0.4.0 database becomes corrupted (e.g., disk failure, interrupted checkpoint), use the CLI restore:

# Restore from backup snapshot (requires database to open)
stoolap -d "file:///path/to/db" --restore

# Recovery from corrupted volumes/manifests (cleans up first)
stoolap -d "file:///path/to/db" --reset-volumes --restore

The --restore flag opens the database engine and replaces current data with snapshot data. If volumes or manifests are corrupted and the database cannot open, use --reset-volumes --restore which removes volumes/ and wal/ first.

To ensure you always have backup snapshots available:

# Create regular backups
stoolap -d "file:///path/to/db" --snapshot

# Or from SQL
PRAGMA SNAPSHOT;

Driver Compatibility

All official drivers (Go, Python, Node.js, WASM) are compatible with v0.4.0. The migration is transparent to drivers since it happens at the storage layer. No driver code changes are needed.

If you use the C FFI (include/stoolap.h), the API is unchanged. The header file has been updated with new PRAGMA documentation but no function signature changes.

Troubleshooting

Migration appears stuck

For very large databases (>1GB), migration can take minutes. Check stderr for progress messages. If no output appears for >5 minutes, the process may be hung. Kill it safely and retry. The migration is idempotent.

“db.lock” prevents opening

If a previous process crashed during migration, the lock file may be stale. On macOS/Linux, the flock is released when the process exits. Simply retry opening. If the file persists, it is safe to delete db.lock when no other process is accessing the database.

Missing data after migration

If row counts differ, check:

Were there uncommitted transactions when v0.3.7 shut down? Only committed data is migrated.
Was the WAL intact? WAL corruption may cause loss of post-snapshot data.
Did you shut down v0.3.7 cleanly? A dirty shutdown with WAL corruption can lose recent writes.

Performance regression after migration

If queries are slower after migration:

Run PRAGMA CHECKPOINT to ensure all data is in cold volumes (not hot buffer)
Check that indexes are created. Migration preserves index definitions but rebuilds them. Large tables may need time for index population.
v0.4.0 uses lazy scanning. Some query patterns may behave differently. File an issue if you observe consistent regressions.