Parallel Execution in Stoolap

Overview

Stoolap’s query execution engine is designed to accelerate SQL query processing by operating on data in parallel batches, allowing for better CPU utilization across multiple cores. This document explains the architecture and components of the parallel execution system.

Key Concepts

Batch Processing

The parallel execution model processes data in batches to improve throughput. Instead of processing one row at a time, the engine processes multiple rows simultaneously using Rayon’s work-stealing scheduler.

Key characteristics:

Automatic parallelization based on data size
Work-stealing for optimal load balancing
Configurable thresholds for parallel execution

Parallelization Thresholds

Stoolap automatically parallelizes operations based on data size:

Operation	Threshold	Description
Filter (WHERE)	10,000 rows	Parallel predicate evaluation
Hash Join	10,000 rows	Parallel hash build and probe
ORDER BY	50,000 rows	Parallel sorting
DISTINCT	10,000 rows	Two-phase parallel deduplication
DISTINCT ON	N/A	Hash-based O(groups) dedup after sort (not parallelized)

Architecture Components

1. Parallel Filter

For large tables, WHERE clause evaluation is parallelized:

Data is divided into chunks
Each chunk is processed by a separate thread
Results are merged using efficient concurrent data structures

2. Parallel Hash Join

Hash joins are parallelized in two phases:

Build Phase:

The build side is partitioned across threads
Hash table is constructed using DashMap (concurrent hash map)

Probe Phase:

The probe side is processed in parallel
Each thread looks up matches in the shared hash table

3. Parallel Sort

Large ORDER BY operations use parallel sorting:

Uses Rayon’s par_sort_by() for efficient multi-threaded sorting
Automatically falls back to sequential sort for small datasets

4. Parallel Distinct

DISTINCT operations use two-phase deduplication:

First phase: parallel identification of unique values per chunk
Second phase: merge of partial results

Query Flow

Query Planning: The planner estimates cardinality and decides on parallel execution.
Data Fetching: Data is fetched from storage with index acceleration where possible.
Parallel Processing: Operations exceeding thresholds are parallelized.
Result Merging: Partial results are combined efficiently.
Result Formation: Final results are returned to the caller.

Performance Benefits

Parallel execution provides significant benefits for analytical queries. Typical gains vary by workload shape, data distribution, and core count:

Filter Operations: Better throughput for complex predicates on large datasets
Hash Joins: Lower join latency for larger joins
Sorting: Faster large ORDER BY workloads
Aggregations: Scales with available CPU cores when data volume is high

EXPLAIN ANALYZE Output

Use EXPLAIN ANALYZE to see parallel execution in action:

EXPLAIN ANALYZE SELECT * FROM large_table WHERE value > 100;
-- Output shows: Parallel Seq Scan on large_table (workers=N)

The output indicates how many worker threads were used for each operation.

Query Types That Benefit Most

Parallel execution provides the greatest benefit for:

Analytical queries that process large amounts of data
Filter-heavy operations with complex conditions
Large joins between tables
Large aggregations over many rows
Sorting large result sets

Best Practices

For optimal performance with Stoolap’s parallel execution:

Ensure sufficient data: Parallel overhead only pays off for larger datasets
Use appropriate indexes: Even with parallelism, indexes are still important
Analyze tables: Run ANALYZE to give the optimizer accurate cardinality estimates
Check EXPLAIN output: Verify parallel execution is being used as expected

Implementation Details

Rayon Integration

Stoolap uses Rayon for parallel execution:

Work-stealing scheduler for optimal load balancing
Automatic thread pool management
Zero-cost abstraction over parallelism

Memory Management

Parallel execution uses efficient memory patterns:

DashMap for concurrent hash tables
Chunk-based processing to limit memory usage
Efficient result merging to avoid copies

Current Capabilities

The parallel execution engine supports:

Parallel filter evaluation
Parallel hash join (build and probe)
Parallel sorting
Parallel distinct/deduplication
Parallel aggregation