Parallel Execution
Parallel Execution in Stoolap
Overview
Stoolap’s query execution engine is designed to accelerate SQL query processing by operating on data in parallel batches, allowing for better CPU utilization across multiple cores. This document explains the architecture and components of the parallel execution system.
Key Concepts
Batch Processing
The parallel execution model processes data in batches to improve throughput. Instead of processing one row at a time, the engine processes multiple rows simultaneously using Rayon’s work-stealing scheduler.
Key characteristics:
- Automatic parallelization based on data size
- Work-stealing for optimal load balancing
- Configurable thresholds for parallel execution
Parallelization Thresholds
Stoolap automatically parallelizes operations based on data size:
| Operation | Threshold | Description |
|---|---|---|
| Filter (WHERE) | 10,000 rows | Parallel predicate evaluation |
| Hash Join | 5,000 rows | Parallel hash build and probe |
| ORDER BY | 50,000 rows | Parallel sorting |
| DISTINCT | 10,000 rows | Two-phase parallel deduplication |
Architecture Components
1. Parallel Filter
For large tables, WHERE clause evaluation is parallelized:
- Data is divided into chunks
- Each chunk is processed by a separate thread
- Results are merged using efficient concurrent data structures
2. Parallel Hash Join
Hash joins are parallelized in two phases:
Build Phase:
- The build side is partitioned across threads
- Hash table is constructed using DashMap (concurrent hash map)
Probe Phase:
- The probe side is processed in parallel
- Each thread looks up matches in the shared hash table
3. Parallel Sort
Large ORDER BY operations use parallel sorting:
- Uses Rayon’s
par_sort_by()for efficient multi-threaded sorting - Automatically falls back to sequential sort for small datasets
4. Parallel Distinct
DISTINCT operations use two-phase deduplication:
- First phase: parallel identification of unique values per chunk
- Second phase: merge of partial results
Query Flow
-
Query Planning: The planner estimates cardinality and decides on parallel execution.
-
Data Fetching: Data is fetched from storage with index acceleration where possible.
-
Parallel Processing: Operations exceeding thresholds are parallelized.
-
Result Merging: Partial results are combined efficiently.
-
Result Formation: Final results are returned to the caller.
Performance Benefits
Parallel execution provides significant benefits for analytical queries:
- Filter Operations: 2-5x speedup for complex predicates
- Hash Joins: 2-4x speedup for large joins
- Sorting: 3-6x speedup for large ORDER BY
- Aggregations: Linear speedup with core count
EXPLAIN ANALYZE Output
Use EXPLAIN ANALYZE to see parallel execution in action:
EXPLAIN ANALYZE SELECT * FROM large_table WHERE value > 100;
-- Output shows: Parallel Seq Scan on large_table (workers=N)
The output indicates how many worker threads were used for each operation.
Query Types That Benefit Most
Parallel execution provides the greatest benefit for:
- Analytical queries that process large amounts of data
- Filter-heavy operations with complex conditions
- Large joins between tables
- Large aggregations over many rows
- Sorting large result sets
Best Practices
For optimal performance with Stoolap’s parallel execution:
- Ensure sufficient data: Parallel overhead only pays off for larger datasets
- Use appropriate indexes: Even with parallelism, indexes are still important
- Analyze tables: Run ANALYZE to give the optimizer accurate cardinality estimates
- Check EXPLAIN output: Verify parallel execution is being used as expected
Implementation Details
Rayon Integration
Stoolap uses Rayon for parallel execution:
- Work-stealing scheduler for optimal load balancing
- Automatic thread pool management
- Zero-cost abstraction over parallelism
Memory Management
Parallel execution uses efficient memory patterns:
- DashMap for concurrent hash tables
- Chunk-based processing to limit memory usage
- Efficient result merging to avoid copies
Current Capabilities
The parallel execution engine supports:
- Parallel filter evaluation
- Parallel hash join (build and probe)
- Parallel sorting
- Parallel distinct/deduplication
- Parallel aggregation