Expression Pushdown
Expression Pushdown
This document explains Stoolap’s expression pushdown optimization, which moves filtering operations closer to the data to improve query performance.
What is Expression Pushdown?
Expression pushdown is a query optimization technique that “pushes” filter expressions (typically from WHERE clauses) down to the lowest possible level in the query execution stack. This minimizes the amount of data that needs to be processed at higher levels of the execution engine.
Benefits of Expression Pushdown
Performance Improvements
- Reduced Data Processing - Filter data early to avoid unnecessary processing
- Reduced Memory Usage - Process only relevant data
- Improved Cache Efficiency - Better CPU cache utilization with smaller data sets
- Optimized I/O - Less data loaded from storage
- Improved Parallelism - Filter operations can be parallelized at lower levels
Measurable Impacts
Depending on the query, expression pushdown can provide:
- 10-100x reduction in processed data volume
- 2-10x improvement in query execution time
- Significant memory usage reduction
How Expression Pushdown Works in Stoolap
Pushdown Levels
Stoolap implements expression pushdown at multiple levels:
- Storage Level Pushdown - Expressions pushed directly to column storage
- Index Level Pushdown - Expressions leveraging indexes
- Scan Level Pushdown - Expressions applied during table scanning
- Join Level Pushdown - Expressions pushed before or into joins
- Subquery Pushdown - Expressions pushed into subqueries
The Pushdown Process
When processing a query with filtering conditions:
- The SQL parser creates an abstract syntax tree (AST)
- The query analyzer examines filter expressions
- Pushdown eligibility is determined for each expression
- Expressions are transformed into a format suitable for lower levels
- Execution plans are modified to incorporate pushed-down expressions
- The storage engine applies filters directly when possible
Expression Types
Not all expressions can be pushed down. Stoolap categorizes expressions by pushdown eligibility:
Fully Pushable Expressions
These expressions can be pushed all the way to the storage layer:
- Simple Comparisons - Equality (=), inequality (!=, <>), greater/less than (>, <, >=, <=)
- Range Conditions - BETWEEN, IN
- Null Checks - IS NULL, IS NOT NULL
- String Patterns - LIKE, SIMILAR TO (with limitations)
- Logical Operations - AND, OR, NOT
-- All filters can be pushed down to storage
SELECT * FROM orders WHERE
status = 'shipped' AND
order_date BETWEEN '2022-01-01' AND '2022-12-31' AND
customer_id IN (1, 2, 3);
Partially Pushable Expressions
These expressions can be partially pushed down:
- Simple Functions - ABS(), LOWER(), UPPER()
- Date/Time Functions - DATE_TRUNC(), EXTRACT()
- Type Casts - CAST(), ::
- Simple Arithmetic - +, -, *, /
-- The date_trunc can be pushed down, making this more efficient
SELECT * FROM orders WHERE
DATE_TRUNC('month', order_date) = DATE_TRUNC('month', CURRENT_DATE);
Non-Pushable Expressions
These typically cannot be pushed down:
- Complex Functions - User-defined functions, complex built-in functions
- Aggregates - SUM(), COUNT(), AVG() (though parts of the expression may be pushed)
- Window Functions - ROW_NUMBER(), RANK()
- Subquery References - Correlated subqueries
-- The aggregate function prevents full pushdown
SELECT * FROM orders WHERE
total > (SELECT AVG(total) FROM orders);
Implementation Details
Expression Translation
Expressions are translated from SQL syntax to storage-level predicates:
- Parser - Parses SQL into an abstract syntax tree
- Analyzer - Identifies pushable expressions
- Translator - Converts SQL expressions to storage predicates
- Optimizer - Determines optimal pushdown strategy
- Executor - Applies the pushdown during execution
Specialized Expression Types
Stoolap implements specialized expression types for efficient pushdown:
- SimpleExpression - Basic comparison operations
- BetweenExpression - Range checks
- InListExpression - Set membership tests
- LogicalExpression - Combining expressions with AND/OR
- NullCheckExpression - NULL checks
- RangeExpression - Optimized range queries
- NotExpression - Logical negation
These expression types are implemented in /internal/storage/expression/
.
Storage-Level Implementation
At the storage level, Stoolap implements optimized filtering:
- Columnar Filtering - Filter operations on column data
- SIMD-Accelerated Predicates - Parallel evaluation using CPU SIMD instructions
- Bitmap Results - Bitmap representation of matching positions
- Vectorized Evaluation - Batch processing of predicates
Pushdown Strategies
Filter Pushdown
Basic filter pushdown moves WHERE conditions down:
-- Before pushdown (conceptual)
Table Scan (orders) → Filter (status = 'shipped') → Project
-- After pushdown (conceptual)
Table Scan with Filter (orders, status = 'shipped') → Project
Join Pushdown
Filters are pushed before or into joins:
-- Before pushdown (conceptual)
Table Scan (orders) → Join with customers → Filter (status = 'shipped')
-- After pushdown (conceptual)
Table Scan with Filter (orders, status = 'shipped') → Join with customers
Index Utilization
Pushed-down expressions leverage indexes:
-- When an index exists on status, this becomes:
Index Scan (orders_status_idx, 'shipped') → Project
Predicate Simplification
Expressions are simplified when possible:
-- Before simplification
WHERE age >= 18 AND age <= 65 AND age != 30
-- After simplification
WHERE age BETWEEN 18 AND 65 AND age != 30
Performance Considerations
When Pushdown Helps Most
Expression pushdown provides the greatest benefits when:
- High Selectivity - Filters eliminate a large percentage of rows
- Early Filtering - Filters can be applied before expensive operations
- Index Availability - Filters can use available indexes
- Complex Queries - Queries with joins, subqueries, or aggregations
Potential Limitations
Some scenarios may limit pushdown benefits:
- Low Selectivity Filters - If most rows match, pushdown may not help much
- Complex Expressions - Not all expressions can be pushed down
- Function-Based Filters - Functions on columns may prevent index usage
Query Examples
Simple Pushdown
-- Highly efficient: Pushes filter to storage and uses index if available
SELECT * FROM customers WHERE country = 'US';
Multiple Filter Pushdown
-- All conditions are pushed down and combined at the storage level
SELECT * FROM orders
WHERE status = 'shipped'
AND order_date > '2022-01-01'
AND total > 100;
Join Pushdown
-- Filters are pushed before the join
SELECT c.name, o.order_date
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE c.country = 'US' AND o.status = 'shipped';
Function Pushdown
-- LOWER function can be pushed down
SELECT * FROM products
WHERE LOWER(name) LIKE '%organic%';
Implementation Details
Stoolap’s expression pushdown is implemented in several components:
- sql/executor/executor.go - High-level pushdown decisions
- sql/executor/query.go - Query planning with pushdown
- sql/executor/filtered_result.go - Filter result implementation
- storage/expression/ - Specialized expression types
- storage/mvcc/scanner.go - Storage-level filtering
Best Practices
To maximize the benefits of expression pushdown:
- Use direct column references - Avoid functions on indexed columns in WHERE clauses
- Create appropriate indexes - Indexes enable better pushdown optimizations
- Use simple predicates - Simple expressions are more likely to be pushed down
- Check query plans - Use EXPLAIN to verify pushdown is occurring
- Combine multiple conditions - AND conditions can be pushed down effectively
Advanced Techniques
Custom Expressions
Stoolap allows for specialized expression types:
-- Range expressions are highly optimized
SELECT * FROM time_series
WHERE timestamp BETWEEN '2022-01-01' AND '2022-01-31';
Combined Index and Expression Pushdown
When expressions match available indexes:
-- With a composite index on (status, order_date), this is very efficient
SELECT * FROM orders
WHERE status = 'shipped' AND order_date > '2022-01-01';
Function Index Pushdown
When functions are used in filtering:
-- If an index exists on LOWER(email), this pushdown is efficient
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';