Performance Optimization in Stoolap

This document provides guidelines and best practices for optimizing performance in Stoolap, including database design, query optimization, and system configuration.

Database Design Optimization

Table Design

  • Choose appropriate data types - Use the smallest data type that can reliably store your data
  • Normalize when appropriate - Balance normalization for data integrity with denormalization for query performance
  • Use primary keys - Every table should have a primary key, preferably a simple integer (see the sketch after this list)
  • Consider column order - Place frequently used columns first
  • Limit column count - Tables with fewer columns generally perform better
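
As a rough illustration of these points, the sketch below creates a narrow table with a simple integer primary key and compact column types. It is a hedged example, not Stoolap's documented schema: the users table and the type names shown are assumptions for illustration, and db is assumed to be a *sql.DB opened through Stoolap's database/sql driver.

package example

import "database/sql"

// createUsersTable is an illustrative sketch, not a documented Stoolap API.
// It assumes the INTEGER/TEXT/TIMESTAMP type names are available; adjust to
// the type list your Stoolap version actually supports.
func createUsersTable(db *sql.DB) error {
    // Narrow table: integer primary key, compact types, frequently used columns first.
    _, err := db.Exec(`CREATE TABLE users (
        id         INTEGER PRIMARY KEY,
        email      TEXT,
        name       TEXT,
        created_at TIMESTAMP
    )`)
    return err
}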

Indexing Strategy

  • Index selection - Create indexes on columns used in WHERE, JOIN, and ORDER BY clauses
  • Multi-column indexes - Create multi-column indexes for frequently combined filters
  • Index order matters - For multi-column indexes, place high-selectivity columns first
  • Avoid over-indexing - Each index increases write overhead and storage requirements
  • Monitor index usage - Periodically review which indexes are being used

-- Create an index on a frequently filtered column
CREATE INDEX idx_user_email ON users (email);

-- Create a multi-column index for common query patterns
CREATE INDEX idx_product_category_price ON products (category_id, price);

Query Optimization

SELECT Statement Optimization

  • Select only needed columns - Avoid SELECT * when possible
  • Use WHERE clauses effectively - Apply filters early to reduce the result set
  • Leverage indexes - Ensure queries can use available indexes
  • Minimize function calls - Avoid functions on indexed columns in WHERE clauses
  • Use LIMIT for large result sets - Apply LIMIT to prevent excessive memory usage

-- Instead of this:
SELECT * FROM large_table WHERE status = 'active';

-- Do this:
SELECT id, name, created_at FROM large_table WHERE status = 'active' LIMIT 1000;

JOIN Optimization

  • Join order - Join smaller tables first when possible
  • Use appropriate join types - Choose INNER, LEFT, RIGHT joins as needed
  • Index join columns - Ensure columns used in join conditions are indexed
  • Consider denormalization - For critical queries, strategic denormalization may help

-- Ensure both user_id in orders and id in users are indexed
SELECT u.name, o.order_date FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.status = 'completed';

Aggregate Query Optimization

  • Filter before aggregating - Apply WHERE clauses before GROUP BY
  • Index GROUP BY columns - Ensure columns used in GROUP BY are indexed
  • Use HAVING efficiently - Apply HAVING only for conditions on aggregated results

-- Filter first, then aggregate
SELECT category_id, COUNT(*) FROM products
WHERE price > 100
GROUP BY category_id
HAVING COUNT(*) > 10;

Prepared Statements

Use prepared statements for repeated queries to leverage query caching:

// Example in application code (Go)
preparedStmt, err := db.Prepare("SELECT * FROM users WHERE id = ?")
// Use preparedStmt.Query() multiple times with different parameters
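
A fuller version of the snippet above, written against Go's standard database/sql package, might look like the hedged sketch below. The users columns queried and the assumption that db was opened through Stoolap's database/sql driver are illustrative.

package example

import (
    "database/sql"
    "fmt"
)

// lookupUsers prepares one statement and reuses it for many lookups, so the
// query is parsed and planned once rather than on every call.
func lookupUsers(db *sql.DB, ids []int64) error {
    stmt, err := db.Prepare("SELECT id, name FROM users WHERE id = ?")
    if err != nil {
        return err
    }
    defer stmt.Close() // release the prepared statement when done

    for _, id := range ids {
        var userID int64
        var name string
        if err := stmt.QueryRow(id).Scan(&userID, &name); err != nil {
            if err == sql.ErrNoRows {
                continue // no matching row; skip
            }
            return err
        }
        fmt.Println(userID, name)
    }
    return nil
}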

Vectorized Execution

Stoolap’s vectorized execution engine processes data in batches:

  • Batch size - Larger batches may improve throughput for bulk operations
  • Column order - Organize columns to maximize vectorization benefits
  • Data layout - Structure data to leverage SIMD operations

Transaction Management

  • Keep transactions short - Long-running transactions can impact concurrency
  • Choose appropriate isolation level - Use the minimum isolation level needed
  • Batch operations - Group related operations within a single transaction
  • Handle conflicts - Implement retry logic for optimistic concurrency conflicts (a sketch follows the example below)

-- Example of a focused transaction
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
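
Because conflicts under optimistic concurrency surface only at write or commit time, the retry advice above usually lives in application code. The hedged sketch below wraps the transfer from the SQL example in a short retry loop; how Stoolap's driver reports a write-write conflict is not specified here, so the isConflict check is a placeholder assumption.

package example

import (
    "database/sql"
    "time"
)

// transferWithRetry retries a short, focused transaction a few times when a
// conflict is reported. isConflict is a placeholder: check it against the real
// error values returned by the driver you are using.
func transferWithRetry(db *sql.DB, from, to int64, amount float64, isConflict func(error) bool) error {
    const maxAttempts = 3
    var err error
    for attempt := 1; attempt <= maxAttempts; attempt++ {
        err = transfer(db, from, to, amount)
        if err == nil || !isConflict(err) {
            return err // success, or an error that retrying will not fix
        }
        time.Sleep(time.Duration(attempt) * 10 * time.Millisecond) // simple backoff
    }
    return err
}

// transfer mirrors the SQL example above: two updates inside one transaction.
func transfer(db *sql.DB, from, to int64, amount float64) error {
    tx, err := db.Begin()
    if err != nil {
        return err
    }
    if _, err := tx.Exec("UPDATE accounts SET balance = balance - ? WHERE id = ?", amount, from); err != nil {
        tx.Rollback()
        return err
    }
    if _, err := tx.Exec("UPDATE accounts SET balance = balance + ? WHERE id = ?", amount, to); err != nil {
        tx.Rollback()
        return err
    }
    return tx.Commit()
}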

Memory Management

  • Buffer pool sizing - Configure buffer pool size based on available memory
  • Query memory limits - Set appropriate memory limits for complex queries
  • Monitor memory usage - Watch for excessive memory consumption

Bulk Operations

Use bulk operations for better performance:

-- Bulk insert example
INSERT INTO products (name, price, category_id) VALUES
  ('Product A', 10.99, 1),
  ('Product B', 15.99, 1),
  ('Product C', 20.99, 2);
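
When rows come from application code rather than a SQL script, a similar effect can be achieved by grouping the inserts into one transaction and reusing a prepared statement, as in the hedged sketch below. The products table matches the example above; the product struct and the assumption that db was opened through Stoolap's database/sql driver are illustrative.

package example

import "database/sql"

type product struct {
    name       string
    price      float64
    categoryID int64
}

// bulkInsertProducts groups many inserts into a single transaction and reuses
// one prepared statement, amortizing per-statement overhead across all rows.
func bulkInsertProducts(db *sql.DB, items []product) error {
    tx, err := db.Begin()
    if err != nil {
        return err
    }
    stmt, err := tx.Prepare("INSERT INTO products (name, price, category_id) VALUES (?, ?, ?)")
    if err != nil {
        tx.Rollback()
        return err
    }
    defer stmt.Close()

    for _, p := range items {
        if _, err := stmt.Exec(p.name, p.price, p.categoryID); err != nil {
            tx.Rollback()
            return err
        }
    }
    return tx.Commit()
}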

Performance Monitoring

Query Analysis

Use EXPLAIN to understand query execution:

EXPLAIN SELECT * FROM products WHERE category_id = 3;

Performance Profiling

Enable profiling for detailed performance analysis:

SET profiling = ON;
-- Run your query
SHOW PROFILE;

Implementation-Specific Optimizations

Stoolap includes several specialized optimizations:

SIMD Operations

Stoolap uses SIMD (Single Instruction, Multiple Data) instructions for:

  • Filtering - Fast predicate evaluation using SIMD
  • Aggregation - Parallel aggregation operations
  • Sorting - SIMD-accelerated sorting algorithms

Memory-Mapped I/O

For disk-based storage, Stoolap uses memory-mapped I/O to reduce system call overhead.

Custom Data Structures

Stoolap uses specialized data structures for better performance:

  • Segment-based maps - Reduced lock contention in concurrent operations
  • Int64-optimized maps - Specialized maps for integer keys
  • Buffer pools - Reusable memory buffers to reduce allocation overhead

Advanced Optimization Techniques

Expression Pushdown

Stoolap pushes down expressions to minimize data processing:

-- Filter and projection will be pushed down to the storage layer
SELECT name, price FROM products WHERE price > 100;

Join Algorithms

Stoolap selects among several join algorithms:

  • Hash Join - For equality joins with large tables
  • Merge Join - For pre-sorted data
  • Nested Loop Join - For small tables or when using indexes

Parallel Execution

Stoolap can execute some operations in parallel:

  • Parallel scans - Multiple segments scanned concurrently
  • Parallel aggregations - Divided work for faster aggregations
  • Concurrent index operations - Background index building

Best Practices Summary

  1. Design schema carefully - Choose appropriate data types and normalization level
  2. Create targeted indexes - Index columns used in filters, joins, and sorts
  3. Write optimized queries - Select only needed columns and filter early
  4. Use prepared statements - Leverage the query cache for repeated queries
  5. Manage transactions efficiently - Keep transactions short and focused
  6. Monitor and analyze - Use EXPLAIN and profiling to identify bottlenecks
  7. Update statistics - Ensure the query optimizer has accurate information
  8. Consider bulk operations - Use bulk inserts and updates for better throughput
  9. Leverage vectorized execution - Structure operations to benefit from batch processing
  10. Configure for your workload - Adjust memory settings based on your specific needs