Stoolap Architecture
High-level overview of Stoolap's architecture and major components
This document provides a high-level overview of Stoolap’s architecture, including its major components and how they interact.
System Overview
Stoolap is a high-performance Hybrid Transactional/Analytical Processing (HTAP) database engine that combines efficient transactional operations with powerful analytical capabilities. Its architecture prioritizes:
- Memory-first design with optional disk persistence
- Row-based storage with columnar indexing for hybrid workloads
- Multi-version concurrency control (MVCC) for transaction isolation
- Vectorized execution for analytical performance
- Zero external dependencies
HTAP Architecture
Stoolap combines OLTP and OLAP capabilities in a single system through its HTAP architecture:
OLTP (Transactional) Features
- Row-based version store optimized for point lookups and updates
- MVCC for isolation and concurrency
- Efficient transaction processing with optimistic concurrency control
- Low-latency write operations
OLAP (Analytical) Features
- Columnar indexing for efficient analytical queries
- Vectorized execution engine for batch processing
- Expression pushdown for optimized filtering
- Specialized indexing strategies for different query patterns
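To make the hybrid idea concrete, here is a minimal sketch of a row store paired with a per-column index. The `Row`, `Table`, and `CountWhereCountry` names are hypothetical and chosen only for illustration; they are not taken from Stoolap's code, and the real columnar index is certainly more elaborate.

```go
package main

import "fmt"

// Row is a hypothetical two-column record used only for this sketch.
type Row struct {
	ID      int
	Country string
}

// Table pairs a row-oriented slice (the transactional path) with a
// per-column index from value to row positions (the analytical path).
type Table struct {
	rows      []Row
	byCountry map[string][]int
}

func NewTable() *Table {
	return &Table{byCountry: make(map[string][]int)}
}

// Insert appends the row and records its position in the column index.
func (t *Table) Insert(r Row) int {
	pos := len(t.rows)
	t.rows = append(t.rows, r)
	t.byCountry[r.Country] = append(t.byCountry[r.Country], pos)
	return pos
}

// Get is a point lookup that returns one full row by position.
func (t *Table) Get(pos int) (Row, bool) {
	if pos < 0 || pos >= len(t.rows) {
		return Row{}, false
	}
	return t.rows[pos], true
}

// CountWhereCountry answers an equality filter from the index alone,
// without touching the row store.
func (t *Table) CountWhereCountry(country string) int {
	return len(t.byCountry[country])
}

func main() {
	t := NewTable()
	t.Insert(Row{ID: 1, Country: "DE"})
	t.Insert(Row{ID: 2, Country: "FR"})
	t.Insert(Row{ID: 3, Country: "DE"})

	row, _ := t.Get(1)
	fmt.Println(row.ID, t.CountWhereCountry("DE"))
}
```

Point lookups read a whole row in one step, while the filter count is served from the column index, which is the division of labor the two feature lists describe.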
Core Components
Stoolap’s architecture consists of the following major components:
Client Interface
- SQL Driver - Standard database/sql driver implementation
- Command-Line Interface - Interactive CLI for database operations
- API Layer - Programmatic access to database functionality
Query Processing Pipeline
- Parser - Converts SQL text into an abstract syntax tree (AST)
  - Lexical analyzer (lexer.go)
  - Syntax parser (parser.go)
  - AST builder (ast.go)
- Planner/Optimizer - Converts AST into an optimized execution plan
  - Plan generation
  - Statistics-based optimization
  - Expression optimization
  - Join order optimization
- Executor - Executes the plan and produces results
  - Standard execution engine (executor.go)
  - Vectorized execution engine (vectorized/)
  - Query cache (query_cache.go)
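As an illustration of the first pipeline stage, the sketch below shows what a lexical analyzer does: it turns SQL text into a flat token stream for the syntax parser to consume. The `Token` type, the `Lex` function, and the three-keyword table are assumptions made for this example and do not mirror the contents of lexer.go.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Token is a hypothetical token shape; Kind is one of
// "KEYWORD", "IDENT", "NUMBER", or "SYMBOL".
type Token struct {
	Kind string
	Text string
}

// keywords is deliberately tiny; a real SQL lexer knows many more.
var keywords = map[string]bool{"SELECT": true, "FROM": true, "WHERE": true}

// Lex splits the input into tokens, skipping whitespace.
func Lex(sql string) []Token {
	var tokens []Token
	runes := []rune(sql)
	for i := 0; i < len(runes); {
		r := runes[i]
		switch {
		case unicode.IsSpace(r):
			i++
		case unicode.IsLetter(r) || r == '_':
			start := i
			for i < len(runes) && (unicode.IsLetter(runes[i]) || unicode.IsDigit(runes[i]) || runes[i] == '_') {
				i++
			}
			word := string(runes[start:i])
			if up := strings.ToUpper(word); keywords[up] {
				tokens = append(tokens, Token{Kind: "KEYWORD", Text: up})
			} else {
				tokens = append(tokens, Token{Kind: "IDENT", Text: word})
			}
		case unicode.IsDigit(r):
			start := i
			for i < len(runes) && unicode.IsDigit(runes[i]) {
				i++
			}
			tokens = append(tokens, Token{Kind: "NUMBER", Text: string(runes[start:i])})
		default:
			// Any other character becomes a one-rune symbol token.
			tokens = append(tokens, Token{Kind: "SYMBOL", Text: string(r)})
			i++
		}
	}
	return tokens
}

func main() {
	for _, tok := range Lex("SELECT id FROM users WHERE id = 42") {
		fmt.Printf("%-8s %s\n", tok.Kind, tok.Text)
	}
}
```

String literals, comments, and multi-character operators are left out to keep the sketch short.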
Storage Engine
- MVCC Engine - Multi-version concurrency control for transaction isolation
  - Transaction management (transaction.go)
  - Version store (version_store.go)
  - Visibility rules (registry.go)
- Table Management - Table creation and schema handling
  - Schema validation
  - Table metadata management
  - Column type management
- Row-Based Storage - Row-oriented data organization for transactional operations
  - Row-based version store
  - Type-specific storage optimizations
  - Segment-based organization
- Indexing System - Multiple index types for different access patterns
  - B-tree indexes (btree/)
  - Bitmap indexes (bitmap/)
  - Columnar indexes (mvcc/columnar_index.go) - For analytical acceleration
  - Multi-column indexes (mvcc/columnar_index_multi.go)
- Persistence Layer - Optional disk storage
  - Binary serialization (binser/)
  - Snapshot management
  - Write-ahead logging (wal_manager.go)
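The interplay between a version store and visibility rules can be sketched as follows. This is a simplified model under stated assumptions: commit IDs increase monotonically, versions are appended in commit order, and a snapshot sees every version committed at or before its ID. The `Version` and `VersionStore` types are hypothetical and are not the structures in version_store.go.

```go
package main

import "fmt"

// Version is one committed state of a row. A deletion is stored
// as a version with Deleted set (a tombstone).
type Version struct {
	Value    string
	CommitID uint64
	Deleted  bool
}

// VersionStore keeps a chain of versions per row, oldest first.
type VersionStore struct {
	versions map[int][]Version
}

func NewVersionStore() *VersionStore {
	return &VersionStore{versions: make(map[int][]Version)}
}

// Put appends a new version instead of overwriting the old one.
func (s *VersionStore) Put(rowID int, value string, commitID uint64) {
	s.versions[rowID] = append(s.versions[rowID], Version{Value: value, CommitID: commitID})
}

// Delete appends a tombstone version.
func (s *VersionStore) Delete(rowID int, commitID uint64) {
	s.versions[rowID] = append(s.versions[rowID], Version{CommitID: commitID, Deleted: true})
}

// Read returns the newest version visible to the given snapshot.
func (s *VersionStore) Read(rowID int, snapshotID uint64) (string, bool) {
	chain := s.versions[rowID]
	for i := len(chain) - 1; i >= 0; i-- {
		v := chain[i]
		if v.CommitID <= snapshotID {
			if v.Deleted {
				return "", false
			}
			return v.Value, true
		}
	}
	return "", false
}

func main() {
	s := NewVersionStore()
	s.Put(1, "draft", 10)
	s.Put(1, "final", 20)
	s.Delete(1, 30)
	for _, snap := range []uint64{5, 15, 25, 35} {
		v, ok := s.Read(1, snap)
		fmt.Println(snap, v, ok)
	}
}
```

Because writers only append, a reader holding an older snapshot keeps seeing a consistent state without blocking concurrent writes.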
Function System
- Function Registry - Central registry for all SQL functions
  - Scalar functions (scalar/)
  - Aggregate functions (aggregate/)
  - Window functions (window/)
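A central function registry can be pictured as a name-to-implementation map that the executor consults at call time. The sketch below covers scalar functions only; the `ScalarFunc` signature, the `Registry` type, and the `Upper` function are assumptions for illustration, not Stoolap's actual interfaces.

```go
package main

import (
	"fmt"
	"strings"
)

// ScalarFunc is a hypothetical signature for a scalar SQL function.
type ScalarFunc func(args ...any) (any, error)

// Registry maps upper-cased function names to implementations, so
// lookups are case-insensitive like SQL function names.
type Registry struct {
	scalars map[string]ScalarFunc
}

func NewRegistry() *Registry {
	return &Registry{scalars: make(map[string]ScalarFunc)}
}

// Register adds or replaces a function under the given name.
func (r *Registry) Register(name string, fn ScalarFunc) {
	r.scalars[strings.ToUpper(name)] = fn
}

// Call looks the function up by name and invokes it.
func (r *Registry) Call(name string, args ...any) (any, error) {
	fn, ok := r.scalars[strings.ToUpper(name)]
	if !ok {
		return nil, fmt.Errorf("unknown function %q", name)
	}
	return fn(args...)
}

// Upper is an example scalar function that validates its arguments.
func Upper(args ...any) (any, error) {
	if len(args) != 1 {
		return nil, fmt.Errorf("UPPER expects 1 argument, got %d", len(args))
	}
	s, ok := args[0].(string)
	if !ok {
		return nil, fmt.Errorf("UPPER expects a string, got %T", args[0])
	}
	return strings.ToUpper(s), nil
}

func main() {
	reg := NewRegistry()
	reg.Register("upper", Upper)
	v, err := reg.Call("Upper", "stoolap")
	fmt.Println(v, err)
}
```

Aggregate and window functions would need stateful interfaces (accumulate, then finalize), which this sketch does not attempt.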
Memory Management
- Buffer Pool - Reusable memory buffers to reduce allocation overhead
- Value Pool - Specialized object pooling for common data types
- Segment Maps - Efficient concurrent data structures
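The buffer pool idea maps naturally onto `sync.Pool` from the Go standard library. The sketch below shows the usual get, reset, use, put cycle; the `EncodeRow` helper and its `|` separator are hypothetical, and the source does not say whether Stoolap's pools are built on `sync.Pool`.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers so that hot paths do not
// allocate a fresh buffer on every call.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// EncodeRow joins fields with '|' using a pooled buffer.
func EncodeRow(fields []string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold old contents
	defer bufPool.Put(buf)

	for i, f := range fields {
		if i > 0 {
			buf.WriteByte('|')
		}
		buf.WriteString(f)
	}
	// String copies the bytes, so returning the buffer to the pool
	// afterwards cannot corrupt the result.
	return buf.String()
}

func main() {
	fmt.Println(EncodeRow([]string{"1", "alice"}))
	fmt.Println(EncodeRow([]string{"2", "bob"}))
}
```

The important discipline is to reset a buffer on checkout and never retain a reference to it after `Put`.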
Request Flow
When a query is executed, it flows through the system as follows:
- Query Submission
  - SQL text is submitted via driver, CLI, or API
- Parsing and Validation
  - SQL is parsed into an AST
  - Syntax and semantic validation are performed
  - Query is prepared for execution
- Planning and Optimization
  - Execution plan is generated
  - Statistics are used to optimize the plan
  - Indexes are selected based on query patterns
  - Expression pushdown is applied where possible
- Execution
  - For read queries:
    - Appropriate isolation level is applied
    - Storage engine provides data with visibility rules
    - Filters and projections are applied
    - Results are processed (joins, aggregations, sorting)
    - Final result set is returned
  - For write queries:
    - Transaction is started if not already active
    - Write operations are applied with MVCC rules
    - Indexes are updated
    - Changes are committed or rolled back
- Result Handling
  - Results are formatted and returned to the client
  - Memory is released
  - Transaction state is updated
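The write path in the flow above (start a transaction, apply writes, update indexes, then commit or roll back) can be sketched with a buffered transaction. Everything here is a simplification under assumptions: the `DB` and `Txn` types are hypothetical, writes are buffered until commit, there is a single value index, and no concurrency or MVCC versioning is modeled.

```go
package main

import "fmt"

// DB holds rows plus a secondary index from value to row IDs.
type DB struct {
	rows  map[int]string
	index map[string]map[int]bool
}

func NewDB() *DB {
	return &DB{rows: make(map[int]string), index: make(map[string]map[int]bool)}
}

// Txn buffers writes until Commit, so Rollback only has to drop them.
type Txn struct {
	db      *DB
	pending map[int]string
	done    bool
}

func (db *DB) Begin() *Txn {
	return &Txn{db: db, pending: make(map[int]string)}
}

// Put records a write; it is ignored once the transaction has ended.
func (tx *Txn) Put(id int, value string) {
	if !tx.done {
		tx.pending[id] = value
	}
}

// Commit applies buffered writes and keeps the index in step.
func (tx *Txn) Commit() {
	if tx.done {
		return
	}
	tx.done = true
	for id, value := range tx.pending {
		if old, ok := tx.db.rows[id]; ok {
			delete(tx.db.index[old], id) // drop the stale index entry
		}
		tx.db.rows[id] = value
		if tx.db.index[value] == nil {
			tx.db.index[value] = make(map[int]bool)
		}
		tx.db.index[value][id] = true
	}
}

// Rollback discards buffered writes without touching the database.
func (tx *Txn) Rollback() {
	tx.done = true
	tx.pending = make(map[int]string)
}

// Get returns the committed value of a row.
func (db *DB) Get(id int) (string, bool) {
	v, ok := db.rows[id]
	return v, ok
}

// CountByValue answers an equality count from the index.
func (db *DB) CountByValue(value string) int {
	return len(db.index[value])
}

func main() {
	db := NewDB()
	tx := db.Begin()
	tx.Put(1, "a")
	tx.Put(2, "a")
	tx.Commit()
	fmt.Println(db.CountByValue("a"))
}
```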
Physical Architecture
In-Memory Mode
In memory-only mode, Stoolap operates entirely in RAM:
- All data structures reside in memory
- No disk I/O for data access
- Highest performance but no durability
Persistent Mode
In persistent mode, Stoolap uses disk storage with memory caching:
- Data is stored on disk in a specialized binary format
- Write-ahead logging ensures durability
- Memory serves as a cache for active data
- Background processes manage snapshots and cleanup
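Write-ahead logging rests on one ordering rule: a change is appended to the log before it is applied in memory, so the in-memory state can be rebuilt by replaying the log. The sketch below shows that rule with a text log. The `Store` type, the `SET key value` record format, and the in-memory log returned by `NewMemoryLog` are all assumptions for illustration; Stoolap's WAL uses a binary format, and a real implementation would also sync the file to disk.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// Store applies writes to memory only after logging them.
type Store struct {
	wal  io.Writer
	data map[string]string
}

func NewStore(wal io.Writer) *Store {
	return &Store{wal: wal, data: make(map[string]string)}
}

// NewMemoryLog returns an in-memory stand-in for a log file.
func NewMemoryLog() *bytes.Buffer {
	return new(bytes.Buffer)
}

// Set appends a log record first; if logging fails, memory is untouched.
// Keys must not contain spaces and values must not contain newlines.
func (s *Store) Set(key, value string) error {
	if _, err := fmt.Fprintf(s.wal, "SET %s %s\n", key, value); err != nil {
		return err
	}
	s.data[key] = value
	return nil
}

// Recover rebuilds the in-memory state by replaying the log in order.
func Recover(log io.Reader) (map[string]string, error) {
	data := make(map[string]string)
	sc := bufio.NewScanner(log)
	for sc.Scan() {
		parts := strings.SplitN(sc.Text(), " ", 3)
		if len(parts) == 3 && parts[0] == "SET" {
			data[parts[1]] = parts[2]
		}
	}
	return data, sc.Err()
}

func main() {
	wal := NewMemoryLog()
	s := NewStore(wal)
	_ = s.Set("a", "1")
	_ = s.Set("a", "2")
	recovered, err := Recover(wal)
	fmt.Println(recovered["a"], err)
}
```

Snapshots complement the log: once a snapshot captures the full state, log records older than it can be discarded, which keeps replay time bounded.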
Concurrency Model
Stoolap uses a combination of concurrency techniques:
- MVCC - Multiple versions of data for transaction isolation
- Optimistic Concurrency Control - Transactions validate at commit time
- Fine-grained Locking - Minimal contention through targeted locks
- Segmented Data Structures - Reduced lock contention
- Lock-free Algorithms - Where appropriate for performance
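Commit-time validation, the core of optimistic concurrency control, can be sketched as follows: a transaction remembers the version of each row it read, and the commit succeeds only if none of those versions has changed. The `Store`, `Txn`, and `ErrConflict` names are hypothetical, a single mutex stands in for finer-grained locking, and writes to rows the transaction never read are not validated in this simplified model.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var ErrConflict = errors.New("conflict: a row read by this transaction has changed")

type record struct {
	value   int
	version uint64
}

type Store struct {
	mu   sync.Mutex
	rows map[string]record
}

func NewStore() *Store {
	return &Store{rows: make(map[string]record)}
}

// Txn tracks the versions it observed and the writes it intends.
type Txn struct {
	store  *Store
	reads  map[string]uint64
	writes map[string]int
}

func (s *Store) Begin() *Txn {
	return &Txn{store: s, reads: make(map[string]uint64), writes: make(map[string]int)}
}

// Get returns the transaction's own pending write if there is one;
// otherwise it reads the store and remembers the first version seen.
func (tx *Txn) Get(key string) int {
	if v, ok := tx.writes[key]; ok {
		return v
	}
	tx.store.mu.Lock()
	defer tx.store.mu.Unlock()
	rec := tx.store.rows[key]
	if _, seen := tx.reads[key]; !seen {
		tx.reads[key] = rec.version
	}
	return rec.value
}

// Set buffers a write; nothing is shared until Commit.
func (tx *Txn) Set(key string, value int) {
	tx.writes[key] = value
}

// Commit validates every read, then applies writes atomically.
func (tx *Txn) Commit() error {
	tx.store.mu.Lock()
	defer tx.store.mu.Unlock()
	for key, seen := range tx.reads {
		if tx.store.rows[key].version != seen {
			return ErrConflict
		}
	}
	for key, value := range tx.writes {
		rec := tx.store.rows[key]
		tx.store.rows[key] = record{value: value, version: rec.version + 1}
	}
	return nil
}

func main() {
	s := NewStore()
	a, b := s.Begin(), s.Begin()
	a.Set("x", a.Get("x")+1)
	b.Set("x", b.Get("x")+1)
	fmt.Println(a.Commit(), b.Commit())
}
```

In the usage above both transactions read the same version of `x`, so whichever commits second fails validation and must retry, which prevents a lost update without holding locks during the transaction body.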
Memory Efficiency
Several techniques are used to minimize memory usage:
- Targeted Compression - Type-specific compression algorithms
- Memory Pooling - Reuse of memory allocations
- Reference Counting - Efficient resource management
- SIMD Operations - Processing multiple values with single instructions
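As one example of what a type-specific encoding can look like, the sketch below applies run-length encoding to a column of repeated strings. The source does not say which compression algorithms Stoolap uses, so treat this purely as an illustration of the general technique; the `Run`, `Encode`, and `Decode` names are hypothetical.

```go
package main

import "fmt"

// Run is a value together with the number of times it repeats.
type Run struct {
	Value string
	Count int
}

// Encode collapses consecutive equal values into runs.
func Encode(column []string) []Run {
	var runs []Run
	for _, v := range column {
		if n := len(runs); n > 0 && runs[n-1].Value == v {
			runs[n-1].Count++
			continue
		}
		runs = append(runs, Run{Value: v, Count: 1})
	}
	return runs
}

// Decode expands runs back into the original column.
func Decode(runs []Run) []string {
	var out []string
	for _, r := range runs {
		for i := 0; i < r.Count; i++ {
			out = append(out, r.Value)
		}
	}
	return out
}

func main() {
	col := []string{"open", "open", "open", "closed", "closed", "open"}
	runs := Encode(col)
	fmt.Println(len(col), "values ->", len(runs), "runs")
	fmt.Println(Decode(runs))
}
```

Run-length encoding pays off on sorted or low-cardinality columns and can make things worse on columns where neighboring values rarely repeat, which is why the choice of encoding is made per type or per column.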
Implementation Details
The core implementation is organized as follows:
- /cmd/stoolap/ - Command-line interface
- /pkg/ - Public API and driver implementation
- /internal/ - Core implementation details
- /internal/parser/ - SQL parsing
- /internal/sql/ - SQL execution
- /internal/storage/ - Storage engine
- /internal/functions/ - SQL function implementations
- /internal/common/ - Common utilities
- /internal/fastmap/ - High-performance data structures
Architectural Principles
Stoolap’s architecture is guided by the following principles:
- Performance First - Optimize for speed and memory efficiency
- Zero Dependencies - Rely only on the Go standard library
- Modularity - Clean component interfaces for extensibility
- Simplicity - Favor simple solutions over complex ones
- Data Integrity - Ensure consistent and correct results