Stoolap Architecture

This document provides a high-level overview of Stoolap’s architecture, including its major components and how they interact.

System Overview

Stoolap is a Hybrid Transactional/Analytical Processing (HTAP) database engine: it serves both transactional operations and analytical queries from the same engine. Its architecture prioritizes:

Memory-first design with optional disk persistence
Row-based storage with columnar indexing for hybrid workloads
Multi-version concurrency control (MVCC) for transaction isolation
Vectorized execution for analytical performance
Zero external dependencies

HTAP Architecture

Stoolap combines OLTP and OLAP capabilities in a single system through its HTAP architecture:

OLTP (Transactional) Features

Row-based version store optimized for point lookups and updates
MVCC for isolation and concurrency
Efficient transaction processing with optimistic concurrency control
Low-latency write operations
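
As an illustration of optimistic concurrency control, the following is a minimal single-threaded sketch, not Stoolap's actual implementation. The `Store`, `Txn`, and `ErrConflict` names are hypothetical. Each transaction remembers the version of every row it reads and re-checks those versions at commit time; a real engine would also need to make the validate-and-apply step atomic.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConflict is returned when a row read by the transaction was changed
// by another commit before this transaction committed.
var ErrConflict = errors.New("conflict: row changed since it was read")

type row struct {
	value   int
	version uint64
}

// Store is a toy row store (hypothetical, not Stoolap's API).
type Store struct {
	rows map[string]row
}

// Txn buffers writes and records the version of each row it has read.
type Txn struct {
	store  *Store
	reads  map[string]uint64
	writes map[string]int
}

func NewStore() *Store { return &Store{rows: make(map[string]row)} }

func (s *Store) Begin() *Txn {
	return &Txn{store: s, reads: make(map[string]uint64), writes: make(map[string]int)}
}

// Get returns the transaction's own pending write if there is one,
// otherwise the stored value, remembering the version first observed.
func (t *Txn) Get(key string) int {
	if v, ok := t.writes[key]; ok {
		return v
	}
	r := t.store.rows[key]
	if _, seen := t.reads[key]; !seen {
		t.reads[key] = r.version
	}
	return r.value
}

// Set buffers a write; nothing touches the store until Commit.
func (t *Txn) Set(key string, value int) { t.writes[key] = value }

// Commit validates every recorded read, then applies the buffered writes.
// Writes to rows that were never read are not validated in this sketch.
func (t *Txn) Commit() error {
	for key, seen := range t.reads {
		if t.store.rows[key].version != seen {
			return ErrConflict
		}
	}
	for key, value := range t.writes {
		t.store.rows[key] = row{value: value, version: t.store.rows[key].version + 1}
	}
	return nil
}

func main() {
	s := NewStore()
	setup := s.Begin()
	setup.Set("balance", 100)
	_ = setup.Commit()

	t1, t2 := s.Begin(), s.Begin()
	t1.Set("balance", t1.Get("balance")-10)
	t2.Set("balance", t2.Get("balance")-20)
	fmt.Println(t1.Commit()) // first committer succeeds
	fmt.Println(t2.Commit()) // second committer fails validation
}
```

Neither transaction blocks the other while running; the cost of a conflict is paid only by the transaction that commits second, which must retry.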

OLAP (Analytical) Features

Columnar indexing for efficient analytical queries
Vectorized execution engine for batch processing
Expression pushdown for optimized filtering
Specialized indexing strategies for different query patterns
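
To make the idea of vectorized execution with a pushed-down filter concrete, here is a small sketch. It assumes a column stored as a plain `[]int64` and a hypothetical batch width of 4; the function names are illustrative and do not come from Stoolap's code. The filter runs over a whole batch and produces a selection vector, and the aggregate then touches only the selected positions.

```go
package main

import "fmt"

// batchSize is a hypothetical batch width chosen for illustration;
// real engines typically use hundreds or thousands of values per batch.
const batchSize = 4

// filterGreater fills sel with the positions in col whose value exceeds
// threshold. Reusing sel across batches avoids per-batch allocation.
func filterGreater(col []int64, threshold int64, sel []int) []int {
	sel = sel[:0]
	for i, v := range col {
		if v > threshold {
			sel = append(sel, i)
		}
	}
	return sel
}

// sumSelected adds up only the values at the selected positions.
func sumSelected(col []int64, sel []int) int64 {
	var total int64
	for _, i := range sel {
		total += col[i]
	}
	return total
}

// sumWhereGreater evaluates "SELECT SUM(x) WHERE x > threshold" one batch
// at a time, applying the filter before the aggregate sees any value.
func sumWhereGreater(col []int64, threshold int64) int64 {
	var total int64
	sel := make([]int, 0, batchSize)
	for start := 0; start < len(col); start += batchSize {
		end := start + batchSize
		if end > len(col) {
			end = len(col)
		}
		batch := col[start:end]
		sel = filterGreater(batch, threshold, sel)
		total += sumSelected(batch, sel)
	}
	return total
}

func main() {
	amounts := []int64{5, 12, 7, 30, 1, 18, 9}
	fmt.Println(sumWhereGreater(amounts, 8)) // 12 + 30 + 18 + 9 = 69
}
```

Processing tight loops over contiguous slices, rather than one row at a time through a generic interface, is what gives batch execution its advantage on analytical queries.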

Core Components

Stoolap’s architecture consists of the following major components:

Client Interface

SQL Driver - Driver implementing Go's standard database/sql interface
Command-Line Interface - Interactive CLI for database operations
API Layer - Programmatic access to database functionality

Query Processing Pipeline

Parser - Converts SQL text into an abstract syntax tree (AST)
- Lexical analyzer (lexer.go)
- Syntax parser (parser.go)
- AST builder (ast.go)
Planner/Optimizer - Converts AST into an optimized execution plan
- Plan generation
- Statistics-based optimization
- Expression optimization
- Join order optimization
Executor - Executes the plan and produces results
- Standard execution engine (executor.go)
- Vectorized execution engine (vectorized/)
- Query cache (query_cache.go)
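
The first stage of the pipeline, lexical analysis, can be sketched as follows. This is a toy tokenizer written for illustration, not the contents of Stoolap's lexer.go; the `Token` type, the `Lex` function, and the three-word keyword set are assumptions.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

type TokenKind int

const (
	Keyword TokenKind = iota
	Ident
	Number
	Symbol
)

type Token struct {
	Kind TokenKind
	Text string
}

// keywords is a deliberately tiny set; a real SQL lexer knows many more.
var keywords = map[string]bool{"SELECT": true, "FROM": true, "WHERE": true}

func isWordRune(r rune) bool {
	return unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_'
}

// Lex splits SQL text into keyword, identifier, number, and symbol tokens.
// Keywords are matched case-insensitively and normalized to upper case.
func Lex(sql string) []Token {
	var tokens []Token
	runes := []rune(sql)
	for i := 0; i < len(runes); {
		r := runes[i]
		switch {
		case unicode.IsSpace(r):
			i++
		case unicode.IsLetter(r) || r == '_':
			start := i
			for i < len(runes) && isWordRune(runes[i]) {
				i++
			}
			word := string(runes[start:i])
			if upper := strings.ToUpper(word); keywords[upper] {
				tokens = append(tokens, Token{Keyword, upper})
			} else {
				tokens = append(tokens, Token{Ident, word})
			}
		case unicode.IsDigit(r):
			start := i
			for i < len(runes) && unicode.IsDigit(runes[i]) {
				i++
			}
			tokens = append(tokens, Token{Number, string(runes[start:i])})
		default:
			// Every other character becomes a one-character symbol token.
			tokens = append(tokens, Token{Symbol, string(r)})
			i++
		}
	}
	return tokens
}

func main() {
	for _, tok := range Lex("select id from users where age > 30") {
		fmt.Println(tok.Kind, tok.Text)
	}
}
```

The parser would consume this token stream to build the AST, which the planner then turns into an execution plan.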

Storage Engine

MVCC Engine - Multi-version concurrency control for transaction isolation
- Transaction management (transaction.go)
- Version store (version_store.go)
- Visibility rules (registry.go)
Table Management - Table creation and schema handling
- Schema validation
- Table metadata management
- Column type management
Row-Based Storage - Row-oriented data organization for transactional operations
- Row-based version store
- Type-specific storage optimizations
- Segment-based organization
Indexing System - Multiple index types for different access patterns
- B-tree indexes (btree/)
- Bitmap indexes (bitmap/)
- Columnar indexes (mvcc/columnar_index.go) - For analytical acceleration
- Multi-column indexes (mvcc/columnar_index_multi.go)
Persistence Layer - Optional disk storage
- Binary serialization (binser/)
- Snapshot management
- Write-ahead logging (wal_manager.go)
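
The interaction between a row-based version store and MVCC visibility rules can be illustrated with a short sketch. It assumes each row keeps a chain of versions stamped with commit sequence numbers, which is one common design and not necessarily how version_store.go works. The `Version` and `VersionStore` types here are hypothetical, and the sketch is not safe for concurrent use.

```go
package main

import "fmt"

// Version is one entry in a row's version chain.
type Version struct {
	Value     string
	CreatedBy uint64 // commit sequence that created this version
	DeletedBy uint64 // commit sequence that superseded it; 0 means still live
}

// VersionStore maps a row ID to its versions, oldest first.
type VersionStore struct {
	rows       map[int64][]Version
	lastCommit uint64
}

func NewVersionStore() *VersionStore {
	return &VersionStore{rows: make(map[int64][]Version)}
}

// Write commits a new version of the row and returns its commit sequence.
// The previous version is kept so that older snapshots can still read it.
func (s *VersionStore) Write(rowID int64, value string) uint64 {
	s.lastCommit++
	chain := s.rows[rowID]
	if n := len(chain); n > 0 {
		chain[n-1].DeletedBy = s.lastCommit
	}
	s.rows[rowID] = append(chain, Version{Value: value, CreatedBy: s.lastCommit})
	return s.lastCommit
}

// Read returns the version visible to a snapshot taken at the given commit
// sequence: created at or before the snapshot and not superseded by then.
func (s *VersionStore) Read(rowID int64, snapshot uint64) (string, bool) {
	for _, v := range s.rows[rowID] {
		if v.CreatedBy <= snapshot && (v.DeletedBy == 0 || v.DeletedBy > snapshot) {
			return v.Value, true
		}
	}
	return "", false
}

func main() {
	s := NewVersionStore()
	first := s.Write(1, "draft")
	second := s.Write(1, "final")

	old, _ := s.Read(1, first)
	cur, _ := s.Read(1, second)
	fmt.Println(old, cur) // draft final
}
```

Because readers pick a version by snapshot instead of taking locks, a long analytical query can keep reading a consistent view while writers continue to add new versions.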

Function System

Function Registry - Central registry for all SQL functions
- Scalar functions (scalar/)
- Aggregate functions (aggregate/)
- Window functions (window/)

Memory Management

Buffer Pool - Reusable memory buffers to reduce allocation overhead
Value Pool - Specialized object pooling for common data types
Segment Maps - Efficient concurrent data structures
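
A buffer pool of the kind described above can be sketched with the standard library's `sync.Pool`. This is an illustration of the general technique; whether Stoolap's buffer pool is built on `sync.Pool` is an assumption, and `encodeRow` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufferPool hands out reusable byte buffers so that hot paths do not
// allocate a fresh buffer on every call.
var bufferPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// encodeRow joins fields with '|' using a pooled scratch buffer.
func encodeRow(fields ...string) string {
	buf := bufferPool.Get().(*bytes.Buffer)
	buf.Reset()               // a pooled buffer may hold old contents
	defer bufferPool.Put(buf) // return it for reuse when done

	for i, f := range fields {
		if i > 0 {
			buf.WriteByte('|')
		}
		buf.WriteString(f)
	}
	// String copies the bytes, so the result stays valid after Put.
	return buf.String()
}

func main() {
	fmt.Println(encodeRow("1", "alice", "30")) // 1|alice|30
}
```

The important detail is that nothing returned to the caller may alias the pooled buffer, since another goroutine can pick it up as soon as it is put back.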

Request Flow

When a query is executed, it flows through the system as follows:

Query Submission
- SQL text is submitted via driver, CLI, or API
Parsing and Validation
- SQL is parsed into an AST
- Syntax and semantic validation is performed
- Query is prepared for execution
Planning and Optimization
- Execution plan is generated
- Statistics are used to optimize the plan
- Indexes are selected based on query patterns
- Expression pushdown is applied where possible
Execution
- For read queries:
  - Appropriate isolation level is applied
  - Storage engine provides data with visibility rules
  - Filters and projections are applied
  - Results are processed (joins, aggregations, sorting)
  - Final result set is returned
- For write queries:
  - Transaction is started if not already active
  - Write operations are applied with MVCC rules
  - Indexes are updated
  - Changes are committed or rolled back
Result Handling
- Results are formatted and returned to the client
- Memory is released
- Transaction state is updated
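
The planning and execution steps above can be sketched in miniature. The example assumes a single table with one optional secondary index; `Table`, `Plan`, `plan`, and `execute` are hypothetical names, and the planning rule (use the index whenever one exists) is far simpler than a statistics-based optimizer.

```go
package main

import "fmt"

type Row struct {
	ID   int64
	City string
}

// Table holds rows plus an optional index from city to row positions.
type Table struct {
	rows      []Row
	cityIndex map[string][]int
}

// Plan records the access path chosen for "WHERE city = ?".
type Plan struct {
	UseIndex bool
	City     string
}

func buildCityIndex(rows []Row) map[string][]int {
	idx := make(map[string][]int)
	for pos, r := range rows {
		idx[r.City] = append(idx[r.City], pos)
	}
	return idx
}

// plan picks an index lookup when an index exists, otherwise a full scan.
func plan(t *Table, city string) Plan {
	return Plan{UseIndex: t.cityIndex != nil, City: city}
}

// execute runs the plan and returns the IDs of matching rows.
func execute(t *Table, p Plan) []int64 {
	var ids []int64
	if p.UseIndex {
		for _, pos := range t.cityIndex[p.City] {
			ids = append(ids, t.rows[pos].ID)
		}
		return ids
	}
	for _, r := range t.rows {
		if r.City == p.City { // filter applied during the scan
			ids = append(ids, r.ID)
		}
	}
	return ids
}

func main() {
	rows := []Row{{1, "Oslo"}, {2, "Lima"}, {3, "Oslo"}}
	t := &Table{rows: rows, cityIndex: buildCityIndex(rows)}
	p := plan(t, "Oslo")
	fmt.Println(p.UseIndex, execute(t, p)) // true [1 3]
}
```

Both access paths return the same rows; the planner's job is only to choose the cheaper one.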

Physical Architecture

In-Memory Mode

In memory-only mode, Stoolap operates entirely in RAM:

All data structures reside in memory
No disk I/O for data access
Highest performance, but no durability: data is lost when the process exits

Persistent Mode

In persistent mode, Stoolap uses disk storage with memory caching:

Data is stored on disk in a specialized binary format
Write-ahead logging ensures durability
Memory serves as a cache for active data
Background processes manage snapshots and cleanup

Concurrency Model

Stoolap uses a combination of concurrency techniques:

MVCC - Multiple versions of data for transaction isolation
Optimistic Concurrency Control - Transactions run without blocking each other and are validated for conflicts at commit time
Fine-grained Locking - Minimal contention through targeted locks
Segmented Data Structures - Reduced lock contention
Lock-free Algorithms - Where appropriate for performance
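
Segmented data structures with fine-grained locking can be sketched as a map split into independently locked segments. The segment count of 16, the `SegmentedMap` type, and the modulo-based segment choice are assumptions made for this example and are not taken from Stoolap's source.

```go
package main

import (
	"fmt"
	"sync"
)

// segmentCount is a hypothetical value; more segments mean less contention
// at the cost of more memory.
const segmentCount = 16

type segment struct {
	mu   sync.RWMutex
	data map[int64]string
}

// SegmentedMap spreads keys across segments so that operations on
// different segments never wait on the same lock.
type SegmentedMap struct {
	segments [segmentCount]segment
}

func NewSegmentedMap() *SegmentedMap {
	m := &SegmentedMap{}
	for i := range m.segments {
		m.segments[i].data = make(map[int64]string)
	}
	return m
}

// segmentFor picks the segment that owns the key.
func (m *SegmentedMap) segmentFor(key int64) *segment {
	return &m.segments[uint64(key)%segmentCount]
}

func (m *SegmentedMap) Set(key int64, value string) {
	s := m.segmentFor(key)
	s.mu.Lock()
	s.data[key] = value
	s.mu.Unlock()
}

func (m *SegmentedMap) Get(key int64) (string, bool) {
	s := m.segmentFor(key)
	s.mu.RLock()
	v, ok := s.data[key]
	s.mu.RUnlock()
	return v, ok
}

func main() {
	m := NewSegmentedMap()
	var wg sync.WaitGroup
	for w := 0; w < 8; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < 100; i++ {
				m.Set(int64(w*100+i), fmt.Sprint(w))
			}
		}(w)
	}
	wg.Wait()

	v, ok := m.Get(305)
	fmt.Println(v, ok) // 3 true
}
```

With a single global lock, all eight writers would serialize; here two writers contend only when their keys fall into the same segment.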

Memory Efficiency

Several techniques are used to reduce memory usage and per-value processing overhead:

Targeted Compression - Type-specific compression algorithms
Memory Pooling - Reuse of memory allocations
Reference Counting - Efficient resource management
SIMD Operations - Processing multiple values with single instructions
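
As one example of type-specific compression, the sketch below dictionary-encodes a string column: each distinct string is stored once and rows hold small integer codes. Dictionary encoding is used here as a representative technique; the source does not say which algorithms Stoolap applies, and `DictColumn` is a hypothetical type.

```go
package main

import "fmt"

// DictColumn stores each distinct string once and keeps one small
// integer code per row.
type DictColumn struct {
	dict  []string          // code -> value
	index map[string]uint32 // value -> code
	codes []uint32          // one code per row
}

func NewDictColumn() *DictColumn {
	return &DictColumn{index: make(map[string]uint32)}
}

// Append adds a row, assigning a new code only for unseen values.
func (c *DictColumn) Append(v string) {
	code, ok := c.index[v]
	if !ok {
		code = uint32(len(c.dict))
		c.dict = append(c.dict, v)
		c.index[v] = code
	}
	c.codes = append(c.codes, code)
}

// At decodes the value stored at row i.
func (c *DictColumn) At(i int) string { return c.dict[c.codes[i]] }

// Len returns the number of rows; Distinct the number of unique values.
func (c *DictColumn) Len() int      { return len(c.codes) }
func (c *DictColumn) Distinct() int { return len(c.dict) }

func main() {
	col := NewDictColumn()
	for _, country := range []string{"DE", "US", "DE", "DE", "US"} {
		col.Append(country)
	}
	fmt.Println(col.Len(), col.Distinct(), col.At(3)) // 5 2 DE
}
```

The saving grows with repetition: a low-cardinality column with millions of rows stores millions of 4-byte codes plus a handful of strings. Equality filters can also compare codes instead of strings.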

Implementation Details

The core implementation is organized as follows:

/cmd/stoolap/ - Command-line interface
/pkg/ - Public API and driver implementation
/internal/ - Core implementation details
- /internal/parser/ - SQL parsing
- /internal/sql/ - SQL execution
- /internal/storage/ - Storage engine
- /internal/functions/ - SQL function implementations
- /internal/common/ - Common utilities
- /internal/fastmap/ - High-performance data structures

Architectural Principles

Stoolap’s architecture is guided by the following principles:

Performance First - Optimize for speed and memory efficiency
Zero Dependencies - Rely only on the Go standard library
Modularity - Clean component interfaces for extensibility
Simplicity - Favor simple solutions over complex ones
Data Integrity - Ensure consistent and correct results