Indexing
How indexes work in Stoolap and when to use them
Indexing in Stoolap
This document explains Stoolap’s indexing system, including the types of indexes available, when to use each type, and best practices for index management.
Index Types
Stoolap supports several types of indexes, each optimized for different query patterns in its HTAP architecture:
1. Columnar Indexes
Columnar indexes are a specialized index type in Stoolap’s HTAP architecture, providing analytical capabilities on top of row-based storage:
- Design: Maps column values to row positions for efficient analytical queries
- Strengths: Fast range scans and equality lookups for analytical workloads
- Use Cases: WHERE clause filtering, range queries, sort operations
- Implementation: Implemented in
columnar_index.go
-- Create a columnar index
CREATE INDEX idx_user_signup ON users (signup_date);
2. Multi-Column Indexes
Multi-column indexes support filtering on multiple columns simultaneously:
- Design: Maps combinations of column values to row positions
- Strengths: Combined column filters, complex predicates
- Use Cases: Queries with multiple WHERE conditions
- Implementation: Implemented in
columnar_index_multi.go
-- Create a multi-column index
CREATE INDEX idx_product_category_price ON products (category_id, price);
3. B-tree Indexes
B-tree indexes are traditional balanced tree structures:
- Design: Balanced tree structure for efficient lookups
- Strengths: General-purpose indexing with good performance
- Use Cases: Primary keys, unique constraints, range queries
- Implementation: Implemented in
btree/index.go
andbtree/btree.go
-- Create a B-tree index (default index type)
CREATE INDEX idx_order_date ON orders (order_date);
4. Bitmap Indexes
Bitmap indexes use bit arrays for efficient filtering:
- Design: Bit vectors representing presence/absence of values
- Strengths: Low cardinality columns, boolean flags
- Use Cases: Filtering on status fields, boolean columns, enums
- Implementation: Implemented in
bitmap/index.go
andbitmap/bitmap.go
-- Create a bitmap index
CREATE BITMAP INDEX idx_product_status ON products (status);
Index Selection and Usage
Creating Indexes
Indexes are created using standard SQL syntax:
-- Basic index creation
CREATE INDEX index_name ON table_name (column_name);
-- Multiple column index
CREATE INDEX index_name ON table_name (column1, column2, column3);
-- Unique index
CREATE UNIQUE INDEX index_name ON table_name (column_name);
-- Bitmap index (explicitly specified)
CREATE BITMAP INDEX index_name ON table_name (column_name);
When Stoolap Uses Indexes
The query optimizer considers indexes for:
- WHERE Clause Filtering - Matching rows based on column values
- JOIN Operations - Finding matching rows between tables
- ORDER BY - Avoiding sort operations
- GROUP BY - Pre-sorting data for group operations
- UNIQUE Constraints - Enforcing uniqueness
Index Selection Factors
The optimizer considers several factors when choosing indexes:
- Cardinality - Number of unique values in the indexed columns
- Selectivity - Percentage of rows that match the filter
- Index Type - Different index types excel at different operations
- Available Statistics - Data distribution information
- Query Complexity - Join conditions, filter expressions
- Predicate Types - Equality, range, LIKE, etc.
Index Implementation Details
Columnar Index
The columnar index is specialized for Stoolap’s HTAP architecture, providing analytical capabilities on top of the row-based storage:
- Maps column values to row positions efficiently for analytical access patterns
- Organizes values for fast range and equality searches on column values
- Maintains version information for MVCC
- Implements specific optimizations for different data types
- Provides fast access paths for analytical query patterns
For complex filtering, Stoolap’s columnar index uses:
- Sorted value lists for efficient binary search
- Bitmap results for combining multiple conditions
- SIMD operations for parallel processing
Multi-Column Index
Multi-column indexes extend the columnar index concept:
- Maps combinations of column values to positions
- Maintains column order importance (leftmost principle)
- Supports partial matching on prefixes of the index
- Enables advanced filtering across multiple dimensions
- Optimizes storage based on value distribution
Primary Keys and Unique Indexes
Primary keys and unique indexes provide additional guarantees:
- Enforce uniqueness constraints on columns
- Serve as the primary access path for the table
- Enable efficient lookups by key value
- Support foreign key relationships
Index Management
Showing Indexes
View existing indexes using the SHOW INDEXES command:
SHOW INDEXES FROM table_name;
Dropping Indexes
Remove indexes with the DROP INDEX command:
DROP INDEX index_name;
Index Statistics
Stoolap maintains statistics about indexes:
- Number of unique values (cardinality)
- Value distribution information
- Index size and memory usage
- Access patterns and effectiveness
Indexing and MVCC
Stoolap’s indexes are integrated with the MVCC system:
- Indexes track row visibility based on transaction information
- Index operations see consistent snapshots based on transaction isolation
- Insert/update/delete operations update indexes transactionally
- Index entries include version information
Optimizations
Stoolap implements several index optimizations:
SIMD-Accelerated Filtering
- Uses CPU SIMD instructions for parallel processing
- Significantly accelerates filtering operations
- Particularly effective for columnar indexes
Index-Only Scans
- When all required data is in the index, table access is avoided
- Improves performance by reducing data access
Covering Indexes
- Indexes that include all columns needed by a query
- Enable index-only scans for better performance
Bloom Filters
- Probabilistic data structure to quickly determine if a value might exist
- Used to accelerate certain index operations
Best Practices
When to Create Indexes
Create indexes on:
- Primary key columns - Always index primary keys
- Foreign key columns - Improves join performance
- Columns in WHERE clauses - Speeds up filtering
- Columns in JOIN conditions - Accelerates joins
- Columns in ORDER BY - Avoids sorting
- Columns in GROUP BY - Improves aggregation
Index Design Guidelines
For optimal performance:
- Create selective indexes - Indexes on columns with many unique values
- Consider column order - Place high-selectivity columns first in multi-column indexes
- Limit index count - Don’t over-index tables (increases write overhead)
- Monitor index usage - Drop unused indexes
- Choose appropriate index types - Match index types to query patterns
- Balance read/write performance - Indexes speed up reads but slow down writes
Common Indexing Mistakes
Avoid these common issues:
- Indexing low-cardinality columns alone - Limited benefit
- Over-indexing - Too many indexes hurt write performance
- Incorrect multi-column index order - Reduces effectiveness
- Not indexing JOIN columns - Leads to poor join performance
- Indexing very large text fields - Consider partial indexing or function indexes
Performance Considerations
Write Impact
Indexes affect write operations:
- Each index requires additional updates during inserts/updates/deletes
- More indexes mean slower write operations
- Index maintenance consumes CPU and memory
Space Requirements
Indexes increase storage requirements:
- Each index increases the database size
- Different index types have different space characteristics
- Consider space-performance tradeoffs
Query Performance
Indexes dramatically improve query performance:
- Reduce the amount of data scanned
- Speed up filter operations
- Eliminate sorting for ORDER BY
- Accelerate join operations
Implementation Notes
Stoolap’s indexes are implemented in:
/internal/storage/mvcc/columnar_index.go
- Columnar index implementation/internal/storage/mvcc/columnar_index_multi.go
- Multi-column index/internal/storage/btree/index.go
- B-tree index/internal/storage/bitmap/index.go
- Bitmap index/internal/sql/executor/executor.go
- Index selection logic