Semantic Query Cache

Stoolap implements semantic query caching, an intelligent result caching system that can answer new queries by filtering cached results from previous queries. This goes beyond simple query result caching by understanding the relationship between queries.

Overview

Traditional query caches require exact query matches. Semantic caching understands when a new query is a subset of a cached query and can filter the cached results instead of re-executing against storage.

How It Works

Predicate Subsumption

When Query A caches results, and Query B has a stricter predicate that is a subset of Query A’s predicate, Stoolap can answer Query B by filtering the cached results.

Example

-- Query 1: First execution scans the table
SELECT * FROM orders WHERE amount > 100;
-- Result: 5,000 rows cached

-- Query 2: Semantic cache hit!
-- The predicate (amount > 500) is stricter than (amount > 100)
SELECT * FROM orders WHERE amount > 500;
-- Filters cached 5,000 rows to return 1,000 rows
-- No storage access needed!

Supported Subsumption Patterns

Pattern	Cached Query	New Query	Result
Range Tightening	`col > 100`	`col > 200`	Cache hit
Range Tightening	`col < 100`	`col < 50`	Cache hit
AND Strengthening	`col = 'A'`	`col = 'A' AND x > 5`	Cache hit
IN List Subset	`id IN (1,2,3,4,5)`	`id IN (2,3)`	Cache hit
BETWEEN Narrowing	`BETWEEN 30 AND 70`	`BETWEEN 40 AND 60`	Cache hit
Equality to Range	`col > 0`	`col = 50`	Cache hit

Cache Behavior

Cache Keys

Queries are cached per table with these components:

Table name
Selected columns
Predicate structure

Automatic Invalidation

The cache is automatically invalidated when data changes:

INSERT operations
UPDATE operations
DELETE operations
TRUNCATE operations

-- Query caches results
SELECT * FROM products WHERE price > 100;

-- INSERT invalidates the cache for 'products' table
INSERT INTO products VALUES (101, 'New Product', 150.00);

-- Next query must re-execute (cache miss)
SELECT * FROM products WHERE price > 100;

Cache Limits

Setting	Default	Description
Entries per table	64	Maximum cached queries per table
TTL	5 minutes	Time-to-live for cached results
Eviction	LRU	Least-recently-used eviction policy

When Semantic Caching Helps

Best scenarios for semantic caching:

Drill-down queries: Starting with broad filters, then narrowing
Dashboard queries: Similar queries with different filter values
Range queries: Successive queries on ranges
Exploratory analysis: Progressively filtering data

When Caching Doesn’t Apply

Cache misses occur when:

Predicate is not a strict subset
Query has different columns selected
Table was modified since caching
Cache entry expired (TTL)
Cache is full and entry was evicted

Example Use Case

Consider an analytics dashboard showing orders:

-- Initial dashboard load: scan full table
SELECT * FROM orders WHERE order_date >= '2024-01-01';
-- Caches 50,000 orders

-- User filters to specific status: cache hit!
SELECT * FROM orders WHERE order_date >= '2024-01-01' AND status = 'pending';
-- Filters cached results, returns 5,000 orders

-- User narrows date range: cache hit!
SELECT * FROM orders WHERE order_date >= '2024-06-01' AND status = 'pending';
-- Filters cached results, returns 1,000 orders

-- User changes to different status: cache hit!
SELECT * FROM orders WHERE order_date >= '2024-01-01' AND status = 'shipped';
-- Filters cached results from first query

Performance Impact

Semantic caching provides significant speedups when:

Base query is expensive (large table scan)
Follow-up queries filter the same data
Data changes infrequently between queries

Typical impact:

Cache hits can avoid storage access and reduce latency significantly
Particularly effective for large result sets being filtered down
Measure with EXPLAIN ANALYZE on your dataset to validate expected gains

Best Practices

Order queries from broad to narrow: Cache broad results first
Use consistent column selections: Same columns enable cache reuse
Be aware of cache invalidation: Writes invalidate relevant caches
Consider query patterns: Design queries to maximize cache hits

Monitoring

Currently, cache statistics are not exposed through SQL. Cache behavior is internal to the query executor.