DISTINCT Operations
DISTINCT Operations
This document explains DISTINCT operations in Stoolap based on the implementation and test files.
Overview
Based on test files (/test/distinct_test.go
), Stoolap supports the DISTINCT keyword to eliminate duplicate rows from query results. This feature is essential for retrieving unique values or combinations of values from tables.
Syntax
Stoolap supports these DISTINCT syntax patterns:
-- Basic DISTINCT on single column
SELECT DISTINCT column FROM table;
-- DISTINCT on multiple columns
SELECT DISTINCT column1, column2, ... FROM table;
-- DISTINCT with ORDER BY
SELECT DISTINCT column FROM table ORDER BY column;
-- COUNT with DISTINCT
SELECT COUNT(DISTINCT column) FROM table;
Examples from Test Files
Basic DISTINCT on Single Column
-- Create a test table
CREATE TABLE products (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
category TEXT,
price FLOAT
);
-- Insert sample data with duplicate categories
INSERT INTO products (id, name, category, price) VALUES
(1, 'Laptop', 'Electronics', 999.99),
(2, 'Smartphone', 'Electronics', 499.99),
(3, 'Headphones', 'Electronics', 99.99),
(4, 'T-shirt', 'Clothing', 19.99),
(5, 'Jeans', 'Clothing', 49.99),
(6, 'Novel', 'Books', 14.99),
(7, 'Textbook', 'Books', 79.99);
-- Select distinct categories
SELECT DISTINCT category FROM products;
-- Result: 3 rows
-- Electronics
-- Clothing
-- Books
DISTINCT on Multiple Columns
-- Create a table with region information
CREATE TABLE sales (
id INTEGER PRIMARY KEY,
product TEXT,
category TEXT,
region TEXT,
amount FLOAT
);
-- Insert data with duplicate combinations
INSERT INTO sales (id, product, category, region, amount) VALUES
(1, 'Laptop', 'Electronics', 'North', 999.99),
(2, 'Smartphone', 'Electronics', 'South', 499.99),
(3, 'Headphones', 'Electronics', 'North', 99.99),
(4, 'T-shirt', 'Clothing', 'East', 19.99),
(5, 'Jeans', 'Clothing', 'West', 49.99),
(6, 'T-shirt', 'Clothing', 'North', 19.99),
(7, 'Novel', 'Books', 'South', 14.99);
-- Select distinct category and region combinations
SELECT DISTINCT category, region FROM sales;
-- Result: 6 rows (unique combinations)
-- Electronics, North
-- Electronics, South
-- Clothing, East
-- Clothing, West
-- Clothing, North
-- Books, South
DISTINCT with ORDER BY
-- Select distinct regions ordered alphabetically
SELECT DISTINCT region FROM sales ORDER BY region;
-- Result: 4 rows in alphabetical order
-- East
-- North
-- South
-- West
COUNT with DISTINCT
-- Count distinct categories
SELECT COUNT(DISTINCT category) FROM sales;
-- Result: 3 (Electronics, Clothing, Books)
How DISTINCT Works in Stoolap
Stoolap implements DISTINCT operations through the following mechanism:
- The query executor detects the DISTINCT keyword during SQL parsing
- After retrieving the base result set, but before applying ORDER BY or LIMIT, the result is filtered for uniqueness
- A map tracks unique rows to eliminate duplicates
- For multiple columns, the uniqueness is based on the combination of all column values
DISTINCT with NULL Values
NULL values are considered distinct values in DISTINCT operations:
-- Create a table with NULL values
CREATE TABLE null_test (
id INTEGER PRIMARY KEY,
value TEXT
);
-- Insert data with NULLs
INSERT INTO null_test (id, value) VALUES
(1, 'A'),
(2, 'B'),
(3, NULL),
(4, 'A'),
(5, NULL);
-- Select distinct values
SELECT DISTINCT value FROM null_test;
-- Result: 3 rows
-- A
-- B
-- NULL
Performance Considerations
Based on the implementation:
- Memory Usage: DISTINCT operations require memory to track unique rows
- Large Result Sets: For very large tables, DISTINCT can be memory-intensive
- Column Count: DISTINCT on multiple columns requires more processing than single columns
- Data Cardinality: Performance depends on the ratio of unique values to total rows
Best Practices
- Be Selective: Only use DISTINCT when you actually need to remove duplicates
- Consider Alternatives: For counting unique values, unique indexes may be more efficient
- Column Order: For multi-column DISTINCT, put highest-cardinality columns first if you’re also using ORDER BY
- Index Usage: Properly indexed columns can improve DISTINCT operations
Implementation Details
Internally, Stoolap:
- Creates a map to track unique rows
- Generates a unique key for each row based on its values
- Only passes through rows that haven’t been seen before
- Optimizes COUNT(DISTINCT) operations with specialized processing
These optimizations ensure that DISTINCT operations are both correct and reasonably efficient for most use cases.