Aggregate Functions

This document describes the aggregate functions available in Stoolap based on test files and implementation details.

Overview

Based on test files (/test/aggregation_test.go, /test/count_aggregate_test.go, /test/first_last_aggregate_test.go), Stoolap supports standard SQL aggregate functions that operate on multiple rows to calculate a single result. These functions are typically used with GROUP BY clauses for summarizing data.

Supported Aggregate Functions

COUNT

Counts the number of rows or non-NULL values:

-- Count all rows
SELECT COUNT(*) FROM table_name;

-- Count non-NULL values in a column
SELECT COUNT(column_name) FROM table_name;

-- Count distinct values
SELECT COUNT(DISTINCT column_name) FROM table_name;

From /test/count_aggregate_test.go and /test/count_with_index_test.go, COUNT is optimized to use available indexes when possible.

SUM

Calculates the sum of values in a column:

-- Sum all values in a numeric column
SELECT SUM(amount) FROM orders;

-- Sum with grouping
SELECT category, SUM(amount) FROM orders GROUP BY category;

Used in /test/aggregation_test.go.

AVG

Calculates the average (mean) of values in a column:

-- Average of all values in a numeric column
SELECT AVG(price) FROM products;

-- Average with grouping
SELECT category, AVG(price) FROM products GROUP BY category;

Used in /test/aggregation_test.go.

MIN

Finds the minimum value in a column:

-- Minimum value in a column
SELECT MIN(price) FROM products;

-- Minimum with grouping
SELECT category, MIN(price) FROM products GROUP BY category;

Used in /test/aggregation_test.go.

MAX

Finds the maximum value in a column:

-- Maximum value in a column
SELECT MAX(price) FROM products;

-- Maximum with grouping
SELECT category, MAX(price) FROM products GROUP BY category;

Used in /test/aggregation_test.go.

FIRST

Returns the first value in a group, based on the order of rows:

-- First value in a column
SELECT FIRST(name) FROM users ORDER BY created_at;

-- First value with grouping
SELECT category, FIRST(name) FROM products GROUP BY category ORDER BY price;

From /test/first_last_aggregate_test.go, FIRST depends on the order of rows in the result set.

LAST

Returns the last value in a group, based on the order of rows:

-- Last value in a column
SELECT LAST(name) FROM users ORDER BY created_at;

-- Last value with grouping
SELECT category, LAST(name) FROM products GROUP BY category ORDER BY price;

From /test/first_last_aggregate_test.go, LAST depends on the order of rows in the result set.

Examples from Test Files

Basic Aggregation

From /test/aggregation_test.go:

-- Create a test table
CREATE TABLE sales (
    id INTEGER PRIMARY KEY,
    product TEXT,
    category TEXT,
    amount FLOAT
);

-- Insert test data
INSERT INTO sales (id, product, category, amount) VALUES
(1, 'Laptop', 'Electronics', 1200.00),
(2, 'Smartphone', 'Electronics', 800.00),
(3, 'Headphones', 'Electronics', 150.00),
(4, 'T-shirt', 'Clothing', 25.00),
(5, 'Jeans', 'Clothing', 50.00),
(6, 'Shoes', 'Clothing', 70.00);

-- Simple aggregation
SELECT 
    SUM(amount) AS total_sales,
    AVG(amount) AS avg_sale,
    MIN(amount) AS min_sale,
    MAX(amount) AS max_sale,
    COUNT(*) AS transaction_count
FROM sales;

-- Aggregation with GROUP BY
SELECT 
    category,
    SUM(amount) AS category_total,
    AVG(amount) AS category_avg,
    COUNT(*) AS category_count
FROM sales
GROUP BY category;

COUNT Function Variants

From /test/count_aggregate_test.go:

-- Create a test table
CREATE TABLE test_count (
    id INTEGER PRIMARY KEY,
    name TEXT,
    value INTEGER,
    category TEXT
);

-- Insert test data including NULL values
INSERT INTO test_count (id, name, value, category) VALUES
(1, 'Item 1', 10, 'A'),
(2, 'Item 2', 20, 'A'),
(3, 'Item 3', NULL, 'B'),
(4, 'Item 4', 40, 'B'),
(5, 'Item 5', 50, 'C');

-- COUNT(*) - counts all rows
SELECT COUNT(*) FROM test_count;  -- Returns 5

-- COUNT(column) - counts non-NULL values in a column
SELECT COUNT(value) FROM test_count;  -- Returns 4

-- COUNT with GROUP BY
SELECT category, COUNT(*) FROM test_count GROUP BY category;

-- COUNT DISTINCT
SELECT COUNT(DISTINCT category) FROM test_count;  -- Returns 3

FIRST and LAST Functions

From /test/first_last_aggregate_test.go:

-- Create a test table
CREATE TABLE test_first_last (
    id INTEGER PRIMARY KEY,
    name TEXT,
    category TEXT,
    value INTEGER,
    created_at TIMESTAMP
);

-- Insert test data
INSERT INTO test_first_last (id, name, category, value, created_at) VALUES
(1, 'Item 1', 'A', 10, '2023-01-01'),
(2, 'Item 2', 'A', 20, '2023-01-02'),
(3, 'Item 3', 'B', 30, '2023-01-03'),
(4, 'Item 4', 'B', 40, '2023-01-04'),
(5, 'Item 5', 'C', 50, '2023-01-05');

-- FIRST function (returns the first value encountered)
SELECT FIRST(name) FROM test_first_last;  -- Returns 'Item 1'

-- LAST function (returns the last value encountered)
SELECT LAST(name) FROM test_first_last;  -- Returns 'Item 5'

-- FIRST and LAST with ORDER BY (affects the result)
SELECT FIRST(name) FROM test_first_last ORDER BY value DESC;  -- Returns 'Item 5'
SELECT LAST(name) FROM test_first_last ORDER BY value DESC;  -- Returns 'Item 1'

-- FIRST and LAST with GROUP BY
SELECT category, FIRST(name), LAST(name), COUNT(*) 
FROM test_first_last 
GROUP BY category 
ORDER BY category;

Using Aggregate Functions with GROUP BY

The GROUP BY clause is used with aggregate functions to group rows that have the same values:

SELECT category, COUNT(*) AS count, SUM(amount) AS total, AVG(amount) AS average
FROM sales
GROUP BY category;

Using Aggregate Functions with HAVING

The HAVING clause is used to filter groups based on the result of an aggregate function:

SELECT category, COUNT(*) AS count, SUM(amount) AS total
FROM sales
GROUP BY category
HAVING COUNT(*) > 1 AND SUM(amount) > 100;

NULL Handling

Based on the tests, Stoolap follows standard SQL NULL handling for aggregate functions:

NULL values are ignored by most aggregate functions (SUM, AVG, MIN, MAX)
COUNT(*) includes all rows, regardless of NULL values
COUNT(column) only counts non-NULL values
If all inputs to an aggregate function are NULL, the result is typically NULL (except for COUNT, which returns 0)

Performance Considerations

From the implementations and test files:

COUNT(*) is optimized to use the smallest available index when possible
COUNT DISTINCT can be more expensive for large datasets
Aggregations benefit from proper indexing on grouped columns
For large tables, consider using WHERE clauses to reduce the input size before aggregation

Implementation Details

From the test files and code inspection:

Aggregate functions are implemented in /internal/functions/aggregate/
Each function has its own implementation file (avg.go, count.go, etc.)
Optimizations exist for common patterns like COUNT(*)