Aleph Filter — To Infinity in Constant Time

A probabilistic data structure that enables O(1) membership queries and expands infinitely without rebuilding. Based on VLDB 2024 research. First non-Java implementation. Benchmarked on Apple M2 with Criterion.

Filters that can't grow

Probabilistic filters — Bloom, Cuckoo — are critical for database performance. They answer "is this element possibly in the set?" without touching disk.

But they have a fundamental limitation: they can't expand. When your data grows beyond capacity, you rebuild from scratch. For a database that needs to scale, this is a non-starter.

Infinite expansion, O(1) queries

The Aleph Filter uses a clever insight: when fingerprints run out of bits during expansion, duplicate entries to both possible bucket locations instead of using overflow tables.

This preserves O(1) lookup while allowing infinite growth. No rebuilding. No performance degradation. The false positive rate increases predictably and can be managed with compaction.

01

Start with a Cuckoo Filter

Elements are stored as fingerprints in buckets. Each element maps to two possible locations via a hash and an XOR-based alternate.

02

Double the table when full

When the filter fills up, double the number of buckets. Use one extra bit from each fingerprint to determine which half the entry belongs to.

03

Duplicate when bits run out

When a fingerprint has no more bits to split on, place a copy in both possible locations. Queries still check at most 2 buckets — O(1) is preserved.

04

Scale forever

Repeat indefinitely. The false positive rate grows with each expansion but can be controlled with periodic compaction. No rebuilding required.

From paper to production

RustFirst non-Java implementation — complete, tested, from first principles
35k+Words in the research paper and tutorial
O(1)Being integrated into ReponoDB's query engine for disk I/O optimization

Read the full paper

A comprehensive write-up covering the theory, implementation details, benchmarks, and design decisions behind the Aleph Filter.

Note — this paper is a living document and may be subject to changes with future revisions and feedback.

Capabilities at a glance

FeatureBloomCuckooXor8Aleph
InsertYesYesBatch onlyYes
QueryO(k)O(1)O(1)O(1)*
DeleteYesYes
ExpandYes
Stable FPRYesYes

* amortized

Throughput numbers

Criterion benchmarks (100 samples) on Apple M2, Rust 1.93.0 --release. All filters configured for 1% target FPR. Lower ns/op is better.

NAlephBloomCuckooXor8*
1,0006.319.2191.77.7
10,00071.9193.7303.785.5
100,0001,5111,9743,8141,485

* Xor8 = batch build (immutable). Per-element amortized.

NAlephBloomCuckooXor8
1,0003.810.316.91.7
10,00044.8150.4170.617.0
100,0002,1002,6731,763184
NAlephCuckoo
1,0009.133.5
10,000101.6287.6

Bloom and Xor8 do not support deletion.

NAlephBloomCuckooXor8
1,0000.391.042.980.42
10,0000.430.991.980.39
100,0000.621.002.470.41

01

Fastest insertion

1.3–3× faster than Bloom and 4–30× faster than Cuckoo. Single xxHash3 call and quotient-filter locality deliver 66–159 Melem/s.

02

Strongest negative lookup

222–265 Melem/s for N ≤ 10,000, beating both Bloom (66–98 Melem/s) and Cuckoo (58–59 Melem/s) via early occupied-flag rejection.

03

Best FPR accuracy

Measured 0.39–0.62% against a 1% target — well within budget. Cuckoo overshoots at 2–3%. Bloom lands within ±0.04% of target.

04

Fastest deletion

~3× faster than Cuckoo (the only other dynamic filter supporting delete). Cluster-aware deletion requires only a local shift within the run.

05

Unique expansion

The only filter supporting automatic, unbounded expansion. 11 expansions from 64 to 262,144 slots at 5.1 Melem/s amortized throughput.

From 64 slots to 262,144

50,000 insertions starting from a 64-slot filter. The filter automatically expanded 11 times, doubling capacity each time. No other filter in this comparison supports this.

195ns/op

Amortized insert with expansion

11

Total expansions triggered

5.1M/s

Effective throughput

01SIMD acceleration
02Concurrency
03Compact slots
04Persistent storage
05Variable fingerprints