© Research & Implementation
Aleph Filter — To Infinity in Constant Time
A probabilistic data structure that enables O(1) membership queries and expands infinitely without rebuilding. Based on VLDB 2024 research. First non-Java implementation. Benchmarked on Apple M2 with Criterion.
The Problem
Filters that can't grow
Probabilistic filters — Bloom, Cuckoo — are critical for database performance. They answer "is this element possibly in the set?" without touching disk.
But they have a fundamental limitation: they can't expand. When your data grows beyond capacity, you rebuild from scratch. For a database that needs to scale, this is a non-starter.
The Solution
Infinite expansion, O(1) queries
The Aleph Filter uses a clever insight: when fingerprints run out of bits during expansion, duplicate entries to both possible bucket locations instead of using overflow tables.
This preserves O(1) lookup while allowing infinite growth. No rebuilding. No performance degradation. The false positive rate increases predictably and can be managed with compaction.
© How It Works
01
Start with a Cuckoo Filter
Elements are stored as fingerprints in buckets. Each element maps to two possible locations via a hash and an XOR-based alternate.
02
Double the table when full
When the filter fills up, double the number of buckets. Use one extra bit from each fingerprint to determine which half the entry belongs to.
03
Duplicate when bits run out
When a fingerprint has no more bits to split on, place a copy in both possible locations. Queries still check at most 2 buckets — O(1) is preserved.
04
Scale forever
Repeat indefinitely. The false positive rate grows with each expansion but can be controlled with periodic compaction. No rebuilding required.
© My Contribution
From paper to production
© Paper
Read the full paper
A comprehensive write-up covering the theory, implementation details, benchmarks, and design decisions behind the Aleph Filter.
Note — this paper is a living document and may be subject to changes with future revisions and feedback.
© Feature Comparison
Capabilities at a glance
| Feature | Bloom | Cuckoo | Xor8 | Aleph |
|---|---|---|---|---|
| Insert | Yes | Yes | Batch only | Yes |
| Query | O(k) | O(1) | O(1) | O(1)* |
| Delete | — | Yes | — | Yes |
| Expand | — | — | — | Yes |
| Stable FPR | — | — | Yes | Yes |
* amortized
© Benchmarks
Throughput numbers
Criterion benchmarks (100 samples) on Apple M2, Rust 1.93.0 --release. All filters configured for 1% target FPR. Lower ns/op is better.
Insertion (ns/op)
| N | Aleph | Bloom | Cuckoo | Xor8* |
|---|---|---|---|---|
| 1,000 | 6.3 | 19.2 | 191.7 | 7.7 |
| 10,000 | 71.9 | 193.7 | 303.7 | 85.5 |
| 100,000 | 1,511 | 1,974 | 3,814 | 1,485 |
* Xor8 = batch build (immutable). Per-element amortized.
Negative Lookup (ns/op)
| N | Aleph | Bloom | Cuckoo | Xor8 |
|---|---|---|---|---|
| 1,000 | 3.8 | 10.3 | 16.9 | 1.7 |
| 10,000 | 44.8 | 150.4 | 170.6 | 17.0 |
| 100,000 | 2,100 | 2,673 | 1,763 | 184 |
Deletion (ns/op)
| N | Aleph | Cuckoo |
|---|---|---|
| 1,000 | 9.1 | 33.5 |
| 10,000 | 101.6 | 287.6 |
Bloom and Xor8 do not support deletion.
False Positive Rate (%, target = 1.00%)
| N | Aleph | Bloom | Cuckoo | Xor8 |
|---|---|---|---|---|
| 1,000 | 0.39 | 1.04 | 2.98 | 0.42 |
| 10,000 | 0.43 | 0.99 | 1.98 | 0.39 |
| 100,000 | 0.62 | 1.00 | 2.47 | 0.41 |
© Key Findings
01
Fastest insertion
1.3–3× faster than Bloom and 4–30× faster than Cuckoo. Single xxHash3 call and quotient-filter locality deliver 66–159 Melem/s.
02
Strongest negative lookup
222–265 Melem/s for N ≤ 10,000, beating both Bloom (66–98 Melem/s) and Cuckoo (58–59 Melem/s) via early occupied-flag rejection.
03
Best FPR accuracy
Measured 0.39–0.62% against a 1% target — well within budget. Cuckoo overshoots at 2–3%. Bloom lands within ±0.04% of target.
04
Fastest deletion
~3× faster than Cuckoo (the only other dynamic filter supporting delete). Cluster-aware deletion requires only a local shift within the run.
05
Unique expansion
The only filter supporting automatic, unbounded expansion. 11 expansions from 64 to 262,144 slots at 5.1 Melem/s amortized throughput.
© Expansion
From 64 slots to 262,144
50,000 insertions starting from a 64-slot filter. The filter automatically expanded 11 times, doubling capacity each time. No other filter in this comparison supports this.
Amortized insert with expansion
Total expansions triggered
Effective throughput
© Future Work