A Bloom filter is a concise, compressed representation of a set, built for one job: answering membership queries — is this item in the set? It can use far less storage than the original set, at the cost of occasional false positives. It will never produce a false negative: if it says an item is absent, it truly is.
Installation
go get -u github.com/bits-and-blooms/bloom/v3
A Bloom filter is not a dynamic data structure: you must know your desired capacity ahead of time. To build a filter for one million elements with a 1% false-positive rate:
filter := bloom.NewWithEstimates(1000000, 0.01)
Call NewWithEstimates conservatively: if you specify too few elements, the false-positive bound may be exceeded. A lower false-positive rate or a higher capacity both cost more memory.
What it gives you
- Membership queries with zero false negatives and a tunable false-positive rate.
- Automatic sizing from a target capacity and error rate via
NewWithEstimates. Add,Test, and the combinedTestAndAdd/TestOrAddprimitives, plus*Stringvariants.- Set algebra across compatible filters with
MergeandCopy. - Cardinality estimation with
ApproximatedSizeand rate validation withEstimateFalsePositiveRate. - Portable binary serialization plus
json,gob, andBinaryMarshalersupport. - Backed by a compact bitset and the fast, non-cryptographic murmur3 hash.
Example
Keys are []byte. Adding and testing a string item:
package main
import (
"fmt"
"github.com/bits-and-blooms/bloom/v3"
)
func main() {
filter := bloom.NewWithEstimates(1000000, 0.01)
filter.Add([]byte("Love"))
if filter.TestString("Love") {
fmt.Println("Love is (probably) in the set")
}
if !filter.TestString("Hate") {
fmt.Println("Hate is definitely not in the set")
}
}
For numeric data, reach for encoding/binary. For example, to add a uint32:
i := uint32(100)
n1 := make([]byte, 4)
binary.BigEndian.PutUint32(n1, i)
filter.Add(n1)
Verifying the false-positive rate
Sometimes the actual false-positive rate differs slightly from the theoretical one. You can estimate it for a filter with m bits and k hash functions over a set of size n:
if bloom.EstimateFalsePositiveRate(20*n, 5, n) > 0.001 {
// too high — increase m or k
}
Use it to validate the computed m, k parameters:
m, k := bloom.EstimateParameters(n, fp)
actualFPRate := bloom.EstimateFalsePositiveRate(m, k, n)
Note:
EstimateFalsePositiveRatebuilds a temporary filter and is relatively expensive — it is meant for validation, not for hot paths. Estimating the rate clears the filter it operates on.
Serialization
You can write a Bloom filter to a stable byte stream and read it back:
f := bloom.New(1000, 4)
var buf bytes.Buffer
bytesWritten, err := f.WriteTo(&buf)
if err != nil {
// handle error
}
var g bloom.BloomFilter
bytesRead, err := g.ReadFrom(&buf)
if err != nil {
// handle error
}
// bytesRead == bytesWritten
Performance tip: when reading from or writing to a file or network connection, wrap your streams in
bufioinstances for better throughput.
Design
A Bloom filter has two parameters: m, the number of bits used in storage, and k, the number of hash functions. The filter is backed by a bitset; a key is represented by setting the bits at each of the k hashed locations (modulo m). Membership is tested by checking whether all of those bits are set.
If an item is truly in the set, the filter never fails (the true-positive rate is 1.0); it is only susceptible to false positives. The art is to choose k and m correctly. This implementation hashes with murmur3, a fast non-cryptographic hash.
Used by demanding systems
The package is relied on across databases, observability systems, networking tools, blockchains, and infrastructure software.
Databases and storage engines
- milvus-io/milvus — leading open-source vector database, explicitly named in the repo README.
- weaviate/weaviate — vector DB, used in its
lsmkvLSM storage layer. - openGemini/openGemini — time-series database with multiple index packages.
- siglens/siglens — observability and log search engine.
- ankur-anand/unisondb — embedded database memtable.
- jakub-galecki/godb — SSTable layer in an LSM-tree database.
Observability and log processing
- grafana/loki — Grafana's log aggregation system, across its index and storage layers.
Distributed systems and consensus
- atomix/atomix — distributed primitives framework.
- authzed/spicedb — Zanzibar-style permissions system, dispatch layer.
- splitio/go-split-commons — Split.io Go SDK, impression deduplication across major versions.
Networking and security tools
- projectdiscovery/hmap — used across the nuclei/subfinder/httpx scanner ecosystem.
- daeuniverse/dae — Linux eBPF transparent proxy and routing table.
- bloXroute-Labs/gateway — blockchain network gateway.
Blockchain and crypto
- NethermindEth/juno — Starknet full node.
- iost-official/go-iost — IOST blockchain peer-to-peer layer.
- status-im/status-go — Status decentralized messenger.
- code-payments/code-server — crypto payments backend.
Privacy and identity
- optable/match — privacy-preserving record linkage.
- Arceliar/ironwood — anonymous routing network library.
Web crawling and scraping
- beego/beego — web framework, named in the upstream README.
- editorpost/spider — deduplication store for crawling.
Application infrastructure
- bentoml/yatai — model serving infrastructure.
- rotationalio/ensign — event streaming platform.
- puppetlabs/leg — message deduplication middleware.
- letsencrypt/x509search — certificate search tooling.
Goroutine safety
Filters are intentionally unsynchronized for performance. If multiple goroutines need the same filter, provide your own synchronization — typically a sync.Mutex, or channel hand-off so there is only ever one owner. Exceptionally, multiple goroutines may share a filter if none of them ever modify its content.
Further reading
Mastering Programming: From Testing to Performance in Go covers the kind of testing and optimization discipline that libraries like this benefit from.