Basic

Bloom Filter

Keywords

Bloom filterhash functionbit arrayfalse positivefalse negativeprobabilistic data structurespace efficiencyAsk ChatGPT

Prerequisites

None — this is a starting concept.

Progress

在儲存系統中，我們經常需要快速判斷「某筆資料是否存在」，而不想每次都去磁碟上做昂貴的查詢，Bloom filter 就是為此設計的機率型資料結構。它使用 k 個 hash function 將元素映射到一個 bit array 上，查詢時只要檢查對應的 k 個位元是否全為 1。Bloom filter 保證不會有 false negative（存在的東西一定查得到），但允許一定機率的 false positive（不存在的東西可能誤判為存在）。我們會分析 false-positive rate 與 bit array 大小、hash function 數量之間的數學關係，幫助我們在實務中選擇合適的參數。理解 Bloom filter 對後續學習 LSM-Tree 至關重要，因為它被大量用來減少不必要的 disk read。

Key Concepts

我理解 Bloom filter 的結構：使用 k 個 hash function 將元素映射到一個 bit array，插入時設定對應位元為 1，查詢時檢查所有對應位元

我理解 Bloom filter 保證沒有 false negative（存在的元素一定會被判定為存在），但允許 false positive（不存在的元素可能被誤判為存在）

我理解 false positive rate 與 bit array 大小 m、hash function 數量 k、以及已插入元素數量 n 之間的數學關係

我知道如何根據預期的元素數量與可接受的 false positive rate 來選擇合適的 bit array 大小與 hash function 數量

Bloom Filter

Keywords

Prerequisites

Related Papers

Progress

Key Concepts

Recommended Resources

Test Your Understanding