Advanced

HNSW

Keywords

HNSWapproximate nearest neighborANNnavigable small worldskip listgreedy routingefConstructionrecallhigh-dimensional vectorAsk ChatGPT

Prerequisites

None — this is a starting concept.

Progress

當我們需要在數百萬甚至數十億筆高維向量中找到與 query 最相似的結果時，精確的 nearest neighbor search 計算成本過高，因此我們需要 approximate nearest neighbor（ANN）演算法。HNSW 是目前最廣泛使用的 ANN 索引之一，它建構了一個多層的 navigable small world graph，結構類似 skip list：上層是稀疏的 long-range connection 用於快速跳躍，下層是密集的 short-range connection 用於精確搜尋。查詢時我們從最上層開始以 greedy routing 逐層下降，最終在底層找到近似最近鄰。我們會討論建構時 efConstruction 參數如何控制 graph 品質與建構速度的平衡，以及這個索引結構在向量資料庫中的實際應用場景。

Key Concepts

我理解 HNSW 的多層 graph 結構類似 skip list：上層節點稀疏、擁有 long-range connection 用於快速跳躍，下層節點密集、擁有 short-range connection 用於精確搜尋

我理解 HNSW 的查詢過程：從最上層的 entry point 開始，以 greedy routing 在每一層找到局部最近鄰，然後下降到下一層繼續搜尋，直到底層得到最終結果

我理解 efConstruction 參數如何控制建構品質與速度的權衡：較大的 efConstruction 產生更高品質的 graph 但建構更慢，較小的值則相反

我了解為什麼在高維空間中精確的 nearest neighbor search 不可行：維度災難使得精確搜尋的計算成本隨維度指數增長，因此需要 approximate 方法

我知道 HNSW 在向量資料庫中的應用場景，例如語意搜尋、推薦系統、圖片檢索等需要 high-dimensional vector similarity search 的任務

HNSW

Keywords

Prerequisites

Related Papers

Progress

Key Concepts

Recommended Resources

Test Your Understanding