Approximate Nearest Neighbor (ANN)

Approximate Nearest Neighbor (ANN) is a technique for efficiently finding the closest points to a query in high-dimensional spaces. It is widely used in similarity search, where the goal is to find the items most similar to a given query item. ANN algorithms speed up the search by preprocessing the data into an efficient index; common approaches include tree-based data structures, neighborhood graphs, hashing methods, and quantization.
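
To make the problem concrete, the sketch below (NumPy only, with illustrative array names and sizes) shows the exact brute-force nearest-neighbor search that an ANN index is built to approximate: every query must compare against every indexed vector.

```python
# Minimal sketch of exact (brute-force) nearest-neighbor search, the baseline
# that ANN indexes approximate. Data sizes and names are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((10_000, 128)).astype(np.float32)   # indexed vectors
query = rng.standard_normal(128).astype(np.float32)            # query vector

# Exact search: compute the distance to every indexed point, then take the top-k.
# This costs O(n * d) per query, which is exactly what ANN indexes try to avoid.
dists = np.linalg.norm(data - query, axis=1)
k = 5
nearest = np.argsort(dists)[:k]
print("exact top-5 ids:", nearest)
```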

The main advantage of ANN is that it can significantly reduce retrieval cost, since a query only needs to access a small fraction of the indexed items, guided by the index. However, the accuracy of ANN is not guaranteed: it depends on the quality of the index and the similarity measure used. In many cases an approximate nearest neighbor is almost as good as the exact one, especially when the distance measure accurately captures the notion of quality that matters to the user.
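
This accuracy is commonly quantified as recall@k, the fraction of the exact top-k neighbors that the approximate search also returns. The snippet below is a small sketch of that computation; the two id arrays stand in for the (hypothetical) outputs of a brute-force search and an ANN index.

```python
# Sketch of measuring ANN accuracy as recall@k against an exact baseline.
# The id arrays below are hypothetical placeholders for real search results.
import numpy as np

exact_ids = np.array([12, 7, 93, 41, 5])      # ground-truth top-5 from brute force
approx_ids = np.array([12, 7, 88, 41, 19])    # top-5 returned by an ANN index

recall_at_k = np.intersect1d(exact_ids, approx_ids).size / exact_ids.size
print(f"recall@5 = {recall_at_k:.2f}")        # 0.60 in this toy example
```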

ANN algorithms are allowed to return points whose distance from the query is at most c times the distance from the query to its true nearest points, for some approximation factor c > 1. Proximity-graph methods such as HNSW (Hierarchical Navigable Small World) are considered the current state of the art for approximate nearest neighbor search. Other ANN techniques include KD-trees, locality-sensitive hashing (LSH), and Product Quantization (PQ).
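
As a minimal sketch of the graph-based approach, the example below builds and queries an HNSW index, assuming the third-party hnswlib library; the parameter values (M, ef_construction, ef) are illustrative choices, not recommendations.

```python
# Minimal sketch of building and querying an HNSW proximity-graph index,
# assuming the hnswlib library. Parameter values here are illustrative.
import numpy as np
import hnswlib

dim, num_elements = 128, 10_000
rng = np.random.default_rng(0)
data = rng.standard_normal((num_elements, dim)).astype(np.float32)

# Build the graph: M controls connectivity per node,
# ef_construction controls search breadth during index construction.
index = hnswlib.Index(space="l2", dim=dim)
index.init_index(max_elements=num_elements, ef_construction=200, M=16)
index.add_items(data, np.arange(num_elements))

# ef trades query speed for recall at search time.
index.set_ef(50)
labels, distances = index.knn_query(data[:1], k=5)
print("approximate top-5 ids:", labels[0])
```

Raising ef (and, at build time, M or ef_construction) generally improves recall at the cost of slower queries and a larger index, which is the central tuning trade-off in graph-based ANN.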

ANN is widely used in various applications, such as image and video retrieval, recommendation systems, and Natural Language Processing (NLP).