Primers • Approximate Nearest Neighbors  Similarity Search
 What is Similarity Search?
 Hierarchical Navigable Small Worlds (HNSW)
 Approximate Nearest Neighbors
 FAISS
 ScaNN
 ANNOY
 Fast Inference for Graphbased Approximate Nearest Neighbor Search (FINGER) Graph based
 Comparitive Analysis
 ANNBenchmarks
 References
 Citation
What is Similarity Search?
 In similarity searching, there is often a query record that is compared against a stored database of records (documents or images etc). The main aim is to retrieve a set of database records that are similar to the query record.
 Similarity search works with semantic representations of data and finds the similar items fast. This means when we represent images or pieces of text as vector embeddings, their semantic similarity is represented by how close their vectors are in the vector space. Hence, what we want to look at is the distance between vectors of the objects.
 When it comes to choosing the best ANN option for your use case, you can take a look at the leaderboard here.
 The following article has been written by Arpita Vats.
Applications
 Image retrieval, document retrieval, recommender systems, image recognition, similar movies/items etc.
Hierarchical Navigable Small Worlds (HNSW)
 HNSW is a data structure and algorithm used for approximate nearest neighbor search. It was introduced by Malkov and Yashunin in 2016 as an improvement over the traditional kd tree and other methods for efficient nearest neighbor search in highdimensional spaces.
 HNSW is based on the concept of constructing a hierarchical graph where each node represents a point in the dataset.
 The graph is built in a way that allows for efficient navigation through the data to find approximate nearest neighbors. The key idea is to maintain connections between nodes in such a way that the graph exhibits smallworld properties, meaning that even in highdimensional spaces, the average distance between nodes remains relatively small.
Approximate Nearest Neighbors
 Approximate Nearest Neighbors (ANN)based search techniques like Hierarchical Navigable Small Worlds (HNSW) are commonly deployed to carry out similarity search.
 Approximate Nearest Neighbor (ANN) methods are commonly used on top of collaborative filtering or contentbased filtering for candidate generation due to the following reasons:
 Scalability: Collaborative filtering and contentbased filtering techniques often involve computing similarity scores between users/items, which can become computationally expensive as the number of users/items grows. ANN methods provide efficient algorithms for searching and retrieving nearest neighbors, allowing for faster processing of largescale datasets. ANN enables the system to handle millions or even billions of users/items efficiently.

Realtime Recommendations: Recommender systems often need to generate recommendations in realtime, responding quickly to user requests. ANN methods provide fast retrieval of nearest neighbors, allowing for efficient candidate generation on the fly. This is especially important in scenarios where low latency is crucial, such as online platforms or streaming services.

Diversity and Serendipity: Collaborative filtering and contentbased filtering techniques have a tendency to recommend items that are similar to the user’s past preferences or content attributes. This can lead to a lack of diversity in recommendations. By incorporating ANN methods, the system can introduce diversity by identifying items that are slightly different or less similar but still relevant to the user’s preferences. ANN helps strike a balance between personalized recommendations and the exploration of new and diverse items.

Cold Start Problem: Collaborative filtering and contentbased filtering can face challenges in scenarios where there is limited or no historical data for new users or items (cold start problem). ANN methods can be useful in these situations by leveraging the available features or metadata of new users/items to identify similar existing users/items. This allows for the generation of initial recommendations even in the absence of extensive useritem interactions.
By utilizing ANN methods on top of collaborative filtering or contentbased filtering, recommender systems can enhance candidate generation by efficiently searching for nearest neighbors, improving scalability, realtime recommendation capabilities, diversity, and handling cold start scenarios. It helps in delivering more accurate and effective recommendations to users.
FAISS
 Facebook AI Similarity Search (FAISS) is a library developed by Facebook AI that enables efficient similarity search.
 Given a set of vectors, we can index them using FAISS then using another vector (the query vector), we search for the most similar vectors within the index. FAISS not only allows us to build an index and search but it also speeds up search times to ludicrous performance levels. FAISS provides several similar search methods that span a broad spectrum of usage tradeoffs.
 FAISS is optimized for memory usage and speed.
 FAISS offers a stateoftheart GPU implementation for the most relevant indexing methods.
 To know more about the different and efficient indexing done by FAISS, please refer Efficient Indexing of BillionScale datasets of deep descriptors by Babenko and Lempitsky (2016).
 The figure below shows the internal mechanism of FAISS.
Evaluating similarity search
 Once the vectors are extracted by learning machinery (from images, videos, text documents, and elsewhere), they’re ready to feed into the similarity search library.
 Similarity search can be made orders of magnitude faster if we’re willing to trade some accuracy; that is, deviate a bit from the reference result. For example, it may not matter much if the first and second results of an image similarity search are swapped, since they’re probably both correct results for a given query. Accelerating the search involves some preprocessing of the data set, an operation that we call indexing.
 This brings us to the three metrics of interest:
 Speed: How long does it take to find the 10 (or some other number) most similar vectors to the query? Hopefully less time than the bruteforce algorithm needs; otherwise, what’s the point of indexing?
 Memory usage: How much RAM does the method require? More or less than the original vectors? FAISS supports searching only from RAM, as disk databases are orders of magnitude slower. Yes, even with SSDs.
 Accuracy: How well does the returned list of results match the bruteforce search results? Accuracy can be evaluated by counting the number of queries for which the true nearest neighbor is returned first in the result list (a measure called 1recall@1), or by measuring the average fraction of 10 nearest neighbors that are returned in the 10 first results (the “10intersection” measure).
Pros
 FAISS is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.
 FAISS optimizes through various indexing techniques.
Cons
 Similarity search can be made orders of magnitude faster if we’re willing to trade some accuracy; that is, deviate a bit from the reference result.
ScaNN
 ScaNN (Scalable Nearest Neighbors), by Google, is a method for efficient vector similarity search at scale.
 Modern ML models can transform inputs such as text and images into embeddings, high dimensional vectors trained such that more similar inputs cluster closer together.
 For a given query, we can therefore compute its embedding, and find the literary works whose embeddings are closest to the query’s. computational challenge remains: for a given query embedding, how does one quickly find the nearest dataset embeddings? The set of embeddings is often too large for exhaustive search and its high dimensionality makes pruning difficult.
The Importance of Vector Similarity Search
 Embeddingbased search is a technique that is effective at answering queries that rely on semantic understanding rather than simple indexable properties.
 In this technique, machine learning models are trained to map the queries and database items to a common vector embedding space, such that the distance between embeddings carries semantic meaning, i.e., similar items are closer together.
 To answer a query with this approach, the system must first map the query to the embedding space. It then must find, among all database embeddings, the ones closest to the query; this is the nearest neighbor search problem.
 One of the most common ways to define the querydatabase embedding similarity is by their inner product; this type of nearest neighbor search is known as maximum innerproduct search (MIPS).
 Referring to the figure below source: Announcing ScaNN: Efficient Vector Similarity Search, the twotower neural network model, illustrated below, is a specific type of embeddingbased search where queries and database items are mapped to the embedding space by two respective neural networks. In this example the model responds to naturallanguage queries for a hypothetical literary database.
Anisotropic Vector Quantization in ScaNN
 Anisotropic Vector Quantization (AVQ) is an indexing technique used for efficient similarity search. AVQ is designed to handle highdimensional vector embeddings and improve the search performance in scenarios where the data distribution is anisotropic, meaning that the density of vectors is not uniform across the embedding space.
 The AVQ approach begins by dividing the embedding space into a gridlike structure called a quantization grid. Each cell in the grid represents a region in the embedding space. During the indexing phase, AVQ assigns each vector to the cell that is closest to it based on Euclidean distance. This process quantizes the vectors and maps them to discrete grid cells.
 To handle anisotropic data distributions, AVQ introduces an additional step called anisotropic quantization. This step adjusts the shape and size of the quantization grid cells to better align with the underlying data distribution. By adapting the cell shape and size to the data, AVQ can achieve more balanced and accurate indexing.
 During the search phase, given a query vector, AVQ identifies the corresponding quantization grid cell and retrieves the vectors stored in that cell as potential nearest neighbors. These candidate vectors are then further refined using distance calculations to determine the actual nearest neighbors.
 The AVQ technique in ScaNN helps improve search efficiency by reducing the number of vectors that need to be compared for similarity. By grouping similar vectors into the same grid cells, AVQ enables fast retrieval of potential nearest neighbors while maintaining accuracy. This is particularly beneficial in scenarios with highdimensional data and nonuniform data distributions, where traditional indexing methods may struggle to provide efficient search results.
 Anisotropic vector quantization allows ScaNN to better estimate inner products that are likely to be in the topk MIPS results and therefore achieve higher accuracy.
 On the glove100angular benchmark from annbenchmarks.com, ScaNN outperformed eleven other carefully tuned vector similarity search libraries, handling roughly twice as many queries per second for a given accuracy as the nextfastest library (NGTonng).
 ScaNN is opensource software and you can try it yourself at GitHub. The library can be directly installed via Pip and has interfaces for both TensorFlow and Numpy inputs.
Pros
 Handles roughly twice as many queries per second for a given accuracy as the nextfastest library.
 Anisotropic vector quantization allows ScaNN to better estimate inner products that are likely to be in the topk MIPS results and therefore achieve higher accuracy.
Cons
 Database size can easily be in the millions or even billions, MIPS is often the computational bottleneck to inference speed, and exhaustive search is impractical. This necessitates the use of approximate MIPS algorithms that exchange some accuracy for a significant speedup over bruteforce search.
ANNOY
 Annoy from Spotify uses a bunch of trees to enable Spotify’s music recommendations.
 In order to construct the index we create a forest (i.e., a bunch of trees). Each tree is constructed by picking two points at random and splitting the space into two by their hyperplane. We keep recursively splitting into subspaces until at most K items are left in each node.
 It has the ability to use static files as indexes. In particular, this means you can share index across processes.
 Annoy also decouples creating indexes from loading them, so you can pass around indexes as files and map them into memory quickly.
 It tries to minimize memory footprint so the indexes are quite small.
 To understand the data structures and algorithms that Annoy uses to do approximate nearest neighbor queries, please refer Erik Bernhardsson’s Nearest neighbors and vector models – part 2 – algorithms and data structures.
 There are just two main parameters neede to tune Annoy: the number of trees
n_trees
and the number of nodes to inspect during searchingsearch_k
.n_trees
is provided during build time and affects the build time and the index size. A larger value will give more accurate results, but larger indexes.search_k
is provided in runtime and affects the search performance. A larger value will give more accurate results, but will take longer time to return. If
search_k
is not provided, it will default ton * n_trees
where n is the number of approximate nearest neighbors. Otherwise,search_k
andn_trees
are roughly independent, i.e. the value ofn_trees
will not affect search time if search_k is held constant and vice versa.  It’s recommended to set
n_trees
as large as possible given the amount of memory you can afford, and it’s recommended to setsearch_k
as large as possible given the time constraints you have for the queries.
 From Erik Bernhardsson’s Nearest neighbors and vector models – part 2 – algorithms and data structures, the diagram below shows the binary tree (left) when the subspaces have been recursively split such that there are at most \(K\) items left in each node (right).
 In order to search the constructed index, the forest is traversed in order to obtain a set of candidate points from which the closest to the query point is returned.
Pros
 Decouple index creation from loading them, so you can pass around indexes as files and map them into memory quickly.
 We can tune the parameters to change the accuracy/speed tradeoff.
 It has the ability to use static files as indexes, this means you can share indexes across processes.
Cons
 The exact nearest neighbor might be across the boundary to one of the neighboring cells.
 No support for GPU processing.
 No support for batch processing, so in order to increase throughput “further hacking is required”.
 Cant incrementally add points to it (annoy2 tries to fix this).
Fast Inference for Graphbased Approximate Nearest Neighbor Search (FINGER) Graph based
 Graphbased approximation is one such technique where the data points are organized into a graph, and the search algorithm traverses the graph to find the nearest neighbors of the query. The algorithm maintains a list of candidate solutions and regularly updates it as it encounters points closer to the query.
 The FINGER technique focuses on optimizing the search process itself and can be applied to any graph construction method. It takes advantage of the observation that when calculating the distance between the query and points that are farther away than any of the candidates currently on the list, an approximate distance measure is usually sufficient. By efficiently computing the approximate distance using certain manipulations of node vectors’ values, the time required for approximate nearest neighbor search can be reduced by 20% to 60%.
 The approach of FINGER involves representing the query vector and the node vectors as sums of projections along a previously explored node and residual vectors orthogonal to it. By estimating the angle between the residual vectors of the explored node’s immediate neighbors, the angle between the residual vector of a neighbor node and the query’s residual vector can be approximated. This approximation provides an estimation of the distance between the query and the neighbor node.
 Experimental results show that FINGER outperforms three prior graphbased approximation methods on different datasets, achieving more efficient search with higher recall rates. The improvement in efficiency ranges from 50% to almost 88% compared to the previous methods.
Comparitive Analysis
Method  Definition/Functioning  Pros  Cons 

HNSW  Graphbased method building a layered graph for efficient search in highdimensional space. 


FAISS  Library by Facebook AI for efficient similarity search, uses quantization and indexing. 


ScANN  Google's method for vector similarity search combining quantization and tree search. 


ANNOY  Library for Approximate Nearest Neighbors using treebased search. 


ANNBenchmarks
 ANNBenchmarks is a benchmarking repository for approximate nearest neighbor algorithms (such as FAISS, ScaNN, ANNOY, PGVector, etc.) search that spans various commonplace datasets, distance measures, and algorithms.
References
 Jegou, Hervé, et al., FAISS: A library for efficient similarity search, Engineering at Meta, 29 March 2017.
 Erik Bernhardsson, Efficient Indexing of BillionScale datasets of deep descriptors, 1 October 2015.
 Announcing ScaNN: Efficient Vector Similarity Search
 Nearest neighbors and vector models – part 2 – algorithms and data structures
 Moreefficient approximate nearestneighbor search
Citation
If you found our work useful, please cite it as:
@article{VatsChadha2020DistilledSimilaritySearch,
title = {Similarity Search},
author = {Vats, Arpita and Chadha, Aman},
journal = {Distilled AI},
year = {2022},
note = {\url{https://aman.ai}}
}