
MongoDB Vector Search Benchmark Overview

This page explains key performance optimization strategies for MongoDB Vector Search and how we used them to create our benchmark. To learn how to interpret this guide, see How to Use This Benchmark.

This section describes the dataset and setup for our benchmark.

For our benchmark, we used the Amazon Reviews 2023 dataset, a massive e-commerce dataset containing 571.54M reviews across 33 product categories, representing interactions from May 1996 to September 2023. With approximately 48.2 million unique items covered by these reviews, it provides rich multimodal data including user reviews (ratings, text, helpfulness votes), item metadata (descriptions, price, images), and user-item interaction graphs. We looked at subsets of the item dataset (5.5M and 15.3M) that contain titles and descriptions, and used Voyage AI's voyage-3-large model to embed them using the following logic:

import voyageai
vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment
source = "Item: Title: " + record["title"] + " Description: " + record["description"]
source_embedding = vo.embed([source], model="voyage-3-large", input_type="document", output_dimension=2048).embeddings[0]

Result quality for filters is determined by computing the Jaccard similarity (intersection / expected number of results) between the results of an ANN query and the corresponding float ENN (exact) results for the same text input and number of vectors requested. Recall is the average of this score across 50 sample queries that might be asked of an e-commerce dataset.
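The overlap score described above can be sketched in a few lines. This is a minimal illustration, not the benchmark's actual code: the function names and the toy result IDs are hypothetical, and it assumes each query's ANN and ENN results are available as lists of document IDs.

```python
def overlap_recall(ann_ids, enn_ids):
    """Fraction of exact (ENN) results that the ANN query also returned."""
    expected = set(enn_ids)
    if not expected:
        return 0.0
    return len(expected & set(ann_ids)) / len(expected)


def mean_recall(result_pairs):
    """Average overlap across (ann_ids, enn_ids) pairs, one pair per query."""
    scores = [overlap_recall(ann, enn) for ann, enn in result_pairs]
    return sum(scores) / len(scores)


# Two toy queries, each requesting 4 results (IDs are made up).
pairs = [
    (["a", "b", "c", "d"], ["a", "b", "c", "e"]),  # 3 of 4 expected found
    (["p", "q", "r", "s"], ["p", "q", "r", "s"]),  # 4 of 4 expected found
]
print(mean_recall(pairs))  # 0.875
```

In the benchmark the average is taken over 50 sample queries rather than two.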

Note

To see the source code used for benchmarking, as well as the code used to embed the source dataset, see Performance Testing Repository.

This section outlines several factors that impact performance for MongoDB Vector Search and how we configured our benchmark to test them.

Quantization reduces the precision of vector embeddings to decrease memory usage and improve search speed, with trade-offs in search accuracy.

Scalar quantization converts 32-bit floating-point vectors to 8-bit integers, achieving a 4x reduction in memory usage. Integer vector comparisons take less computation time and fewer resources than float vector comparisons, but may reduce search precision.
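To make the precision trade-off concrete, here is a minimal sketch of one common way to do scalar quantization: linearly mapping a known value range onto 8-bit integers. It is an illustration only. The fixed range `lo`/`hi` and the helper names are assumptions, and MongoDB's internal quantization scheme may differ.

```python
def scalar_quantize(vector, lo, hi):
    """Map each float in [lo, hi] to an integer in [-128, 127] (8 bits)."""
    scale = 255.0 / (hi - lo)
    quantized = []
    for x in vector:
        clipped = min(max(x, lo), hi)  # values outside the range are clamped
        quantized.append(int(round((clipped - lo) * scale)) - 128)
    return quantized


def dequantize(codes, lo, hi):
    """Approximate the original floats from the 8-bit codes."""
    step = (hi - lo) / 255.0
    return [(c + 128) * step + lo for c in codes]


vec = [-1.0, -0.5, 0.0, 0.5, 1.0]
codes = scalar_quantize(vec, lo=-1.0, hi=1.0)
restored = dequantize(codes, lo=-1.0, hi=1.0)
max_error = max(abs(a - b) for a, b in zip(vec, restored))
print(codes, max_error)
```

Each value now needs one byte instead of four, and the reconstruction error is bounded by half of one quantization step.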

Binary quantization converts vectors to 1-bit representations, achieving a 32x reduction in memory usage. Binary vector comparisons compute the Hamming distance, which takes even less computation time and fewer resources than int vector comparisons. However, going from float vectors to binary vectors reduces search precision so significantly that we add a rescoring step to compensate, which increases latency. At query time, the top numCandidates accumulated during search are reordered using their full-fidelity vectors on disk before the top limit results are returned.
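The two-stage flow described above (a cheap Hamming-distance shortlist, then rescoring with full-fidelity vectors) can be sketched as follows. This is a brute-force toy under stated assumptions: it binarizes by sign, scores with a dot product, and scans every vector, whereas the real system searches an index. All names and the sample data are hypothetical.

```python
def binarize(vector):
    """Pack the sign of each dimension into an int: bit i is 1 if vector[i] > 0."""
    bits = 0
    for i, x in enumerate(vector):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Number of bit positions where the two binary codes differ."""
    return bin(a ^ b).count("1")


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search_with_rescoring(query, vectors, num_candidates, limit):
    """Coarse pass on binary codes, then rescore candidates with full floats."""
    q_bits = binarize(query)
    codes = {doc_id: binarize(v) for doc_id, v in vectors.items()}
    # Stage 1: shortlist the num_candidates closest codes by Hamming distance.
    candidates = sorted(codes, key=lambda d: hamming(q_bits, codes[d]))[:num_candidates]
    # Stage 2: reorder the shortlist using the full-fidelity vectors.
    rescored = sorted(candidates, key=lambda d: dot(query, vectors[d]), reverse=True)
    return rescored[:limit]


vectors = {
    "a": [0.9, 0.1, -0.2, 0.4],
    "b": [0.1, 0.9, -0.8, 0.1],
    "c": [-0.7, -0.3, 0.5, -0.6],
    "d": [0.8, 0.2, -0.1, 0.5],
}
query = [1.0, 0.0, -0.1, 0.5]
result = search_with_rescoring(query, vectors, num_candidates=3, limit=2)
print(result)  # ['a', 'd']
```

In this toy data, "a", "b", and "d" have identical binary codes, so only the rescoring step can rank "b" below the other two. That is the precision the binary pass loses and the float pass recovers.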

We used Voyage AI's voyage-3-large model to embed the medium (5.5M) and large (15.3M) vector datasets. We chose this embedding model because it performs strongly on many IR benchmarks and because it is trained with both Matryoshka Representation Learning and quantization in mind. Therefore, it performs well at lower dimensions with quantization enabled, even at higher volumes of vectors.

We used indexing on views to produce additional fields that slice the first N dimensions of the source 2048-dimension vector, yielding 1024-, 512-, and 256-dimension vectors, and indexed them as we would the source field.

Note

You must use MongoDB version 8.1 or later in order to create a vector search index on a view.
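As a rough sketch of the slicing described above, a view pipeline could add truncated copies of the source vector with the `$slice` aggregation operator. The field names (`embedding`, `embedding_1024`, and so on) and the helper functions are hypothetical; the benchmark's actual view definition is in the Performance Testing Repository.

```python
def sliced_embedding_stage(source_field, dims):
    """Build an $addFields stage that keeps the first N values of source_field."""
    return {
        "$addFields": {
            f"{source_field}_{n}": {"$slice": [f"${source_field}", n]}
            for n in dims
        }
    }


# Pipeline for a view over the source collection (field names are assumptions).
view_pipeline = [sliced_embedding_stage("embedding", [1024, 512, 256])]


def truncate(vector, n):
    """Local equivalent of {"$slice": [vector, n]} for a positive n."""
    return vector[:n]


print(view_pipeline[0]["$addFields"]["embedding_256"])
```

With PyMongo, a pipeline like this could be passed to `db.create_collection("items_view", viewOn="items", pipeline=view_pipeline)`, where both collection names are placeholders.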

Just as quantization changes the representation at each position of a vector, dimensionality affects the representational capacity of each vector. Consequently, you can achieve higher accuracy with 2048d vectors compared to 256d vectors, especially when you measure against a 2048d float ENN baseline.

In addition to requiring more storage and memory, higher-dimensional vectors are somewhat slower to query than lower-dimensional vectors, but this is mitigated significantly because MongoDB Vector Search leverages SIMD instructions when performing vector comparisons.

We also created a separate index definition on the collection containing all 15.3M items, which includes filters on two fields to enable pre-filtered queries against this dataset.
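For illustration, an index definition with two filter fields might look like the sketch below. The `price` and `category` paths match the filtered query used in this benchmark; the `cosine` similarity setting, the `binary` quantization choice, and the helper name are assumptions rather than the benchmark's confirmed configuration.

```python
def vector_index_definition(path, num_dimensions, quantization, filter_paths):
    """Build a vector search index definition with optional filter fields."""
    fields = [
        {
            "type": "vector",
            "path": path,
            "numDimensions": num_dimensions,
            "similarity": "cosine",  # assumption: not stated in this page
            "quantization": quantization,  # "none", "scalar", or "binary"
        }
    ]
    # Each pre-filterable field needs its own "filter" entry.
    fields += [{"type": "filter", "path": p} for p in filter_paths]
    return {"fields": fields}


definition = vector_index_definition(
    path="embedding",
    num_dimensions=2048,
    quantization="binary",
    filter_paths=["price", "category"],
)
print(definition)
```

With PyMongo, a definition like this can be wrapped in `SearchIndexModel(definition=definition, name="large_vector_index", type="vectorSearch")` and passed to `collection.create_search_index`.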

We ran vector search queries, both unfiltered and filtered, against the large indexed dataset:

# unfiltered query
query = [
    {
        "$vectorSearch": {
            "index": "large_vector_index",
            "path": "embedding",
            "queryVector": embedding.tolist(),
            "limit": k,
            "numCandidates": candidates,
        }
    },
    {
        "$project": {"embedding": 0}
    }
]

# filtered query
query = [
    {
        "$vectorSearch": {
            "index": "large_vector_index",
            "path": "embedding",
            "queryVector": embedding.tolist(),
            "limit": k,
            "numCandidates": candidates,
            "filter": {"$and": [{"price": {"$lte": 1000}}, {"category": {"$eq": "Pet Supplies"}}]}
        }
    },
    {
        "$project": {"embedding": 0}
    }
]

Note

Both query patterns exclude the embedding field from the output by using a $project stage. We recommend this to reduce latency unless you need embeddings in your results.

MongoDB Vector Search performance scales with dedicated Search Nodes, which handle vector computations separately from your primary database workload and make efficient use of dedicated hardware instances. All tests were conducted using an M20 base cluster, but depending on the type of test, we reconfigured the Search Nodes used to better fit our test case. All tests were run using Search Nodes on AWS us-east-1, with an EC2 instance also in us-east-1 making requests. There are three types of Search Nodes that you can provision on AWS, which vary in terms of disk, RAM, and vCPUs that they have available:

| Node Type | Resource Profile | Recommended Usage |
| --- | --- | --- |
| Low-CPU | Low disk to memory ratio (~6:1), low vCPU | Good starting point for many vector workloads that don't leverage quantization |
| High-CPU | High disk to memory ratio (~25:1), high vCPU | Performant choice for high QPS workloads or workloads that leverage quantization |
| Storage-Optimized | High disk to memory ratio (~25:1), low vCPU | Cost-effective choice for workloads that leverage quantization |

A 768-dimension float vector occupies ~3 KB of space on disk (768 dimensions x 4 bytes). This resource requirement scales linearly with the number of vectors and the number of dimensions of each vector: 1M 768d vectors occupy ~3 GB; 1M 1536d vectors occupy ~6 GB.

Using quantization, we produce representation vectors that are held in memory from the full fidelity vectors stored on disk. This reduces the amount of required memory by 3.75x for scalar quantization and 24x for binary quantization, but increases the amount of disk needed to store the unquantized and quantized vectors.

One scalar quantized 768d vector requires 0.8 KB of memory (3 / 3.75) and ~3.8 KB of disk (3 + 3 / 3.75). Considering these hardware options and the resource requirements for quantization, we selected the following Search Node tiers for the different test cases:

| Test Case | Resources Required (RAM, Storage) | Search Node Tier (RAM, Disk, vCPUs) | Price for 2x Nodes |
| --- | --- | --- | --- |
| Medium dataset (5.5M vectors, all dimensions), scalar quantization | 22 GB, 104.5 GB | S50-storage-optimized (32 GB, 843 GB, 4 vCPUs) | $1.04/hr |
| Medium dataset (5.5M vectors, all dimensions), binary quantization | 3.43 GB, 104.5 GB | S30-high-cpu (8 GB, 213 GB, 4 vCPUs) | $0.24/hr |
| Large dataset (15.3M vectors, 2048d), scalar quantization | 32.64 GB, 155.04 GB | S50-storage-optimized (32 GB, 843 GB, 4 vCPUs) | $1.04/hr |
| Large dataset (15.3M vectors, 2048d), binary quantization | 5.1 GB, 155.04 GB | S30-high-cpu (8 GB, 213 GB, 4 vCPUs) | $0.24/hr |
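The sizing arithmetic behind these figures can be sketched as below. It applies the 3.75x and 24x memory reduction factors and the "full vectors plus quantized vectors" disk rule quoted earlier. The function names are hypothetical, and "all dimensions" is assumed to mean the 2048d, 1024d, 512d, and 256d fields combined.

```python
BYTES_PER_FLOAT32 = 4
MEMORY_REDUCTION = {"scalar": 3.75, "binary": 24.0}  # factors quoted in the text


def float_size_gb(num_vectors, dims):
    """Full-fidelity size, using the text's rounding: 1M vectors x 1 KB ~ 1 GB."""
    kb_per_vector = dims * BYTES_PER_FLOAT32 / 1024
    return num_vectors / 1_000_000 * kb_per_vector


def quantized_ram_gb(num_vectors, dims, quantization):
    """Memory needed to hold the quantized representation."""
    return float_size_gb(num_vectors, dims) / MEMORY_REDUCTION[quantization]


def scalar_disk_gb(num_vectors, dims):
    """Disk for the full-fidelity vectors plus their scalar quantized copies."""
    full = float_size_gb(num_vectors, dims)
    return full + full / MEMORY_REDUCTION["scalar"]


all_dims = 2048 + 1024 + 512 + 256  # assumption: every sliced field is indexed
medium, large = 5_500_000, 15_300_000

print(round(quantized_ram_gb(medium, all_dims, "scalar"), 2))  # 22.0
print(round(scalar_disk_gb(medium, all_dims), 2))              # 104.5
print(round(quantized_ram_gb(large, 2048, "scalar"), 2))       # 32.64
print(round(quantized_ram_gb(large, 2048, "binary"), 2))       # 5.1
print(round(scalar_disk_gb(large, 2048), 2))                   # 155.04
```

These values match the RAM column and the scalar storage figures in the table; they are estimates for capacity planning, not measurements.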

For the large dataset, we leveraged an additional feature called vector compression, which reduces the footprint of each vector in the source collection by about 60%. This accelerates the step within a query when IDs are hydrated in the source collection, and this is a recommended step for all large workloads.

We assessed not only serial query latency, but also total throughput/QPS when 10 and 100 requests are issued concurrently.

Note

The recommended mechanism for handling higher throughput is scaling out the number of Search Nodes horizontally, which we did not measure in these tests.
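A minimal way to measure throughput under concurrency is to issue a fixed number of requests through a thread pool and divide by wall-clock time. The sketch below is not the benchmark harness: `measure_throughput` is a hypothetical helper, and `fake_query` stands in for a real `$vectorSearch` aggregation call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(run_query, total_requests, concurrency):
    """Issue total_requests calls with `concurrency` workers.

    Returns (queries per second, mean per-request latency in seconds).
    """
    def timed_call(_):
        start = time.perf_counter()
        run_query()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(total_requests)))
    wall = time.perf_counter() - wall_start
    return total_requests / wall, sum(latencies) / len(latencies)


def fake_query():
    """Stand-in for a $vectorSearch aggregation call (sleeps ~10 ms)."""
    time.sleep(0.01)


for concurrency in (1, 10):
    qps, mean_latency = measure_throughput(fake_query, total_requests=20, concurrency=concurrency)
    print(f"concurrency={concurrency} qps={qps:.0f} mean_latency={mean_latency * 1000:.1f} ms")
```

Reporting both numbers matters because throughput can rise with concurrency while per-request latency also rises once the Search Nodes become saturated.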

We assessed the impact of sharding our cluster and collection on the _id field on the system's throughput, focusing on request concurrency of 10 and 100 for the large binary quantized index.
