MongoDB Vector Search supports automatic quantization of your float vector embeddings (both 32-bit and 64-bit). It also supports ingesting and indexing your pre-quantized scalar and binary vectors from certain embedding models.
About Quantization
Quantization is the process of shrinking full-fidelity vectors into fewer bits. It reduces the amount of main memory required to store each vector in a MongoDB Vector Search index by indexing the reduced-representation vectors instead, which lets you store more vectors or vectors with higher dimensions. Quantization therefore reduces resource consumption and improves speed. We recommend quantization for applications with a large number of vectors, typically over 100,000.
Scalar Quantization
Scalar quantization involves first
identifying the minimum and maximum values for each dimension of the
indexed vectors to establish a range of values for a dimension. Then,
the range is divided into equally sized intervals or bins. Finally, each
float value is mapped to a bin to convert the continuous float values
into discrete integers. In MongoDB Vector Search, this quantization reduces the vector
embedding's RAM cost to about one fourth (1/3.75) of the
pre-quantization cost.
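The following is a minimal Python sketch, not MongoDB's internal implementation, that illustrates the idea: each dimension's observed range is split into 256 equal bins, so every float maps to a one-byte integer.

```python
# Illustrative sketch only -- not MongoDB's internal implementation.
# Scalar quantization: split each dimension's observed range into 256
# equal bins so every float maps to a single signed byte (int8).
import numpy as np

def scalar_quantize(vectors: np.ndarray) -> np.ndarray:
    mins = vectors.min(axis=0)               # per-dimension minimum
    maxs = vectors.max(axis=0)               # per-dimension maximum
    spans = np.where(maxs > mins, maxs - mins, 1.0)
    bins = np.floor((vectors - mins) / spans * 255).clip(0, 255)
    return (bins - 128).astype(np.int8)      # shift into the int8 range

embeddings = np.random.rand(1000, 1536).astype(np.float32)
quantized = scalar_quantize(embeddings)      # 1 byte per dimension
print(quantized.nbytes / embeddings.nbytes)  # 0.25 of the float32 size
```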
Binary Quantization
Binary quantization involves assuming a
midpoint of 0 for each dimension, which is typically appropriate for
embeddings normalized to length 1 such as OpenAI's
text-embedding-3-large. Then, each value in the vector is
compared to the midpoint and assigned a binary value of 1 if it's
greater than the midpoint and a binary value of 0 if it's less than
or equal to the midpoint. In MongoDB Vector Search, this quantization reduces the
vector embedding's RAM cost to one twenty-fourth (1/24) of the
pre-quantization cost. It isn't a full 1/32 reduction because the data
structure containing the Hierarchical Navigable Small Worlds graph itself, which is separate from the vector
values, isn't compressed.
When you run a query, MongoDB Vector Search converts the float values in the query vector into a binary vector using the same midpoint, which allows an efficient comparison between the query vector and the indexed binary vectors. It then rescores: the candidates identified by the binary comparison are reevaluated using the original float values associated with those results to further refine the ranking. The full-fidelity vectors are stored in their own data structure on disk and are referenced only during rescoring when you configure binary quantization, or when you perform exact search against either binary or scalar quantized vectors.
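The following is a minimal Python sketch, again not how MongoDB Vector Search implements it internally, that illustrates binary quantization with a midpoint of 0 plus a rescoring pass over the candidates using the original float vectors.

```python
# Illustrative sketch only -- not how MongoDB Vector Search implements it.
# Binary quantization with a midpoint of 0, plus rescoring of candidates
# with the original full-fidelity float vectors.
import numpy as np

def binary_quantize(vectors: np.ndarray) -> np.ndarray:
    # 1 if a value is greater than the midpoint (0), else 0; pack 8 bits per byte.
    return np.packbits((vectors > 0).astype(np.uint8), axis=-1)

def search(query, index_bits, full_fidelity, k=5):
    query_bits = binary_quantize(query)
    # Approximate comparison: Hamming distance between bit vectors.
    hamming = np.unpackbits(index_bits ^ query_bits, axis=-1).sum(axis=1)
    candidates = np.argsort(hamming)[: k * 4]          # over-fetch candidates
    # Rescore candidates with the original floats (dot product similarity).
    scores = full_fidelity[candidates] @ query
    return candidates[np.argsort(scores)[::-1][:k]]

vectors = np.random.randn(1000, 256).astype(np.float32)
bit_index = binary_quantize(vectors)   # 32 bytes per vector instead of 1,024
results = search(np.random.randn(256).astype(np.float32), bit_index, vectors)
```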
Requirements
The following table shows the requirements for automatically quantizing and ingesting quantized vectors.
Note
Atlas stores all floating-point values as the double data type
internally; therefore, both 32-bit and 64-bit embeddings are compatible
with automatic quantization without conversion.
| Requirement | For int1 Ingestion | For int8 Ingestion | For Automatic Scalar Quantization | For Automatic Binary Quantization |
|---|---|---|---|---|
| Requires index definition settings | No | No | Yes | Yes |
| Requires BSON binData format | Yes | Yes | No | No |
| Storage on mongod | binData(int1) | binData(int8) | binData(float32) or array(double) | binData(float32) or array(double) |
| Supported Similarity Methods | euclidean | cosine, euclidean, dotProduct | cosine, euclidean, dotProduct | cosine, euclidean, dotProduct |
| Supported Number of Dimensions | Multiple of 8 | 1 to 8192 | 1 to 8192 | 1 to 8192 |
| Supports ANN and ENN Search | Yes | Yes | Yes | Yes |
How to Enable Automatic Quantization of Vectors
You can configure MongoDB Vector Search to automatically quantize float
vector embeddings in your collection to reduced representation types,
such as int8 (scalar) and binary in your vector indexes.
To set or change the quantization type, specify a quantization field
value of either scalar or binary in your index definition. This
triggers an index rebuild similar to any other index definition change.
The specified quantization type applies to all indexed vectors and
query vectors at query-time. You don't need to change your query as your
query vectors are automatically quantized.
For most embedding models, we recommend binary quantization with rescoring. If you want to use lower-dimensional models that are not quantization-aware trained (QAT), use scalar quantization because it incurs less representational capacity loss.
Benefits
MongoDB Vector Search provides native capabilities for scalar quantization as well as
binary quantization with rescoring. Automatic quantization increases
scalability and cost savings for your applications by reducing the
computational resources for efficient processing of your
vectors. Automatic quantization reduces the RAM for mongot by 3.75x
for scalar and by 24x for binary; the vector values shrink by 4x and 32x
respectively, but the Hierarchical Navigable Small Worlds graph itself does not shrink. This improves
performance, even at the highest volume and scale.
Use Cases
We recommend automatic quantization if you have a large number of full-fidelity vectors, typically over 100,000. After quantization, you index reduced-representation vectors without compromising accuracy when retrieving vectors.
Procedure
To enable automatic quantization:
Specify the type of quantization you want in your MongoDB Vector Search index.
In a new or existing MongoDB Vector Search index, specify one of the following
quantization types in the fields.quantization field
for your index definition:
scalar: to produce byte vectors from float input vectors.
binary: to produce bit vectors from float input vectors.
If you specify automatic quantization on data that is not an
array of float values, MongoDB Vector Search silently skips that vector
instead of indexing it.
Since Atlas stores float values (both 32-bit and 64-bit)
as the double type internally, embeddings from models
that output either precision will work with automatic quantization.
Create or update the index.
The index should take about one minute to build. While it builds, the index is in an initial sync state. When it finishes building, you can start querying the data in your collection.
The specified quantization type applies to all indexed vectors and query vectors at query-time.
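For example, the following sketch creates such an index with PyMongo; the connection string, namespace, field path, number of dimensions, and index name are placeholders that you replace with your own values.

```python
# Minimal sketch (PyMongo): create a vector index with automatic quantization.
# The connection string, namespace, field path, dimensions, and index name are
# placeholders; adjust them to match your data and embedding model.
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

client = MongoClient("<connection-string>")
collection = client["sample_airbnb"]["listingsAndReviews"]

search_index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": 1024,
                "similarity": "dotProduct",
                "quantization": "scalar",  # or "binary"
            }
        ]
    },
    name="vector_index",
    type="vectorSearch",
)
collection.create_search_index(model=search_index_model)
```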
Considerations
When you view your quantized index in the Atlas UI, the index size might appear larger than an index without quantization. This is because the Size metric represents the total data stored, which includes the Hierarchical Navigable Small Worlds graph (in memory), the quantized vectors (in memory), and the full-fidelity vectors (on disk). To estimate the amount of memory used by the index at query-time, refer to the Required Memory metric.
How to Ingest Pre-Quantized Vectors
MongoDB Vector Search also supports ingestion and indexing of scalar and
binary quantized vectors from certain embedding models. If you don't already
have quantized vectors, you can convert your embeddings to BSON
BinData vectors with
float32, int1, or int8 subtype.
Use Cases
We recommend ingesting quantized BSON binData vectors
for the following use cases:
You need to index quantized vector output from embedding models.
You have a large number of float vectors and want to reduce the storage and WiredTiger footprint (such as disk and memory usage) in
mongod.
Benefits
BinData is a BSON data type
that stores binary data. It compresses your vector embeddings and requires
about three times less disk space in your cluster compared to embeddings
that use a standard float32 array. To learn more, see Vector Compression.
This subtype also allows you to index your vectors with
alternate types such as int1 or int8 vectors, reducing the
memory needed to build the MongoDB Vector Search index for your collection. It reduces
the RAM for mongot by 3.75x for scalar and by 24x for binary; the
vector values shrink by 4x and 32x respectively, but the Hierarchical Navigable Small Worlds graph
itself doesn't shrink.
If you don't already have binData vectors, you can convert your
embeddings to this format by using any supported driver before writing
your data to a collection. The following procedure walks you through the steps for
converting your embeddings to BinData vectors with float32,
int8, and int1 subtypes.
Supported Drivers
BSON BinData vectors with
float32, int1, and int8 subtypes are supported by
the following drivers:
C++ Driver v4.1.0 or later
C#/.NET Driver v3.2.0 or later
Go Driver v2.1.0 or later
PyMongo Driver v4.10 or later
Node.js Driver v6.11 or later
Java Driver v5.3.1 or later
Prerequisites
The examples in this procedure use either new data or existing data and
embeddings generated by using Voyage AI's
voyage-3-large model. The example for new data uses sample text
strings, which you can replace with your own data. The example for
existing data uses a subset of documents without any embeddings from the
listingsAndReviews collection in the sample_airbnb database,
which you can replace with your own database and collection (with or
without any embeddings).
Select whether you want to quantize binData vectors for new data or for data that you already have in your cluster, and select your preferred programming language.
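Independent of the data source you choose, the conversion step itself looks like the following minimal PyMongo (v4.10 or later) sketch; the example embedding values are placeholders for your model's output.

```python
# Minimal sketch (PyMongo 4.10+): convert a float embedding to BSON binData
# vectors with the float32, int8, and int1 (packed bit) subtypes.
# The embedding values below are placeholders for your model's output.
from bson.binary import Binary, BinaryVectorDtype

float_embedding = [0.12, -0.65, 0.33, 0.90, -0.11, 0.04, -0.78, 0.55]

# float32 subtype: full-fidelity floats stored as binData.
bindata_float32 = Binary.from_vector(float_embedding, BinaryVectorDtype.FLOAT32)

# int8 subtype: values must already be integers between -128 and 127,
# for example scalar-quantized output from your embedding model.
int8_embedding = [12, -65, 33, 90, -11, 4, -78, 55]
bindata_int8 = Binary.from_vector(int8_embedding, BinaryVectorDtype.INT8)

# int1 subtype: 1 bit per dimension, packed 8 bits per byte
# (here, 1 if the float value is greater than 0).
bits = [1 if value > 0 else 0 for value in float_embedding]
packed = [
    int("".join(str(bit) for bit in bits[i : i + 8]), 2)
    for i in range(0, len(bits), 8)
]
bindata_int1 = Binary.from_vector(packed, BinaryVectorDtype.PACKED_BIT)

# Write the binData vectors to your collection as regular document fields,
# for example: collection.insert_one({"text": "...", "embedding": bindata_float32})
```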
Evaluate Your Query Results
You can measure the accuracy of your MongoDB Vector Search query by evaluating how closely the results for an ANN search match the results of an ENN search against your quantized vectors. That is, you can compare the results of ANN search with the results of ENN search for the same query criteria and measure how frequently the ANN search results include the nearest neighbors in the results from the ENN search.
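As a minimal sketch of that comparison (assuming an existing index named vector_index over an embedding field, both placeholders), you can run the same query twice, once as ANN and once with exact set to true for ENN, and measure the overlap:

```python
# Minimal sketch: estimate ANN recall by comparing ANN results against ENN
# results for the same query. Index name, field path, collection, and the
# query vector are placeholders for your own deployment.
from pymongo import MongoClient

client = MongoClient("<connection-string>")
collection = client["sample_airbnb"]["listingsAndReviews"]

def vector_search(query_vector, exact, limit=10):
    stage = {
        "$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": query_vector,
            "limit": limit,
            "exact": exact,
        }
    }
    if not exact:
        # numCandidates applies only to ANN search.
        stage["$vectorSearch"]["numCandidates"] = limit * 10
    pipeline = [stage, {"$project": {"_id": 1}}]
    return {doc["_id"] for doc in collection.aggregate(pipeline)}

query_vector = [0.1] * 1024  # replace with a real query embedding
ann_ids = vector_search(query_vector, exact=False)
enn_ids = vector_search(query_vector, exact=True)
print(f"ANN recall vs. ENN: {len(ann_ids & enn_ids) / len(enn_ids):.0%}")
```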
For a demonstration of evaluating your query results, see How to Measure the Accuracy of Your Query Results.