MongoDB Vector Search supports automatic quantization of your float vector embeddings (both 32-bit and 64-bit). It also supports ingesting and indexing your pre-quantized scalar and binary vectors from certain embedding models.
About Quantization
Quantization is the process of shrinking full-fidelity vectors into fewer bits. It reduces the amount of main memory required to store each vector in a MongoDB Vector Search index by indexing the reduced-representation vectors instead, which lets you store more vectors or vectors with higher dimensions. Quantization therefore reduces resource consumption and improves speed. We recommend quantization for applications with a large number of vectors, typically over 100,000.
Scalar Quantization
Scalar quantization involves first
identifying the minimum and maximum values for each dimension of the
indexed vectors to establish a range of values for a dimension. Then,
the range is divided into equally sized intervals or bins. Finally, each
float value is mapped to a bin to convert the continuous float values
into discrete integers. In MongoDB Vector Search, this quantization reduces the vector
embedding's RAM cost to about one fourth (1/3.75) of the
pre-quantization cost.
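The following is a minimal Python sketch, not MongoDB's internal implementation, that illustrates the idea: each dimension's observed range is split into 256 equal bins, so every float maps to a one-byte integer.

```python
# Illustrative sketch only -- not MongoDB's internal implementation.
# Scalar quantization: split each dimension's observed range into 256
# equal bins so every float maps to a single signed byte (int8).
import numpy as np

def scalar_quantize(vectors: np.ndarray) -> np.ndarray:
    mins = vectors.min(axis=0)               # per-dimension minimum
    maxs = vectors.max(axis=0)               # per-dimension maximum
    spans = np.where(maxs > mins, maxs - mins, 1.0)
    bins = np.floor((vectors - mins) / spans * 255).clip(0, 255)
    return (bins - 128).astype(np.int8)      # shift into the int8 range

embeddings = np.random.rand(1000, 1536).astype(np.float32)
quantized = scalar_quantize(embeddings)      # 1 byte per dimension
print(quantized.nbytes / embeddings.nbytes)  # 0.25 of the float32 size
```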
Binary Quantization
Binary quantization involves assuming a
midpoint of 0 for each dimension, which is typically appropriate for
embeddings normalized to length 1 such as OpenAI's
text-embedding-3-large. Then, each value in the vector is
compared to the midpoint and assigned a binary value of 1 if it's
greater than the midpoint and a binary value of 0 if it's less than
or equal to the midpoint. In MongoDB Vector Search, this quantization reduces the
vector embedding's RAM cost to one twenty-fourth (1/24) of the
pre-quantization cost. It isn't a full 1/32 reduction because the data
structure containing the Hierarchical Navigable Small Worlds graph itself, which is separate from the vector
values, isn't compressed.
When you run a query, MongoDB Vector Search converts the float values in the query vector into a binary vector using the same midpoint, which allows an efficient comparison between the query vector and the indexed binary vectors. It then rescores: the candidates identified by the binary comparison are reevaluated using the original float values associated with those results to further refine the ranking. The full-fidelity vectors are stored in their own data structure on disk and are referenced only during rescoring when you configure binary quantization, or when you perform exact search against either binary or scalar quantized vectors.
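The following is a minimal Python sketch, again not how MongoDB Vector Search implements it internally, that illustrates binary quantization with a midpoint of 0 plus a rescoring pass over the candidates using the original float vectors.

```python
# Illustrative sketch only -- not how MongoDB Vector Search implements it.
# Binary quantization with a midpoint of 0, plus rescoring of candidates
# with the original full-fidelity float vectors.
import numpy as np

def binary_quantize(vectors: np.ndarray) -> np.ndarray:
    # 1 if a value is greater than the midpoint (0), else 0; pack 8 bits per byte.
    return np.packbits((vectors > 0).astype(np.uint8), axis=-1)

def search(query, index_bits, full_fidelity, k=5):
    query_bits = binary_quantize(query)
    # Approximate comparison: Hamming distance between bit vectors.
    hamming = np.unpackbits(index_bits ^ query_bits, axis=-1).sum(axis=1)
    candidates = np.argsort(hamming)[: k * 4]          # over-fetch candidates
    # Rescore candidates with the original floats (dot product similarity).
    scores = full_fidelity[candidates] @ query
    return candidates[np.argsort(scores)[::-1][:k]]

vectors = np.random.randn(1000, 256).astype(np.float32)
bit_index = binary_quantize(vectors)   # 32 bytes per vector instead of 1,024
results = search(np.random.randn(256).astype(np.float32), bit_index, vectors)
```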
Requirements
The following table shows the requirements for automatically quantizing and ingesting quantized vectors.
Note
Atlas stores all floating-point values as the double data type
internally; therefore, both 32-bit and 64-bit embeddings are compatible
with automatic quantization without conversion.
| Requirement | For int1 Ingestion | For int8 Ingestion | For Automatic Scalar Quantization | For Automatic Binary Quantization |
|---|---|---|---|---|
| Requires index definition settings | No | No | Yes | Yes |
| Requires BSON binData format | Yes | Yes | No | No |
| Storage on mongod | binData(int1) | binData(int8) | binData(float32) or array(double) | binData(float32) or array(double) |
| Supported Similarity Methods | euclidean | cosine, euclidean, dotProduct | cosine, euclidean, dotProduct | cosine, euclidean, dotProduct |
| Supported Number of Dimensions | Multiple of 8 | 1 to 8192 | 1 to 8192 | 1 to 8192 |
| Supports ANN and ENN Search | Yes | Yes | Yes | Yes |
How to Enable Automatic Quantization of Vectors
You can configure MongoDB Vector Search to automatically quantize float
vector embeddings in your collection to reduced representation types,
such as int8 (scalar) and binary in your vector indexes.
To set or change the quantization type, specify a quantization field
value of either scalar or binary in your index definition. This
triggers an index rebuild similar to any other index definition change.
The specified quantization type applies to all indexed vectors and
query vectors at query-time. You don't need to change your query as your
query vectors are automatically quantized.
For most embedding models, we recommend binary quantization with rescoring. If you want to use lower-dimensional models that are not quantization-aware trained (QAT), use scalar quantization because it incurs less representational capacity loss.
Benefits
MongoDB Vector Search provides native capabilities for scalar quantization as well as
binary quantization with rescoring. Automatic quantization increases
scalability and cost savings for your applications by reducing the
computational resources for efficient processing of your
vectors. Automatic quantization reduces the RAM for mongot by 3.75x
for scalar and by 24x for binary; the vector values shrink by 4x and 32x
respectively, but the Hierarchical Navigable Small Worlds graph itself does not shrink. This improves
performance, even at the highest volume and scale.
Use Cases
We recommend automatic quantization if you have a large number of full-fidelity vectors, typically over 100,000. After quantization, you index reduced-representation vectors without compromising accuracy when retrieving vectors.
Procedure
To enable automatic quantization:
Specify the type of quantization you want in your MongoDB Vector Search index.
In a new or existing MongoDB Vector Search index, specify one of the following
quantization types in the fields.quantization field
for your index definition:
scalar: to produce byte vectors from float input vectors.
binary: to produce bit vectors from float input vectors.
If you specify automatic quantization on data that is not an
array of float values, MongoDB Vector Search silently skips that vector
instead of indexing it.
Since Atlas stores float values (both 32-bit and 64-bit)
as the double type internally, embeddings from models
that output either precision will work with automatic quantization.
Create or update the index.
The index should take about one minute to build. While it builds, the index is in an initial sync state. When it finishes building, you can start querying the data in your collection.
The specified quantization type applies to all indexed vectors and query vectors at query-time.
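For example, the following sketch creates such an index with PyMongo; the connection string, namespace, field path, number of dimensions, and index name are placeholders that you replace with your own values.

```python
# Minimal sketch (PyMongo): create a vector index with automatic quantization.
# The connection string, namespace, field path, dimensions, and index name are
# placeholders; adjust them to match your data and embedding model.
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

client = MongoClient("<connection-string>")
collection = client["sample_airbnb"]["listingsAndReviews"]

search_index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": 1024,
                "similarity": "dotProduct",
                "quantization": "scalar",  # or "binary"
            }
        ]
    },
    name="vector_index",
    type="vectorSearch",
)
collection.create_search_index(model=search_index_model)
```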
Considerations
When you view your quantized index in the Atlas UI, the index size might appear larger than an index without quantization. This is because the Size metric represents the total data stored, which includes the Hierarchical Navigable Small Worlds graph (in memory), the quantized vectors (in memory), and the full-fidelity vectors (on disk). To estimate the amount of memory used by the index at query-time, refer to the Required Memory metric.
How to Ingest Pre-Quantized Vectors
MongoDB Vector Search also supports ingestion and indexing of scalar and
binary quantized vectors from certain embedding models. If you don't already
have quantized vectors, you can convert your embeddings to BSON
BinData vectors with
float32, int1, or int8 subtype.
Use Cases
We recommend ingesting quantized BSON binData vectors
for the following use cases:
You need to index quantized vector output from embedding models.
You have a large number of float vectors and want to reduce the storage and WiredTiger footprint (such as disk and memory usage) in
mongod.
Benefits
BinData is a BSON data type
that stores binary data. It compresses your vector embeddings and requires
about three times less disk space in your cluster compared to embeddings
that use a standard float32 array. To learn more, see Vector Compression.
This subtype also allows you to index your vectors with
alternate types such as int1 or int8 vectors, reducing the
memory needed to build the MongoDB Vector Search index for your collection. It reduces
the RAM for mongot by 3.75x for scalar and by 24x for binary; the
vector values shrink by 4x and 32x respectively, but the Hierarchical Navigable Small Worlds graph
itself doesn't shrink.
If you don't already have binData vectors, you can convert your
embeddings to this format by using any supported driver before writing
your data to a collection. The following procedure walks you through the steps for
converting your embeddings to BinData vectors with float32,
int8, and int1 subtypes.
Supported Drivers
BSON BinData vectors with
float32, int1, and int8 subtypes are supported by
the following drivers:
C++ Driver v4.1.0 or later
C#/.NET Driver v3.2.0 or later
Go Driver v2.1.0 or later
PyMongo Driver v4.10 or later
Node.js Driver v6.11 or later
Java Driver v5.3.1 or later
Prerequisites
The examples in this procedure use either new data or existing data and
embeddings generated by using Voyage AI's
voyage-3-large model. The example for new data uses sample text
strings, which you can replace with your own data. The example for
existing data uses a subset of documents without any embeddings from the
listingsAndReviews collection in the sample_airbnb database,
which you can replace with your own database and collection (with or
without any embeddings).
Select whether you want to quantize binData vectors for new data or for data that you already have in your cluster, and select your preferred programming language.
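Independent of the data source you choose, the conversion step itself looks like the following minimal PyMongo (v4.10 or later) sketch; the example embedding values are placeholders for your model's output.

```python
# Minimal sketch (PyMongo 4.10+): convert a float embedding to BSON binData
# vectors with the float32, int8, and int1 (packed bit) subtypes.
# The embedding values below are placeholders for your model's output.
from bson.binary import Binary, BinaryVectorDtype

float_embedding = [0.12, -0.65, 0.33, 0.90, -0.11, 0.04, -0.78, 0.55]

# float32 subtype: full-fidelity floats stored as binData.
bindata_float32 = Binary.from_vector(float_embedding, BinaryVectorDtype.FLOAT32)

# int8 subtype: values must already be integers between -128 and 127,
# for example scalar-quantized output from your embedding model.
int8_embedding = [12, -65, 33, 90, -11, 4, -78, 55]
bindata_int8 = Binary.from_vector(int8_embedding, BinaryVectorDtype.INT8)

# int1 subtype: 1 bit per dimension, packed 8 bits per byte
# (here, 1 if the float value is greater than 0).
bits = [1 if value > 0 else 0 for value in float_embedding]
packed = [
    int("".join(str(bit) for bit in bits[i : i + 8]), 2)
    for i in range(0, len(bits), 8)
]
bindata_int1 = Binary.from_vector(packed, BinaryVectorDtype.PACKED_BIT)

# Write the binData vectors to your collection as regular document fields,
# for example: collection.insert_one({"text": "...", "embedding": bindata_float32})
```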
Evaluate Your Query Results
You can measure the accuracy of your MongoDB Vector Search query by evaluating how closely the results for an ANN search match the results of an ENN search against your quantized vectors. That is, you can compare the results of ANN search with the results of ENN search for the same query criteria and measure how frequently the ANN search results include the nearest neighbors in the results from the ENN search.
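As a minimal sketch of that comparison (assuming an existing index named vector_index over an embedding field, both placeholders), you can run the same query twice, once as ANN and once with exact set to true for ENN, and measure the overlap:

```python
# Minimal sketch: estimate ANN recall by comparing ANN results against ENN
# results for the same query. Index name, field path, collection, and the
# query vector are placeholders for your own deployment.
from pymongo import MongoClient

client = MongoClient("<connection-string>")
collection = client["sample_airbnb"]["listingsAndReviews"]

def vector_search(query_vector, exact, limit=10):
    stage = {
        "$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": query_vector,
            "limit": limit,
            "exact": exact,
        }
    }
    if not exact:
        # numCandidates applies only to ANN search.
        stage["$vectorSearch"]["numCandidates"] = limit * 10
    pipeline = [stage, {"$project": {"_id": 1}}]
    return {doc["_id"] for doc in collection.aggregate(pipeline)}

query_vector = [0.1] * 1024  # replace with a real query embedding
ann_ids = vector_search(query_vector, exact=False)
enn_ids = vector_search(query_vector, exact=True)
print(f"ANN recall vs. ENN: {len(ann_ids & enn_ids) / len(enn_ids):.0%}")
```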
For a demonstration of evaluating your query results, see How to Measure the Accuracy of Your Query Results.