Important
Feature unavailable in Flex Clusters and Serverless Instances
Flex clusters and Serverless instances don't support this feature at this time. To learn more, see Atlas Flex Limitations and Serverless Instance Limitations.
This page describes how to restore archived data using the $merge
pipeline stage.
Note
Ensure that your cluster is adequately provisioned for the amount of data you want to restore from your archive. Otherwise, you risk running out of space during or after the restoration. Contact Support for additional technical guidance on setting up the size of the oplog or for troubleshooting any space issues on your Atlas cluster.
Required Access
You need Project Data Access Admin access or higher to the project to follow
this procedure.
Considerations
This approach is not recommended for large datasets (approximately 1 TB of data or more) with a large number of partitions.
Procedure
Perform the following steps to restore archived data to your Atlas cluster:
Pause the Online Archive associated with the collection that contains the archived data you want to restore.
See Pause and Resume Archiving for more information.
Connect to Online Archive using your connection string.
You must use the Archive Only connection string to connect to the Online Archive. To learn more, see Connect to Online Archive.
Use $merge stage to move the data from your archive to your Atlas cluster.
To learn more about the $merge pipeline stage syntax and usage
for moving data back into your Atlas cluster, see the $merge pipeline stage.
Example
Consider the following documents in an S3 archive:
{ "_id" : 1, "item": "cucumber", "source": "nepal", "released": ISODate("2016-05-18T16:00:00Z") }
{ "_id" : 2, "item": "miso", "source": "canada", "released": ISODate("2016-05-18T16:00:00Z") }
{ "_id" : 3, "item": "oyster", "source": "luxembourg", "released": ISODate("2016-05-18T16:00:00Z") }
{ "_id" : 4, "item": "mushroom", "source": "ghana", "released": ISODate("2016-05-18T16:00:00Z") }
Suppose you intend to restore documents based on the item and source
fields during the $merge stage. The following code sample shows an example of
using the $merge stage to restore archived data based on those
criteria:
db.<collection>.aggregate([
  {
    "$merge": {
      "into": {
        "atlas": {
          "clusterName": "<atlas-cluster-name>",
          "db": "<db-name>",
          "coll": "<collection-name>"
        }
      },
      "on": [ "item", "source" ],
      "whenMatched": "keepExisting",
      "whenNotMatched": "insert"
    }
  }
])
The code employs the following logic:
If an archived document matches a document on the Atlas cluster on the item and source fields, Atlas keeps the existing document in the cluster because the copy of the document on the Atlas cluster is more recent than the archived version.

If an archived document doesn't match any document in the Atlas cluster, Atlas inserts the document into the specified collection on the Atlas cluster.
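To make that logic concrete, here is a minimal plain-JavaScript sketch of the whenMatched: "keepExisting" / whenNotMatched: "insert" behavior. This is an illustration of the semantics only, not Atlas server code; the sample documents and the qty field are hypothetical.

```javascript
// Illustration only: how $merge with "whenMatched": "keepExisting" and
// "whenNotMatched": "insert" treats each archived document when matching
// on the "on" fields (item and source here). Not Atlas server code.

// Build a lookup key from the "on" fields.
const onKey = (doc) => `${doc.item}|${doc.source}`;

function mergeKeepExisting(clusterDocs, archivedDocs) {
  // Index the cluster's documents by the "on" fields.
  const byKey = new Map(clusterDocs.map((d) => [onKey(d), d]));
  for (const doc of archivedDocs) {
    if (byKey.has(onKey(doc))) {
      // whenMatched: "keepExisting" -> the cluster's copy wins.
      continue;
    }
    // whenNotMatched: "insert" -> restore the archived document.
    byKey.set(onKey(doc), doc);
  }
  return [...byKey.values()];
}

// Example: one archived doc matches an existing doc, one doesn't.
const cluster = [{ _id: 1, item: "cucumber", source: "nepal", qty: 99 }];
const archive = [
  { _id: 1, item: "cucumber", source: "nepal", qty: 10 }, // matched: discarded
  { _id: 2, item: "miso", source: "canada", qty: 24 },    // not matched: inserted
];
const result = mergeKeepExisting(cluster, archive);
```

After the merge, result holds the cluster's cucumber document (qty 99) plus the newly restored miso document.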
When restoring data to the Atlas cluster, the archived data might have
duplicate _id fields. For this example, we can include a $sort
stage for sorting on the _id and released fields before the
$merge stage to ensure that Atlas chooses the documents with the
most recent date if there are duplicates to resolve.
The following code sample adds the $sort stage:
db.runCommand({
  "aggregate": "<collection>",
  "pipeline": [
    {
      "$sort": {
        "_id": 1,
        "released": 1
      }
    },
    {
      "$merge": {
        "into": {
          "atlas": {
            "clusterName": "<atlas-cluster-name>",
            "db": "<db-name>",
            "coll": "<collection-name>"
          }
        },
        "on": [ "item", "source" ],
        "whenMatched": "keepExisting",
        "whenNotMatched": "insert"
      }
    }
  ],
  "cursor": { },
  "background": true
})
To learn more about resolving duplicate fields, see the $merge considerations.
Note
If there are multiple on fields, you must create a compound
unique index on the on identifier fields:
db.<collection>.createIndex( { item: 1, source: 1 }, { unique: true } )
Alternatively, run the merges sequentially, one for each on identifier
field, into a temporary collection. Then, merge the data from the temporary
collection into the target collection using the cluster's connection string.
You must still create a unique index for each on identifier field.
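The sequential workflow can be sketched in plain JavaScript as well. This is a loose, hypothetical analogy of the data flow (each mergeOn call stands in for one $merge aggregation, and arrays stand in for collections); a real restore runs these as aggregations in mongosh against live collections.

```javascript
// Illustration only: the sequential-merge workflow described above,
// modeled with in-memory arrays instead of collections.

// Build a lookup key from the given "on" fields.
const key = (doc, fields) => fields.map((f) => doc[f]).join("|");

// Generic keepExisting/insert merge keyed on the given "on" fields.
function mergeOn(targetDocs, sourceDocs, onFields) {
  const byKey = new Map(targetDocs.map((d) => [key(d, onFields), d]));
  for (const doc of sourceDocs) {
    if (!byKey.has(key(doc, onFields))) byKey.set(key(doc, onFields), doc);
  }
  return [...byKey.values()];
}

const archived = [
  { _id: 1, item: "cucumber", source: "nepal" },
  { _id: 2, item: "miso", source: "canada" },
];

// Step 1: merge into a temporary collection, one "on" field at a time.
let temp = mergeOn([], archived, ["item"]);
temp = mergeOn(temp, archived, ["source"]);

// Step 2: merge the temporary collection into the target collection
// (in practice, over the cluster's connection string).
const target = mergeOn(
  [{ _id: 9, item: "miso", source: "canada" }], // pre-existing cluster doc
  temp,
  ["item", "source"]
);
```

Because the merge keeps existing documents, the cluster's miso document (_id 9) survives step 2 and only the cucumber document is inserted.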
You can run the aggregation in the background by setting the background
flag to true. To run this command in mongosh, use db.runCommand().
Verify data in the Atlas cluster and delete the online archive.
See Delete an Online Archive for more information.
Note
If you run into issues while migrating data back to your Atlas cluster, contact Support.