Best Versions with MongoDB

Recall our previous discussion about ways to recreate any older version of a document that ever existed in a particular collection. The goal was to preserve every state of each object while answering queries with the “current” or “latest” version. We also had a requirement to support an infrequent audit that reviews all or some previous versions of a document. At the time I suggested that there was a different way to achieve this, one I liked better than the methods we discussed, and I’m going to describe it now.

Up to this point, we considered keeping the versions of a document within one MongoDB document, in separate documents within the same collection, or by “archiving off” older versions into a separate collection. We looked at the trade-offs and decided that the important factors were our ability to return or match only the current document(s); to “update” a document by generating a new version number and adding new attributes; and to recover from a failure in the middle of a set of operations (when an update needs more than one). Here’s a table that shows, for each schema we considered, how well it handles reads and writes and, when an update has to make more than one write, how easy it is to recover or to remain in a relatively “safe” state.

No doubt you noticed that fetching one or many documents is fastest and simplest when we keep the old versions out of our “current” collection. Queries for one or for all latest versions stay fast, and they can use indexes whether we are querying, updating or aggregating. So how do we get fast updates that keep the current document current but save the previous version somewhere else? We know that we don’t have multi-statement transactions in MongoDB, so we can’t ensure that a regular update of one document and an insert of another document happen atomically. However, there is something that is always updated atomically along with every write that happens in your collection, and that is the “Oplog”.
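To make the atomicity problem concrete, here is a minimal pure-Python sketch of the “archive off” write path, with a dict and a list standing in for the current and history collections. The function update_and_archive, the fields v and price, and the crash flag are hypothetical, made up for illustration; against a real server these would be two separate writes (an update and an insert), and the point is only that nothing ties them together.

```python
import copy


class CrashBetweenWrites(Exception):
    """Simulated failure (e.g. the application dies) after the first write."""


def update_and_archive(current, history, doc_id, changes, crash_between_writes=False):
    """Hypothetical "archive off" update: two separate writes, not one atomic one.

    current: dict mapping _id -> latest document (stands in for the current collection)
    history: list of older versions (stands in for the history collection)
    """
    old = copy.deepcopy(current[doc_id])
    new = dict(old, **changes)
    new["v"] = old["v"] + 1

    # Write 1: replace the current document with the new version.
    current[doc_id] = new
    if crash_between_writes:
        # Nothing ties write 2 to write 1, so a failure here loses `old` for good.
        raise CrashBetweenWrites(doc_id)

    # Write 2: archive the previous version.
    history.append(old)
    return new


if __name__ == "__main__":
    current = {"a": {"_id": "a", "v": 1, "price": 10}}
    history = []
    update_and_archive(current, history, "a", {"price": 12})
    try:
        update_and_archive(current, history, "a", {"price": 15}, crash_between_writes=True)
    except CrashBetweenWrites:
        pass
    print(current["a"])               # version 3 is current...
    print([d["v"] for d in history])  # ...but only version 1 was ever archived
```

After the simulated crash, version 3 is current but version 2 never reached the history list. Swapping the order of the two writes only trades this for a different inconsistency (an archived copy of a version that is still current, which a retry would archive again).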
The oplog (full name: the ‘oplog.rs’ collection in the ‘local’ database) is a special collection used by the replication mechanism. Every write operation is recorded in the oplog atomically with its application to the data files, the indexes and the journal. You can read more about the oplog in the docs; what I’m going to show you here is what the oplog contains when an insert or an update happens, and how we can use that for our own purposes.
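As a rough illustration of how oplog entries could serve the audit requirement, the sketch below replays inserts and updates for one namespace to rebuild every version of each document. The entries are made-up examples, not output captured from a server. They loosely follow the older oplog layout, in which an insert’s “o” holds the whole document, while an update’s “o2” holds the _id and its “o” holds either a $set/$unset modifier or a full replacement document. The sketch is simplified (timestamps as tuples, other fields omitted, only top-level $set/$unset, deletes ignored), and newer server versions record updates in a different (“$v”: 2 diff) format that it does not parse. The helpers versions_from_oplog and apply_update are hypothetical, not part of any driver.

```python
import copy

# Made-up entries loosely following the older oplog layout (not captured output).
# Simplified: "ts" is a (seconds, increment) tuple instead of a BSON Timestamp,
# and fields such as "h", "v" and "wall" are omitted.
OPLOG = [
    {"ts": (100, 1), "op": "i", "ns": "shop.products",
     "o": {"_id": 1, "name": "widget", "price": 10}},
    {"ts": (101, 1), "op": "u", "ns": "shop.products",
     "o2": {"_id": 1}, "o": {"$set": {"price": 12}}},
    {"ts": (102, 1), "op": "i", "ns": "shop.orders",
     "o": {"_id": 7, "product": 1}},
    {"ts": (103, 1), "op": "u", "ns": "shop.products",
     "o2": {"_id": 1}, "o": {"_id": 1, "name": "gadget", "price": 12}},
]


def apply_update(doc, update):
    """Apply one oplog update: a $set/$unset modifier or a full replacement document."""
    if any(key.startswith("$") for key in update):
        new = copy.deepcopy(doc)
        for field, value in update.get("$set", {}).items():
            new[field] = value  # top-level fields only in this sketch
        for field in update.get("$unset", {}):
            new.pop(field, None)
        return new
    return copy.deepcopy(update)  # replacement: the entry is the new document


def versions_from_oplog(entries, ns):
    """Replay inserts and updates for namespace `ns`; return {_id: [v1, v2, ...]}."""
    history = {}
    for entry in sorted(entries, key=lambda e: e["ts"]):
        if entry["ns"] != ns:
            continue
        if entry["op"] == "i":
            doc = copy.deepcopy(entry["o"])
            history[doc["_id"]] = [doc]
        elif entry["op"] == "u":
            versions = history.get(entry["o2"]["_id"])
            if versions:  # skip updates to documents inserted before these entries
                versions.append(apply_update(versions[-1], entry["o"]))
        # deletes ("d") and other op types are ignored here
    return history


if __name__ == "__main__":
    for version in versions_from_oplog(OPLOG, "shop.products")[1]:
        print(version)
```

Because every write lands in the oplog atomically, a reader along these lines sees each intermediate version even though the application only ever updates the current document; a real reader would take its entries from local.oplog.rs rather than from a list.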