How to store document history (versioning, revisions)?

We have a very common requirement: On every document update, we need to store the old version. Thus we would have a complete history of document changes.

I plan to build a second index that holds all the archived documents. Before a document is updated, its current version would be indexed into this second index. Does this make sense? Or is there a built-in function or a plugin for handling such document revisions?

Another aspect: We would like to save just the DIFF between the new document and the old one. Is there a built-in mechanism or plugin for doing this?

Thanks for helping out :relaxed:


Another aspect: We would like to save just the DIFF between the new document and the old one. Is there a built-in mechanism or plugin for doing this?

And the point would be to save disk space, or what's the background?

Exactly!

Does anyone have hints or opinions on document revisions?

Our only built-in notion of versioning is limited to incrementing the version number attached to a single instance of a doc whenever it is updated. This is there to support optimistic-locking style use cases, not for maintaining multiple concurrent versions in the index.
If you want multiple versions in the index they will need to be managed in your application code and you will typically also have to worry about de-duping versions in search results.
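
To illustrate what "optimistic locking" means here, the following is a minimal in-memory sketch in Python, not Elasticsearch code. `VersionedStore`, `VersionConflict` and the `expected_version` parameter are hypothetical names made up for this example. The store keeps exactly one copy per document id, bumps a version counter on every write, and rejects a write whose expected version is stale. Old content is overwritten, which is why this mechanism alone gives you no history.

```python
class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""


class VersionedStore:
    """Hypothetical in-memory stand-in for an index: one copy per doc id."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, source, expected_version=None):
        # A document that does not exist yet is treated as version 0.
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        # The previous source is overwritten; only the counter remembers it existed.
        self._docs[doc_id] = (new_version, dict(source))
        return new_version


store = VersionedStore()
v1 = store.put("doc-1", {"title": "draft"})
v2 = store.put("doc-1", {"title": "final"}, expected_version=v1)

# A writer that still believes the document is at v1 gets rejected.
try:
    store.put("doc-1", {"title": "stale write"}, expected_version=v1)
    conflict = False
except VersionConflict:
    conflict = True
```

After these calls the store holds only the "final" content at version 2, so keeping the "draft" content would be the application's job.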

I am interested in writing a plugin for ES that implements JSON Patch

http://tools.ietf.org/html/rfc6902

together with HTTP PATCH

https://tools.ietf.org/html/rfc5789

and storing the patches beside the document could be one nice side effect.

Maybe it is that kind of feature you are looking for? I'm not sure, because it does not save space at all, it will increase resource usage.


Thanks guys. Ok, I will go with a second index where I store the archived versions. The patch stuff seems very interesting.
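
For readers who want to see the shape of the second-index approach, here is a rough Python sketch of the application-side logic. It uses plain in-memory structures as stand-ins for the two indices, so `live_index`, `archive_index`, `update_with_history` and the fields `revision` and `archived_at` are all assumptions for illustration, not anything Elasticsearch provides. In a real setup each step would be a separate request, and the two writes would not be atomic.

```python
import copy
from datetime import datetime, timezone

live_index = {}     # stand-in for the main index: doc_id -> current source
archive_index = []  # stand-in for the archive index: revisions, oldest first


def update_with_history(doc_id, new_source, now=None):
    """Copy the current version of doc_id (if any) to the archive, then overwrite it."""
    old_source = live_index.get(doc_id)
    if old_source is not None:
        previous = sum(1 for r in archive_index if r["doc_id"] == doc_id)
        archive_index.append({
            "doc_id": doc_id,            # lets you query all revisions of one document
            "revision": previous + 1,    # 1-based counter per document
            "archived_at": (now or datetime.now(timezone.utc)).isoformat(),
            "source": copy.deepcopy(old_source),
        })
    live_index[doc_id] = copy.deepcopy(new_source)


def history(doc_id):
    """Return the archived sources of doc_id, oldest first."""
    return [r["source"] for r in archive_index if r["doc_id"] == doc_id]


update_with_history("doc-1", {"title": "v1"})
update_with_history("doc-1", {"title": "v2"})
update_with_history("doc-1", {"title": "v3"})
```

With this layout the main index only ever contains the latest version, so normal searches need no de-duplication, and the history is queried separately by `doc_id`.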

I am also looking for a similar solution, any plugins or tools to

One possibility is to store all documents in JSON Patch notation, where the very first patch consists entirely of "add" operations that build the initial document. After that, store each subsequent change as a patch, in time order. You can also keep an effective document that represents the most current state, so that retrieval does not require replaying the whole log.
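
A small Python sketch of that idea follows. It is a simplified illustration, not a complete RFC 6902 implementation: it only handles the `add`, `replace` and `remove` operations on object members (no arrays, no whole-document path), and the helper names `diff`, `apply_patch` and `replay` are made up for this example.

```python
import copy


def _escape(key):
    # JSON Pointer escaping: "~" becomes "~0", "/" becomes "~1".
    return key.replace("~", "~0").replace("/", "~1")


def _split(path):
    # Turn "/a/b" into ["a", "b"], undoing the escaping above.
    return [p.replace("~1", "/").replace("~0", "~") for p in path.split("/")[1:]]


def diff(old, new, prefix=""):
    """Build a patch (list of operations) that turns dict old into dict new."""
    ops = []
    for key in old:
        path = f"{prefix}/{_escape(key)}"
        if key not in new:
            ops.append({"op": "remove", "path": path})
        elif isinstance(old[key], dict) and isinstance(new[key], dict):
            ops.extend(diff(old[key], new[key], path))
        elif old[key] != new[key]:
            ops.append({"op": "replace", "path": path, "value": new[key]})
    for key in new:
        if key not in old:
            ops.append({"op": "add", "path": f"{prefix}/{_escape(key)}", "value": new[key]})
    return ops


def apply_patch(doc, patch):
    """Apply a patch to a copy of doc and return the result."""
    result = copy.deepcopy(doc)
    for op in patch:
        *parents, key = _split(op["path"])
        target = result
        for part in parents:
            target = target[part]
        if op["op"] in ("add", "replace"):
            if op["op"] == "replace" and key not in target:
                raise KeyError(op["path"])
            target[key] = copy.deepcopy(op["value"])
        elif op["op"] == "remove":
            del target[key]
        else:
            raise ValueError(f"unsupported operation: {op['op']}")
    return result


def replay(log, upto=None):
    """Rebuild the document after the first `upto` patches (all if None)."""
    doc = {}
    for patch in log[:upto]:
        doc = apply_patch(doc, patch)
    return doc


v1 = {"title": "Draft", "tags": {"lang": "en"}}
v2 = {"title": "Final", "tags": {"lang": "en", "status": "done"}}
v3 = {"title": "Final", "tags": {"status": "done"}}

# The first patch contains only "add" operations; later ones hold the changes.
log = [diff({}, v1), diff(v1, v2), diff(v2, v3)]
effective = replay(log)  # the "effective document" kept for quick retrieval
```

Any earlier revision can be rebuilt with `replay(log, n)`, at the cost of applying `n` patches, which is the trade-off the effective document is meant to avoid for the latest version.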