We have a very common requirement: on every document update, we need to store the old version, so that we keep a complete history of document changes.
I plan to build a second index that holds all the archived documents: before a document is updated, its current version would be reindexed into this second index. Does this make sense? Or is there a built-in function or plugin for this kind of document revisioning?
Another aspect: we would like to save just the diff between the new document and the old one. Is there a built-in mechanism or plugin for doing this?
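A minimal sketch of the "archive before update" flow, assuming two plain in-memory dicts stand in for the live index and the archive index. The names `live`, `archive` and `update_with_archive` are hypothetical, and this is not any search engine's client API. It only shows the ordering: copy the current version out first, then overwrite.

```python
import copy

# Hypothetical stand-ins for the two indices:
# "live" holds the current document per id,
# "archive" holds every superseded version keyed by (id, version).
live: dict = {}
archive: dict = {}


def update_with_archive(doc_id: str, new_source: dict) -> int:
    """Copy the current version into the archive, then overwrite the live doc."""
    current = live.get(doc_id)
    if current is None:
        version = 1
    else:
        # Deep-copy so later changes to the live doc cannot alter the history.
        archive[(doc_id, current["version"])] = copy.deepcopy(current)
        version = current["version"] + 1
    live[doc_id] = {"version": version, "source": copy.deepcopy(new_source)}
    return version
```

Calling `update_with_archive("42", {"title": "draft"})` and then `update_with_archive("42", {"title": "final"})` leaves the final version in `live` and the draft under `archive[("42", 1)]`. In a real deployment, the two writes are not atomic, so a failure between them could leave a gap in the history.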
Our only built-in notion of versioning is limited to incrementing the version number attached to a single instance of a doc whenever it is updated. This is there to support optimistic-locking style use cases, not for maintaining multiple concurrent versions in the index.
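To illustrate what "a single instance of a doc" with an incrementing version means, here is a hypothetical in-memory store. The class and method names are invented for illustration, and this is not the real engine's implementation. A write may state the version it expects; a mismatch is rejected, and the previous content is overwritten rather than kept.

```python
from typing import Dict, Optional, Tuple


class VersionConflict(Exception):
    """Raised when the caller's expected version no longer matches the stored one."""


class VersionedStore:
    """Hypothetical single-copy store: one entry per id, version bumped on every write."""

    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[int, dict]] = {}

    def get(self, doc_id: str) -> Tuple[int, dict]:
        return self._docs[doc_id]

    def put(self, doc_id: str, source: dict, expected_version: Optional[int] = None) -> int:
        current = self._docs.get(doc_id)
        current_version = current[0] if current else 0
        # Optimistic locking: reject the write if someone else updated the doc first.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(f"expected {expected_version}, found {current_version}")
        new_version = current_version + 1
        # The previous source is overwritten; no older copy survives.
        self._docs[doc_id] = (new_version, dict(source))
        return new_version
```

A client reads `(version, source)`, modifies the source, and writes it back with `expected_version=version`. If another writer got there first, it gets a `VersionConflict` and can retry. Nothing in this scheme lets you read version 1 after version 2 has been written.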
If you want multiple versions in the index they will need to be managed in your application code and you will typically also have to worry about de-duping versions in search results.
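If several versions of the same document live in one index, the application has to collapse them after searching. This is a minimal client-side sketch, assuming each hit carries hypothetical `doc_id` and `version` fields. It keeps only the newest version of each document and preserves the original ranking order.

```python
def latest_versions_only(hits: list) -> list:
    """Keep only the highest-versioned hit per doc_id, in the original ranking order."""
    best: dict = {}
    for hit in hits:
        kept = best.get(hit["doc_id"])
        if kept is None or hit["version"] > kept["version"]:
            best[hit["doc_id"]] = hit
    # Identify the winning hit objects, then filter the list so ranking order is preserved.
    winners = {id(h) for h in best.values()}
    return [h for h in hits if id(h) in winners]
```

One caveat with doing this client-side: if the search only returns the top N hits, de-duping can leave fewer than N distinct documents, so you may need to over-fetch.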
One possibility is to store all documents in JSON Patch notation, where the very first version consists entirely of 'add' operations. After that, store every subsequent set of operations in time order. You can also keep an effective document that represents the current state, for quicker retrieval.