I'm currently working on a document that needs to be stored in a monthly index. The document contains two datetime fields that determine where it should be indexed: the datetime when it was created (let's call it fieldA) and the datetime when it transitioned to its final state (let's call it fieldB). Since I need to create the monthly index based on fieldB, which only gets populated once the document is in its final state, what's the best approach for handling this in Elasticsearch, given that:
- monthly index is based on fieldB
- document still needs to be searchable if fieldB is not yet populated
- I need to guarantee that only one version of the document exists (either in the monthly index based on fieldB, or in another index while only fieldA is available and the document is not yet in its final form)
- I have no control over when I receive a request to index a document (i.e. it's possible that I get document version 3 first (fieldB populated), followed by document version 2 (null fieldB))
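To make the routing requirement concrete, here is a minimal sketch of how I picture choosing the target index. The index names (`docs-pending`, `docs-YYYY.MM`) and the function name `target_index` are placeholders I made up for illustration, not anything that exists in my cluster yet.

```python
from datetime import datetime
from typing import Optional

PENDING_INDEX = "docs-pending"   # hypothetical name for non-final documents
MONTHLY_PREFIX = "docs-"         # hypothetical prefix for the monthly indices


def target_index(field_b: Optional[datetime]) -> str:
    """Return the index a document belongs in, based only on fieldB."""
    if field_b is None:
        # Not in its final state yet: keep it searchable in the holding index.
        return PENDING_INDEX
    # Final state: route to the monthly index derived from fieldB.
    return f"{MONTHLY_PREFIX}{field_b:%Y.%m}"


print(target_index(None))
print(target_index(datetime(2024, 3, 15, 10, 30)))
```

The routing itself is simple; the hard part is that the same document can map to two different indices over its lifetime.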
Options I have thought of and am currently trying:
- Indexing is done through a pipeline; this pipeline contains a script processor that checks whether the document already exists in its non-final state and moves it to the monthly index. Is a race condition possible here?
- Keep two copies of the document, but create a separate processor to remove the duplicate (this, however, is not real time)
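The behaviour I am after, including the out-of-order case, can be sketched as follows. This is only an in-memory model of the desired outcome, not Elasticsearch API calls; the `Store` type, `index_for`, `apply_update`, and the index names are all hypothetical, and I am assuming each document carries an integer version.

```python
from datetime import datetime
from typing import Dict, Optional, Tuple

# doc_id -> (version, index_name); an in-memory stand-in for what is indexed
Store = Dict[str, Tuple[int, str]]


def index_for(field_b: Optional[datetime]) -> str:
    # Hypothetical index names: a holding index plus one index per month.
    return "docs-pending" if field_b is None else f"docs-{field_b:%Y.%m}"


def apply_update(store: Store, doc_id: str, version: int,
                 field_b: Optional[datetime]) -> bool:
    """Apply an incoming version; return False if it is stale and ignored."""
    current = store.get(doc_id)
    if current is not None and current[0] >= version:
        # A newer (or equal) version is already stored, so drop this one.
        return False
    # Overwriting the single entry models "remove from old index, write to new".
    store[doc_id] = (version, index_for(field_b))
    return True


store: Store = {}
# Version 3 (final, fieldB set) arrives before version 2 (fieldB still null).
apply_update(store, "doc-1", 3, datetime(2024, 3, 15))
apply_update(store, "doc-1", 2, None)
print(store["doc-1"])
```

In this model the check and the move happen as one step on a single dictionary. My concern is that the real equivalent spans two indices, so I do not see how to make it atomic.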
I would like to know the recommended way of handling this case.
Thank you