Over the last year we've made several changes to the schema of our index, each of which required reindexing. Our indexes are both read- and write-heavy, but this hasn't been a problem because we have used aliases from the beginning and all writes get funneled through a single queue that can be paused during reindexing.
Our platform has become more complicated, requiring hot writes instead of passive upserts. We make standard upserts, use scripts to update nested documents, and delete documents in real time. Multiple systems are making writes simultaneously across our platform. It's complicated, but I don't think there's anything wrong with the architecture; it's necessarily complicated.
Now I'm in a situation where I'm not sure how to reindex without downtime anymore. If we shut off the systems that make writes, we are effectively shutting off the whole system. We may have been able to get away with this before, when reindexing only took a couple of minutes, but we're now looking at a 30-minute reindex time, and that will undoubtedly increase.
Here's my conundrum step-by-step:
> Create new Index B
> Reindex A to B over the next 30 minutes
* During which time updates are made to index A
> Point index alias from A to B
> Reindex A to B checking for version conflicts
* Which will leave some documents in an older state
* Which will leave some documents in a forked state (changes made to B while A was reindexing)
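To make the steps above concrete, here is a minimal sketch of the request bodies they correspond to. The names `index_a`, `index_b`, and `my_alias` are placeholders, not the real names, and the helper functions are hypothetical; the dicts would be sent to Elasticsearch's `_reindex` and `_aliases` endpoints with whatever client is in use.

```python
import json

# Placeholder names for illustration only.
OLD_INDEX = "index_a"
NEW_INDEX = "index_b"
ALIAS = "my_alias"


def reindex_body(source, dest):
    """Body for POST /_reindex that respects document versions."""
    return {
        # Count version conflicts and keep going instead of aborting.
        "conflicts": "proceed",
        "source": {"index": source},
        # "external" preserves the source version: missing documents are
        # created, and existing ones are overwritten only when the copy
        # in dest has an older version than the one in source.
        "dest": {"index": dest, "version_type": "external"},
    }


def alias_swap_body(alias, old, new):
    """Body for POST /_aliases; both actions are applied atomically."""
    return {
        "actions": [
            {"remove": {"index": old, "alias": alias}},
            {"add": {"index": new, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    print(json.dumps(reindex_body(OLD_INDEX, NEW_INDEX), indent=2))
    print(json.dumps(alias_swap_body(ALIAS, OLD_INDEX, NEW_INDEX), indent=2))
```

The second pass in the list reuses the same reindex body. Because it only compares version numbers, a document that was changed in B after the alias swap can be skipped or overwritten regardless of which change is actually newer, which is where the stale and forked states come from.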
I have seen it suggested that you could have a read-alias and a write-alias. But my writes sometimes use scripts to modify nested documents (requiring the document to exist). Additionally, my writes sometimes must be read by other processes within seconds of the write.
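For clarity, this is roughly what the read-alias/write-alias suggestion looks like as two `_aliases` requests. It is a sketch only: the alias names `docs_read` and `docs_write` and the helper `split_alias_phases` are hypothetical.

```python
import json


def split_alias_phases(old, new, read_alias="docs_read", write_alias="docs_write"):
    """Return the two POST /_aliases bodies used by a split-alias reindex.

    Phase 1 is sent before the reindex starts, phase 2 after it finishes.
    """
    move_writes = {
        "actions": [
            {"remove": {"index": old, "alias": write_alias}},
            {"add": {"index": new, "alias": write_alias}},
        ]
    }
    move_reads = {
        "actions": [
            {"remove": {"index": old, "alias": read_alias}},
            {"add": {"index": new, "alias": read_alias}},
        ]
    }
    return [move_writes, move_reads]


if __name__ == "__main__":
    for phase in split_alias_phases("index_a", "index_b"):
        print(json.dumps(phase, indent=2))
```

Between the two phases, writes land in the new index while reads still hit the old one. That is exactly the window in which a scripted update can target a document that has not been copied yet, and in which a fresh write is not visible to readers.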
I am using dynamic schemas. I have thought about pushing a dummy schema to my index with the correct data types, to lock in those types for the fields and skip reindexing. However, if the wrong data type is ever pushed, I'd be stuck in this situation again and need a full reindex. As far as I'm aware, there's no way to drop fields without the Reindex API.
How can I approach the problem of reindexing in an environment that is both read- and write-heavy?
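As an illustration of the "lock in the data types" idea, the sketch below builds an explicit mapping body that could be sent with `PUT /<index>/_mapping` before any documents containing those fields arrive. The field names and the helper `explicit_mapping_body` are made up for the example.

```python
import json


def explicit_mapping_body(field_types):
    """Build a mapping body that declares a type for each named field.

    Fields not listed here are still left to dynamic mapping.
    """
    return {
        "properties": {
            name: {"type": es_type} for name, es_type in field_types.items()
        }
    }


if __name__ == "__main__":
    # Hypothetical fields; replace with the real ones.
    body = explicit_mapping_body(
        {"price": "double", "created_at": "date", "comments": "nested"}
    )
    print(json.dumps(body, indent=2))
```

This only helps for fields that have not been mapped yet. Once a field exists with the wrong type, its type cannot be changed in place, which is the situation described above.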