I would like, if possible, for the new documents to be on the same server instead of being rerouted to a different one. This should be much more performant I would think and we might be reindexing a lot of data. We would then deal with rerouting or rebalancing later if needed.
Is this the default behaviour or is there something I can do to force it? Thanks
Reindex routes documents through the coordinating node no matter where
they start from or end up. Most of the time in reindex is taken indexing so
it wouldn't save much time to do this and it'd be much more complex.
Maybe for very large documents with very simple analysis chains it'd make
sense but I don't think it is worth it.
I hear the points, especially if it makes things complicated.
I'll explain the use case I was thinking about and you can tell me if you think the direction has merit. Since, as you said, indexing is sometimes the slow part, I was thinking about writing raw data to a special index pattern that wouldn't index or analyze anything - it would just store the metadata and the JSON document. Writing to this should be very fast. Then every few seconds I could reindex from one of these "write-only" indices into the regular index, which might be slower.
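To make the idea concrete, here is a rough sketch of the two request bodies involved, built as plain Python dicts so nothing is actually sent to a cluster. It assumes a recent Elasticsearch version with typeless mappings, where setting `enabled` to false at the top level of the mapping keeps `_source` but skips parsing and indexing the fields. The index names `raw-buffer` and `events` and the helper function names are made up for illustration.

```python
import json


def raw_index_body():
    """Body for creating the "write-only" index (PUT /<raw index>).

    Assumption: top-level "enabled": false stores the JSON in _source
    without parsing or indexing any of its fields.
    """
    return {"mappings": {"enabled": False}}


def reindex_body(source_index, dest_index):
    """Body for POST /_reindex, copying documents from source to dest."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }


if __name__ == "__main__":
    # Hypothetical index names, for illustration only.
    print("PUT /raw-buffer")
    print(json.dumps(raw_index_body(), indent=2))
    print("POST /_reindex")
    print(json.dumps(reindex_body("raw-buffer", "events"), indent=2))
```

The periodic step would just be the second request repeated every few seconds, presumably followed by deleting or rotating the raw index once its contents have been copied.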
The advantage of this approach is that I could absorb write spikes and I might have more resiliency for my data. My thought is that if I made this a normal part of my workflow, it would be wasteful to push this data over the network again - the node could do the entire process locally.
I haven't tested this approach yet - just thinking about it. Do you think this approach would help if, at the end of the day, the writes are going back over the wire anyway?