Is it possible to create the new document on the same server when using the rerouting api?

yehosef · August 1, 2016, 11:27am

I was reading https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html but I wasn't sure where the documents are routed by default.

I would like, if possible, for the new documents to be on the same server instead of being rerouted to a different one. I would think this would be much more performant, and we might be reindexing a lot of data. We would then deal with rerouting or rebalancing later if needed.

Is this the default behaviour or is there something I can do to force it? Thanks

warkolm · August 2, 2016, 10:19am

Use allocation to put the index being reindexed, as well as the index you are outputting to, on the same node.
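For readers who want to see what that looks like, here is a minimal sketch in Python that only builds the request bodies. It assumes shard allocation filtering by node name (`index.routing.allocation.require._name`) and the standard `_reindex` body; the index names `old_index` and `new_index` and the node name `node-1` are hypothetical placeholders, not anything from this thread.

```python
import json


def allocation_settings(node_name: str) -> dict:
    """Settings body for PUT /<index>/_settings that requires the
    index's shards to be allocated on the node with this name."""
    return {"index.routing.allocation.require._name": node_name}


def reindex_body(source_index: str, dest_index: str) -> dict:
    """Body for POST /_reindex copying source_index into dest_index."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }


if __name__ == "__main__":
    # Hypothetical names, used only for illustration.
    node = "node-1"
    for index in ("old_index", "new_index"):
        print(f"PUT /{index}/_settings")
        print(json.dumps(allocation_settings(node), indent=2))
    print("POST /_reindex")
    print(json.dumps(reindex_body("old_index", "new_index"), indent=2))
```

Note that pinning an index to a single node leaves any replicas unassigned, since a replica cannot share a node with its primary, so this is likely only suitable as a temporary arrangement.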

But really, I wouldn't bother.

nik9000 · August 2, 2016, 11:50am

Reindex routes documents through the coordinating node no matter where they start from or end up. Most of the time in reindex is spent indexing, so it wouldn't save much time to do this, and it'd be much more complex.

Maybe for very large documents with very simple analysis chains it'd make sense, but I don't think it is worth it.

yehosef · August 2, 2016, 3:29pm

Thanks Mark and Nik,

I hear the points, especially if it makes things complicated.

I'll explain the use case I was thinking about and you can tell me if you think the direction has merit. Since, as you said, indexing is sometimes the slow part, I was thinking about writing raw data to a special pattern that wouldn't index or store anything beyond the metadata and the JSON document. Writing to this should be very fast. Then every few seconds I could reindex from one of these "write-only" indices into the regular index, which might be slower.

The advantage of this approach is that I could absorb write spikes and I might have more resiliency for my data. My thought is that if I were making this a normal part of my workflow, it would be wasteful to push this data over the network again - it could do the entire process within that node.

I haven't tested this approach yet - just thinking about it. Do you think this approach would help if, at the end of the day, the writes are going back over the wire anyway?
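To make the idea concrete, here is a rough, untested sketch in Python of the two request bodies involved. It assumes the "write-only" index can be created with the whole mapping disabled (`"enabled": false`), so documents are kept in `_source` but no fields are parsed or indexed; on older versions that use mapping types, that setting would presumably be nested under the type name. The index names `staging-raw` and `events` are hypothetical.

```python
import json


def staging_index_body() -> dict:
    """Body for PUT /<staging index>: disable the mapping so that
    documents are kept in _source without any fields being indexed."""
    return {"mappings": {"enabled": False}}


def drain_body(staging_index: str, target_index: str) -> dict:
    """Body for POST /_reindex from the staging index into the regular one.

    op_type=create only creates documents missing from the target, and
    conflicts=proceed keeps going when a document already exists, so the
    same request can be repeated every few seconds.
    """
    return {
        "conflicts": "proceed",
        "source": {"index": staging_index},
        "dest": {"index": target_index, "op_type": "create"},
    }


if __name__ == "__main__":
    # Hypothetical index names, used only for illustration.
    print("PUT /staging-raw")
    print(json.dumps(staging_index_body(), indent=2))
    print("POST /_reindex")
    print(json.dumps(drain_body("staging-raw", "events"), indent=2))
```

Because nothing in the staging index is searchable, each run walks the whole index, so the staging index would probably need to be rotated or deleted once drained. As noted above, the documents still pass through the coordinating node on their way to the regular index.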
