_bulk request with routing parameter

Currently, I am using _bulk requests for reindexing. Each _bulk request contains a large number of operations (create, update, or delete). Each document already has its own routing value.

I wonder whether setting the routing parameter on the _bulk request itself would give better performance.

To do that, I would also need to make sure that all documents inside the same _bulk request are stored on the same shard.

If the answer is yes, which hash function must I use to group documents into _bulk requests?
Can anyone help verify this?

If you are indexing immutable data, this could improve performance, as the batch size going to each shard would likely increase. To make sure you spread the load, you could use a timestamp or a random number as the routing key. I am not sure how much impact this would have, though.
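As a rough illustration of the random-routing-key idea, here is a minimal Python sketch that only builds the request bodies and does not talk to a cluster. The function names, the batch size, and the use of a UUID as the random key are assumptions made for the example. Keep in mind that a document indexed with a custom routing value needs that same value for later get, update, or delete calls by ID, which is why this suits immutable data.

```python
import json
import uuid
from itertools import islice


def batches(docs, size):
    """Yield consecutive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_body_with_batch_routing(index, docs, routing_key):
    """Build an NDJSON _bulk body in which every action shares one routing key."""
    lines = []
    for doc in docs:
        # Action line: "routing" is set per action in the bulk metadata.
        lines.append(json.dumps({"index": {"_index": index, "routing": routing_key}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def routed_batches(index, docs, size=1000):
    """Yield (routing_key, body) pairs, one random key per batch (assumption: UUID)."""
    for chunk in batches(docs, size):
        key = uuid.uuid4().hex
        yield key, bulk_body_with_batch_routing(index, chunk, key)
```

Each body would then be sent as a separate `_bulk` request. Two random keys can still hash to the same shard, so the load only evens out over many batches.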

Why is your diagram showing bulk requests going through a master node?

My _bulk request contains multiple reindex operations. Each document already carries its own routing value. I wonder whether setting the routing parameter on the request would reduce the transport overhead between ES nodes.
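One way to get single-routing _bulk requests without knowing the hash function is to group the operations by their existing routing value, since documents with the same routing value always land on the same shard. The sketch below assumes a hypothetical in-memory shape for each operation (`op`, `id`, `routing`, and optionally `doc`); it only builds the NDJSON bodies. Whether this actually reduces inter-node traffic enough to matter has not been measured here.

```python
import json
from collections import defaultdict


def group_by_routing(actions):
    """Group operations by routing value (hypothetical keys: op, id, routing, doc)."""
    groups = defaultdict(list)
    for action in actions:
        groups[action["routing"]].append(action)
    return dict(groups)


def to_bulk_body(actions):
    """Build an NDJSON _bulk body; routing is left to the request-level parameter."""
    lines = []
    for a in actions:
        # Action line, e.g. {"index": {"_id": "1"}} or {"delete": {"_id": "2"}}.
        lines.append(json.dumps({a["op"]: {"_id": a["id"]}}))
        if a["op"] == "update":
            # Partial updates wrap the changed fields in "doc".
            lines.append(json.dumps({"doc": a["doc"]}))
        elif a["op"] != "delete":
            # index/create carry the full source; delete has no source line.
            lines.append(json.dumps(a["doc"]))
    return "\n".join(lines) + "\n"
```

Each group's body could then be posted to `/<index>/_bulk?routing=<value>`, where the `routing` query parameter acts as the default for actions that do not set their own. Different routing values may still map to the same shard; the `_routing` field documentation gives the mapping as `shard_num = (hash(_routing) % num_routing_shards) / routing_factor`, with a Murmur3-based hash. If there are many distinct routing values, the groups may become very small, which could cost more in request overhead than it saves.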
