Currently, I am applying _bulk requests for reindexing. Each _bulk request contains a ton of reindex requests ( creating, updating, or deleting). Each document already has its own routing.
I wonder if I apply routing on _bulk requests, I will get better performance or not.
To do that, I also need to make sure all documents inside the same _bulk request have been stored in the same shard.
If you are indexing immutable data this could improve performance as the batch size going to each shard would likely increase. To make sure you spread the load you could use a timestamp or random number as routing key. I am not sure how much impact this may have though.
Why is your diagram showing bulk requests going through a master node?
my _bulk request contains multiple reindex requests. Each document already contains its routing. I wonder if I apply routing param for the request, can I reduce performance as transport between ES node or not.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.