Hi,
I have 10 machines and have configured 10 shards (2 replicas each). Is there a way to make sure the primary of shard 1 resides on machine 1, the primary of shard 2 on machine 2, and so on?
Thanks,
Not really. But why do you want that?
If you have 10 nodes and 10 shards in total, you will end up with one shard per node.
I need this to improve indexing time. I can route relevant data to a local indexer process running on the same machine.
I could disable rebalancing and move each primary to its desired node, but then I lose automatic balancing. It would also need ongoing maintenance whenever new indices are created.
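For anyone following along, here is a rough sketch of what that manual approach could look like. It only builds the request bodies for `PUT _cluster/settings` (the `cluster.routing.rebalance.enable` setting) and `POST _cluster/reroute` (the `move` command); it does not send anything. The index name, node names and shard layout are made up for illustration, and the current location of each primary would have to be looked up first (for example with `_cat/shards`), since `move` relocates whichever copy sits on `from_node`.

```python
def rebalance_setting(enabled):
    """Body for PUT _cluster/settings that turns shard rebalancing on or off."""
    value = "all" if enabled else "none"
    return {"transient": {"cluster.routing.rebalance.enable": value}}


def move_commands(index, current, desired):
    """Body for POST _cluster/reroute.

    current / desired map shard number -> node name. Shards that already
    sit on the desired node produce no command.
    """
    commands = []
    for shard in sorted(desired):
        if current[shard] != desired[shard]:
            commands.append({
                "move": {
                    "index": index,
                    "shard": shard,
                    "from_node": current[shard],
                    "to_node": desired[shard],
                }
            })
    return {"commands": commands}


# Hypothetical layout: shard 0 must move, shard 1 is already in place.
current = {0: "node-3", 1: "node-2"}
desired = {0: "node-1", 1: "node-2"}
settings_body = rebalance_setting(False)
reroute_body = move_commands("my-index", current, desired)
```

This would have to be repeated for every new index, which is the maintenance cost mentioned above.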
The best way to improve indexing performance is to use the _bulk API. As the documents in a single bulk request can belong to different shards, it is best to treat the cluster as a black box and let Elasticsearch manage distribution of data. Have you looked at the available documentation regarding indexing performance tips and performance considerations for Elasticsearch indexing?
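As a minimal illustration of the `_bulk` request format, the sketch below builds the newline-delimited body: one action line followed by one source line per document, with a trailing newline at the end. The index name and documents are placeholders, and sending the body (with `Content-Type: application/x-ndjson`) is left out.

```python
import json


def build_bulk_body(index, docs):
    """Build an NDJSON body for POST _bulk from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        # Action line: tells Elasticsearch which index and id to write to.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Placeholder documents for illustration only.
docs = [("1", {"user": "a"}), ("2", {"user": "b"})]
bulk_body = build_bulk_body("my-index", docs)
```

Elasticsearch routes each document in the request to its shard on its own, so the client does not need to know where any primary lives.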
I am already using the bulk API. This is to improve performance further.
For example, we have an API in our app that scrolls through all the data. Instead of having one request scrolling on ES, I created parallel scroll streams, one per primary shard. This gave us a huge boost in throughput.
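The post does not say how the per-shard streams were set up. One way to get something similar is the `_shards:<n>` search preference, with one scroll request per shard run in parallel; sliced scroll is another option. The sketch below only prepares the requests and fans them out through a caller-supplied `fetch` function, which is a hypothetical stand-in for whatever HTTP client is in use.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_scroll_requests(index, num_shards, scroll="1m", size=1000):
    """One initial scroll search per shard, pinned with the _shards preference."""
    return [
        {
            "path": f"/{index}/_search",
            "params": {"scroll": scroll, "preference": f"_shards:{shard}"},
            # Sorting by _doc is the cheapest order for a full scan.
            "body": {"size": size, "sort": ["_doc"], "query": {"match_all": {}}},
        }
        for shard in range(num_shards)
    ]


def run_parallel(requests, fetch):
    """Run fetch(request) for every request on its own thread, keeping order."""
    with ThreadPoolExecutor(max_workers=max(1, len(requests))) as pool:
        return list(pool.map(fetch, requests))


# Stand-in fetch that just echoes which shard it was asked to scroll.
def fake_fetch(request):
    return request["params"]["preference"]


scroll_requests = shard_scroll_requests("my-index", 3)
results = run_parallel(scroll_requests, fake_fetch)
```

In a real client, `fetch` would send the initial search and then keep calling the scroll API with the returned `_scroll_id` until no hits come back.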