I am working on optimizing an Elasticsearch cluster and am seeking advice on the best shard strategy. Here are the specifics of my setup:
Node Details: 4 data nodes running on AWS r6g.large instances, with 3 master nodes
Index: Single index containing 7 million documents
Current Shard Configuration: number_of_shards: 2, number_of_replicas: 1
Workload Characteristics: Predominantly search-heavy, with peak search requests reaching up to 100 req/s. During peak times, the average CPU usage on our data nodes reaches 70%.
Given these parameters, I am exploring how to adjust the number of primary and replica shards to better manage the search workload and reduce the impact on node performance.
I would appreciate any insights or recommendations on how to configure my shard setup to ensure optimal performance and scalability.
It looks like the complete index is small enough to fully fit within the page cache, assuming you do not have a lot of other data in the cluster as well.
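If you want to verify that, comparing the index's on-disk store size against the RAM left over for the page cache is a quick check. A rough sketch with the Python client (the index name and endpoint are placeholders for your own):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust for your cluster

# Total on-disk size of the primary copies of the index
stats = es.indices.stats(index="my_index", metric="store")
primary_bytes = stats["indices"]["my_index"]["primaries"]["store"]["size_in_bytes"]
print(f"Primary store size: {primary_bytes / 1024**3:.1f} GiB")

# An r6g.large has 16 GiB of RAM; whatever is not taken by the JVM heap
# (and the OS itself) is what the page cache can use, so compare the
# store size against roughly 16 GiB minus your configured heap.
```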
I would therefore recommend changing to a single primary shard and configuring 3 replica shards, so that every node holds a full copy of the index in a single shard. If the page cache is unable to hold the full shard, I would try reducing the heap size below the recommended 50%. This way each node can serve any search request locally and there is less coordination overhead.
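One way to get there, sketched with the Python client (index names are placeholders; the number of primary shards cannot be changed on an existing index, so this creates a new index and reindexes into it, while the replica count can be changed at any time):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust for your cluster

# New index with a single primary shard and 3 replicas, so each of the
# 4 data nodes ends up with a full copy of the data.
es.indices.create(
    index="my_index_v2",
    settings={"number_of_shards": 1, "number_of_replicas": 3},
)

# Copy the documents over; for 7 million documents it is worth running
# this asynchronously and checking the returned task instead of blocking.
es.reindex(
    source={"index": "my_index"},
    dest={"index": "my_index_v2"},
    wait_for_completion=False,
)
```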
There have also been a lot of search performance improvements in Elasticsearch 8.x, so I would recommend upgrading to the latest version.
If I change to a single primary shard with 3 replica shards, I think only one data node would handle indexing requests. Wouldn't this cause an imbalance in CPU load? In our cluster, indexing is not very frequent, but there are always processes running that index user data.
All nodes perform indexing into the shards they hold, so the node holding the primary may do a bit more work, but I would not expect a big difference. I would recommend you test it to find out.
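If it helps, per-node CPU can be compared during such a test with the nodes stats API; a minimal sketch using the Python client (endpoint and polling interval are just examples):

```python
import time

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust for your cluster

# Poll per-node CPU while the indexing test runs to see whether the node
# holding the primary shard is noticeably busier than the others.
for _ in range(10):
    stats = es.nodes.stats(metric="os")
    for node in stats["nodes"].values():
        print(node["name"], node["os"]["cpu"]["percent"])
    print("---")
    time.sleep(30)
```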
Does "all nodes perform indexing into the shards they hold" mean that because replica shards do the job of replicating documents from the primary shard, the CPU load does not become too unbalanced?
I'll try testing it out to see what actually happens.