Optimal Shard Strategy for High Search Load with Elasticsearch Cluster

Hi,

I am working on optimizing an Elasticsearch cluster and am seeking advice on the best shard strategy. Here are the specifics of my setup:

  • Node Details: 4 data nodes running on AWS r6g.large instances, with 3 master nodes
  • Index: Single index containing 7 million documents
  • Current Shard Configuration: number_of_shards: 2, number_of_replicas: 1
  • Workload Characteristics: Predominantly search-heavy, with peak search requests reaching up to 100 req/s. During peak times, the average CPU usage on our data nodes reaches 70%.

Given these parameters, I am exploring how to adjust the number of primary and replica shards to better manage the search workload and reduce the impact on node performance.

I would appreciate any insights or recommendations on how to configure my shard setup to ensure optimal performance and scalability.

Thank you in advance for your help!

What is the size of your index? Which version of Elasticsearch are you using? What type of queries are you using?

  • index size: 3 GB
  • Elasticsearch version: 7.10
  • query type: all of our queries look like the one below (a simple query with a text match)

We use an ngram tokenizer (min_gram: 2, max_gram: 3) on the text fields because they contain Japanese words.

{
    "query": {
        "bool": {
            "filter": [
                { "term": { "some_flag": true } },
                {
                    "bool": {
                        "should": [
                            {
                                "multi_match": {
                                    "query": "some word",
                                    "fields": ["text_a", "text_b", "text_c"]
                                }
                            },
                            {
                                "match_phrase": {
                                    "text_d": "some word"
                                }
                            }
                        ]
                    }
                }
            ]
        }
    },
    "size": 100,
    "from": 0,
    "sort": {
        "updated_at": "desc"
    }
}
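For reference, an ngram setup along the lines described above might look like the sketch below. The index, tokenizer, analyzer, and field names here are placeholders, not the poster's actual mapping; only one field is shown.

```
PUT /my_index
{
    "settings": {
        "analysis": {
            "tokenizer": {
                "ja_ngram_tokenizer": {
                    "type": "ngram",
                    "min_gram": 2,
                    "max_gram": 3
                }
            },
            "analyzer": {
                "ja_ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "ja_ngram_tokenizer"
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "text_a": { "type": "text", "analyzer": "ja_ngram_analyzer" }
        }
    }
}
```

Note that with min_gram 2 and max_gram 3 the gram difference is 1, which is within the default index.max_ngram_diff, so no extra setting is needed for that.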

It looks like the complete index is small enough to fully fit within the page cache, assuming you do not have a lot of other data in the cluster as well.

I would therefore recommend changing to a single primary shard with 3 replica shards, so that every node holds a full copy of the index in a single shard. If the page cache is unable to hold the full shard, I would try reducing the heap size below the recommended 50% of RAM. This way each node can serve any request on its own and there is less coordination overhead.
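One way to go from 2 primaries to 1 without a full reindex is the shrink API (the index names below are placeholders). Shrinking requires the source index to be write-blocked and all of its shards to be co-located on one node first; the replica count can then be set as part of the shrink, since number_of_replicas is a dynamic setting.

```
PUT /my_index/_settings
{
    "index.blocks.write": true,
    "index.routing.allocation.require._name": "node-1"
}

POST /my_index/_shrink/my_index_1shard
{
    "settings": {
        "index.number_of_shards": 1,
        "index.number_of_replicas": 3
    }
}
```

After verifying the new index, you would switch your alias or application over to my_index_1shard and remove the allocation/write-block settings as appropriate.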

There have been a lot of performance improvements in Elasticsearch 8.x, so I would recommend upgrading to the latest version.


Thank you!

If I change to a single primary shard with 3 replica shards, I think only one data node can handle indexing requests. Wouldn't this cause an imbalance in CPU load? In our cluster, while it's not very frequent, there are always processes running that index user data.

All nodes perform indexing into the shards they hold, so the node holding the primary may do a bit more work, but I would not expect a big difference. I would recommend you test it to find out.


Thanks for the info.

Does "all nodes perform indexing into the shards they hold" mean that, because replica shards replicate documents from the primary shard themselves, the CPU load does not become too unbalanced?

I'll try testing it out to see what actually happens.

Although we store the original data in an RDB, I'm also a bit concerned about the risk of the data node holding the only primary shard going down.

If the node with the primary shard goes down, one of the replica shards will be promoted to primary, so that is not an issue.
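If you want to observe this during testing, the cat shards API shows which node holds the primary and which hold replicas, so you can confirm a replica was promoted after stopping a node (index name is a placeholder):

```
GET _cat/shards/my_index?v&h=index,shard,prirep,state,node
```

The prirep column shows "p" for the primary copy and "r" for replicas.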
