How to reshard indices to overcome latency during high traffic in an Elasticsearch cluster

Karthikeyan_Amaresan · December 19, 2023, 6:51pm

During high traffic times, our Elasticsearch cluster is experiencing latency, and we are considering a resharding strategy to optimize performance. Below is our current index setup and the proposed resharding plan:

Current Indices:

billing-index-v0.0: 45 GB, 20 shards (over-sharded)
billing-index-v1.0: 197 GB, 20 shards (over-sharded)
billing-index-v2.0: 1.4 GB, 20 shards (over-sharded)
billing-index-v4.0: 947 GB, 20 shards (over-populated)

Proposed Resharding (targeting about 25 GB per shard):

billing-index-v0.1: 45 GB, 2 shards
billing-index-v1.1: 197 GB, 8 shards
billing-index-v2.1: 1.4 GB, 2 shards
billing-index-v4.1: 947 GB, 40 shards
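
The arithmetic behind these shard counts can be sketched in Python. The helper names `primaries_for` and `resize_route` are made up for illustration; the 25 GB target and the index sizes come from the lists above. The sketch also guesses which resize approach could apply, on the understanding that the shrink API needs the target count to be a factor of the current count and the split API needs a multiple of it (subject to the index's routing-shard setting). Anything else would need a reindex.

```python
import math

TARGET_GB = 25  # per-shard size target from the plan above

# current index name -> (size in GB, current primaries, proposed primaries)
PLAN = {
    "billing-index-v0.0": (45, 20, 2),
    "billing-index-v1.0": (197, 20, 8),
    "billing-index-v2.0": (1.4, 20, 2),
    "billing-index-v4.0": (947, 20, 40),
}


def primaries_for(size_gb, target_gb=TARGET_GB):
    """Smallest primary count that keeps each shard at or under target_gb."""
    return max(1, math.ceil(size_gb / target_gb))


def resize_route(current, proposed):
    """Guess which approach could move from `current` to `proposed` primaries."""
    if proposed == current:
        return "none"
    if proposed < current and current % proposed == 0:
        return "shrink"
    if proposed > current and proposed % current == 0:
        return "split"
    return "reindex"


for name, (size_gb, current, proposed) in PLAN.items():
    print(
        f"{name}: minimum {primaries_for(size_gb)} primaries, "
        f"proposed {proposed} ({size_gb / proposed:.1f} GB/shard), "
        f"route: {resize_route(current, proposed)}"
    )
```

Under these assumptions, going from 20 to 8 primaries is the one change that neither shrink nor split can express, so that index would have to be reindexed into a new one.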

Questions:

Will the proposed resharding strategy help alleviate latency during high traffic times?
Are there any potential drawbacks or considerations we should be aware of?
Should we consider a different resharding approach based on our current setup?

Query patterns:

{
  "size": 20,
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "filter": {
              "term": {
                "accountId": "9812374287423"
              }
            },
            "must_not": {
              "exists": {
                "field": "linkedAccount"
              }
            }
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "isDeleted": {
              "value": true
            }
          }
        }
      ],
      "must": [
        {
          "multi_match": {
            "query": "Kelly",
            "fields": [
              "firstName.autocomplete",
              "lastName.autocomplete",
              "fullName.autocomplete",
              "emailId.autocomplete",
              "phone.autocomplete",
              "companyName.autocomplete",
              "country.autocomplete",
              "state.autocomplete",
              "city.autocomplete",
              "address.autocomplete"
            ]
          }
        }
      ]
    }
  },
  "sort": [
    {
      "modifiedDate": {
        "order": "desc"
      }
    }
  ]
}
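
For anyone who wants to experiment with this query, here is one way to build the same request body in Python with the nested `bool` flattened into the outer one. The function name `build_search_body` and its parameters are illustrative; the field names come from the query above. The flattening assumes the inner `bool` only combines the `accountId` filter with the `linkedAccount` exclusion, in which case both forms should match the same documents. It is a sketch for testing, not a verified optimization.

```python
import json

# Base field names taken from the multi_match clause above.
AUTOCOMPLETE_FIELDS = [
    "firstName", "lastName", "fullName", "emailId", "phone",
    "companyName", "country", "state", "city", "address",
]


def build_search_body(account_id, text, size=20):
    """Build the search body with all clauses on a single bool level."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Filter context: no scoring, and results can be cached.
                "filter": [{"term": {"accountId": account_id}}],
                "must_not": [
                    {"exists": {"field": "linkedAccount"}},
                    {"term": {"isDeleted": {"value": True}}},
                ],
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": [
                                f"{name}.autocomplete"
                                for name in AUTOCOMPLETE_FIELDS
                            ],
                        }
                    }
                ],
            }
        },
        "sort": [{"modifiedDate": {"order": "desc"}}],
    }


print(json.dumps(build_search_body("9812374287423", "Kelly"), indent=2))
```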

Hardware specifications: 4 Nodes, 16 Core CPUs, 32 GB Memory.

Any help is much appreciated. Thanks in advance.

Christian_Dahlqvist · December 19, 2023, 7:02pm

Karthikeyan_Amaresan:

Current Indices:

billing-index-v0.0: 45 GB, 20 shards (over-sharded)
billing-index-v1.0: 197 GB, 20 shards (over-sharded)
billing-index-v2.0: 1.4 GB, 20 shards (over-sharded)
billing-index-v4.0: 947 GB, 20 shards (over-populated)

How many primary and replica shards do these indices have?

That depends on what is causing the slowness. How fast is it during non-peak times and how slow is it during peak times?

It would help if you could provide some additional context around the use case, e.g.:

Which version of Elasticsearch are you using?
How many concurrent queries are you serving? How are these distributed across the indices?
Are you continuously adding/updating/deleting data from these indices? If so, at what rate?
What is the average size of your documents?

Given that you have more data than can fit in the operating system page cache, the type of storage used is very important and can very well be the bottleneck. What type of storage are you using? Local SSDs?
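
A rough back-of-the-envelope check of that point, using the figures from the first post (4 nodes with 32 GB of memory each, index sizes as listed). The heap fraction is an assumption (half of RAM is a common guideline, but the actual setting was not stated), and it is not stated whether the listed sizes include replicas, so treat the result as an order-of-magnitude estimate only.

```python
NODES = 4
RAM_PER_NODE_GB = 32
HEAP_FRACTION = 0.5  # assumption: JVM heap takes half of each node's RAM

# Index sizes in GB as listed in the first post.
INDEX_SIZES_GB = [45, 197, 1.4, 947]


def page_cache_gb(nodes, ram_gb, heap_fraction):
    """Upper bound on memory left for the OS page cache across the cluster."""
    return nodes * ram_gb * (1 - heap_fraction)


def cache_coverage(data_gb, cache_gb):
    """Fraction of the data that could sit in the page cache at once."""
    return min(1.0, cache_gb / data_gb)


total_gb = sum(INDEX_SIZES_GB)
cache_gb = page_cache_gb(NODES, RAM_PER_NODE_GB, HEAP_FRACTION)
print(
    f"data: {total_gb:.1f} GB, page cache: at most {cache_gb:.0f} GB, "
    f"coverage: {cache_coverage(total_gb, cache_gb):.1%}"
)
```

With these numbers only a small fraction of the data (roughly 5%) could be cached at any time, which is why storage performance matters so much here.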

Have you checked how the storage is performing?

Do you have monitoring installed so you can see how high CPU and heap usage is?

Karthikeyan_Amaresan · December 21, 2023, 10:44am

Hi @Christian_Dahlqvist
Thank you for taking time to write back.

How many primary and replica shards do these indices have?

20 Primary shards each with 1 replica = 40 shards in total, per index

Which version of Elasticsearch are you using?

How many concurrent queries are you serving? How are these distributed across the indices?

Are you continuously adding/updating/deleting data from these indices? If so, at what rate?

What is the average size of your documents?

Elasticsearch version 7.9

Application server request rate:
High traffic - Search (/search route): 350/min, 5/s
Low traffic - Search (/search route): 160/min, 2.5/s

We read from the indices through an alias, billing-alias.
billing-index-v0.0, Search, Update Only
Indexing Rate - Total Shards 0.18/s, Primary Shards - 0.09/s
Search Rate - 155/s (high traffic) 33/s low traffic (number of search requests being executed across primary and replica shards)

billing-index-v1.0, Search, Update Only
Indexing Rate - Total Shards 1.78/s, Primary Shards - 0.88/s
Search Rate - 152/s (high traffic) 33/s low traffic

billing-index-v2.0, Search, Update Only
Indexing Rate - Total Shards 0.04/s, Primary Shards - 0.02/s
Search Rate - 157/s (high traffic) 34/s low traffic

billing-index-v3.0, Search, Update Only
Indexing Rate - Total Shards 4.65/s, Primary Shards - 2.28/s
Search Rate - 157/s (high traffic) 40/s low traffic

billing-index-v4.0, Search, Update Only
Indexing Rate - Total Shards 19.6/s, Primary Shards - 9.58/s
Search Rate - 158/s (high traffic) 33/s low traffic

billing-index-v5.0, Write, Search, Update (Current write index where new docs getting created)
Indexing Rate - Total Shards 16.64/s, Primary Shards - 8.05/s
Search Rate - 155/s (high traffic) 33/s low traffic
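
Since all reads go through billing-alias, each search presumably fans out to one copy of every primary shard in every index behind the alias. A small sketch of that arithmetic follows. It assumes the alias covers the six indices listed above with 20 primaries each and that no custom routing is used (neither is confirmed in this thread), and that billing-index-v3.0 and billing-index-v5.0 keep 20 primaries because they are not part of the proposed plan. The function name is illustrative.

```python
def shard_requests_per_search(primaries_per_index):
    """Shard-level requests triggered by one search across all indices."""
    return sum(primaries_per_index)


# Six indices behind the alias, 20 primaries each (current layout).
current = [20] * 6
# Proposed counts for v0, v1, v2, v4; v3 and v5 assumed unchanged at 20.
proposed = [2, 8, 2, 40, 20, 20]

PEAK_SEARCHES_PER_MIN = 350  # application-level rate reported above

for label, layout in (("current", current), ("proposed", proposed)):
    fan_out = shard_requests_per_search(layout)
    per_second = PEAK_SEARCHES_PER_MIN * fan_out / 60
    print(f"{label}: {fan_out} shard requests per search, "
          f"about {per_second:.0f} shard requests/s at peak")
```

The figures are only indicative: the per-index search rates reported by monitoring may count shard-level operations differently, and other clients may also be querying the cluster.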

Average size of the document is 900 bytes.

What type of storage are you using? Local SSDs?

Yes.

Have you checked how the storage is performing?

Disk IOPS
Write:
Max: 0.08k/s
Mean: 0.07k/s
Min: 0.06k/s

Read:
Max: 1.99k/s
Mean: 1.19k/s
Min: 0.8k/s

Do you have monitoring installed so you can see how high CPU and heap usage is?

CPU usage stays well within 20 percent even during high-traffic periods. As far as heap usage is concerned, no memory pressure has been observed: there are no frequent GCs, and heap usage always stays within 70 percent and cycles normally.

system · January 18, 2024, 10:44am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
