A few questions about optimizing our Elastic Stack

We have a large number of shards spread across around 80 daily indices, and growing (5 primaries, 1 replica each). So far we're clocking around 400 shards. Clients do tend to query data in the older indices.
We have 5 master-eligible data nodes and 2 coordinating nodes, so 7 nodes altogether. The data nodes run on m5.4xlarge EC2 instances in AWS.
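(For what it's worth, those shard counts come straight from the cluster health API; `filter_path` just trims the response:)

```
GET _cluster/health?filter_path=active_primary_shards,active_shards
```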

Had a few questions around this:

  • Is this a large number of shards (and what counts as large for a stack of our size)?
  • When should I merge or shrink indices, and which is preferable for keeping the Elastic Stack healthy?
  • We're starting to see a few EsRejectedExecutionException[rejected execution (queue capacity)] exceptions on the nodes. What configuration should I be looking at to solve this?

Looking for any help on this. Thanks!

400 daily shards on a cluster with only 5 data nodes sounds excessive. Even 80 daily indices with just 1 primary and 1 replica shard each sounds like a lot. Please read this blog post for some guidance on shards and sharding.

Sorry, I wasn't clear. I meant I have around 80 indices in total. Due to a rollover policy using Curator, a new index is created daily. If I run cat indices, the result shows each index with 5 primaries and 1 replica, which has so far added up to around 400 primary shards (so 800 in total, I figure, with replicas).
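For reference, this is the exact call (the column list is just the subset I look at):

```
GET _cat/indices?v&h=index,pri,rep,docs.count,store.size
```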

I saw your post. It feels like, while I have no control over the daily rollover index, using the shrink API to reduce the number of shards is the only solution at this point?
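Something along these lines, I assume? A rough sketch; the index and node names are made up:

```
# 1) Block writes and force a copy of every shard onto one node
PUT logs-2019.06.01/_settings
{
  "index.blocks.write": true,
  "index.routing.allocation.require._name": "data-node-1"
}

# 2) Shrink the 5 primaries down to 1 (the target must be a factor of the original count)
POST logs-2019.06.01/_shrink/logs-2019.06.01-shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.routing.allocation.require._name": null,
    "index.blocks.write": null
  }
}
```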

Rather than using the shrink API, which requires the index to be made read-only and a copy of every shard to be allocated to the same node, I would recommend trying to ensure your shards are correctly sized from the start by using the rollover index API.
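A minimal sketch of what that looks like, assuming writes go through an alias (logs-write here is a placeholder; Curator's rollover action takes the same conditions). Fewer primaries per index plus size-based rollover keeps each shard usefully large:

```
POST logs-write/_rollover
{
  "conditions": {
    "max_age": "7d",
    "max_size": "50gb"
  },
  "settings": {
    "index.number_of_shards": 1
  }
}
```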

This depends on how much heap you have available and what the use case is. If you want to keep data around for a long time, try to make sure your shards are quite large.
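You can check where your shard sizes currently stand with the cat shards API, for example:

```
GET _cat/shards?v&h=index,shard,prirep,store&s=store:desc
```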

Force-merging in order to reduce the segment count is I/O-intensive, but often worth doing if you have a long retention period.
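For example, against an index that is no longer being written to (name made up; merging down to a single segment is the common choice for read-only daily indices):

```
POST logs-2019.06.01/_forcemerge?max_num_segments=1
```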

Is this during search or indexing? Index-time rejections are described in this blog post, but similar principles apply if you see them at query time as well.
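The cat thread pool API shows which pools are rejecting, e.g.:

```
GET _cat/thread_pool?v&h=node_name,name,active,queue,rejected&s=rejected:desc
```

If the rejections are in the bulk or write pool, that usually points at indexing pressure rather than queue sizing.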

I've configured a 30GB heap (out of 64GB total memory). Our retention period is 6 months, with daily indices being generated.
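For reference, the heap is pinned in jvm.options in the usual way, just under the ~32GB compressed-oops cutoff:

```
# config/jvm.options
-Xms30g
-Xmx30g
```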
Would any shard size in the 20GB to 50GB range suffice?

Without knowing anything about your use case I would say that shard size range sounds reasonable.
