Cluster advice when scaling

Got a log cluster with 3 nodes (all acting as data, ingest, and master nodes), each with 64 GB RAM, a 1 TB SSD, and an E5-26xx vX CPU. Load is between 30 and 70%, depending on the CPU version. All nodes run Windows Server.

Some specs of the cluster

  • Approx 1,500,000,000 docs
  • 6500 shards
  • 5-7 index patterns
  • Index rate is 300-400/s (primary shards)
  • 5 primary shards with 1 replica per index
  • Search rate below 100/s

Looking to upgrade the cluster with 1 or 2 nodes in order to:

  • Increase storage, due to more applications adding data.
  • Improve search response time from Kibana and Grafana. We tend to get timeouts in Kibana and have just raised the timeout to 60 s. Grafana will be used much more, so the search rate will increase.

My question is whether I should change the cluster to have dedicated data and master nodes (I can add VMs if required), or whether I should just add the extra nodes as combined data, master, and ingest nodes like the current setup? Any advice?

Since the cluster is not that big (around 5 nodes), you can go with the default settings (no specific roles assigned to nodes). However, you should keep an odd number of master-eligible nodes (your current 3 sounds good). Preferably keep 3 replicas. Enable compression if possible so that you can save some space. Also, if possible, run Kibana on a dedicated server that acts as a coordinating node in the Elasticsearch cluster. This node does not need much storage, as it will not hold any data, but it should have plenty of RAM (64 GB looks great) since it will communicate with the data nodes to ingest and query data. This will also help improve query performance.

Just to summarize: since the ingest rate is not that high, you can merge the data and ingest roles. Use the Kibana node as a coordinating node. 3 master-eligible nodes should be enough to handle resilience.
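For reference, a coordinating-only node is created by switching off the other roles in elasticsearch.yml. This is a minimal sketch using the 6.x-style settings mentioned later in this thread; adjust to your Elasticsearch version:

```yaml
# elasticsearch.yml for a coordinating-only node (the one Kibana points at).
# It holds no data and is not master-eligible; it only routes and merges
# search/index requests.
node.master: false
node.data: false
node.ingest: false
```

Kibana's elasticsearch.url (or elasticsearch.hosts on newer versions) would then point at this node.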

For just 3 data nodes this is far too many shards. According to the official recommendation:

A good rule-of-thumb is to ensure you keep the number of shards per node below 20 per GB heap it has configured. A node with a 30GB heap should therefore have a maximum of 600 shards, but the further below this limit you can keep it the better. This will generally help the cluster stay in good health.

So even if you max out at 30 GB of Java heap per node, you should have fewer than 3 * 600 = 1800 shards in your cluster, preferably far fewer. I would start by shrinking the many-shard indices or reindexing them into indices with fewer shards.
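The budget math above is easy to sanity-check against the numbers in this thread (3 nodes, 30 GB heap each, 6500 shards currently):

```python
# Shard-budget check based on the "20 shards per GB of heap" rule of thumb
# quoted above. max_recommended_shards is a hypothetical helper name.
def max_recommended_shards(nodes, heap_gb_per_node, shards_per_gb=20):
    """Upper bound on total shards for the whole cluster."""
    return nodes * heap_gb_per_node * shards_per_gb

budget = max_recommended_shards(nodes=3, heap_gb_per_node=30)
current = 6500
print(budget)             # 1800
print(current > budget)   # True -- roughly 3.6x over the recommended limit
```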

When I design my indices I typically aim for 20-40 GB of data per shard, so if I set up an index that is expected to receive 200 GB of data (say, a monthly index) I will give it 6 or 8 primary shards, since that results in a shard size of either 33 GB or 25 GB at the end of the month. If the index is static, with no more documents added, I would aim for the larger 33 GB size, but if it will still receive updates I would go for the smaller 25 GB.
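That sizing rule is just expected index size divided by primary shard count; a quick sketch (the helper name is mine, not an Elasticsearch API):

```python
# Pick a primary shard count so shards land in the 20-40 GB sweet spot.
def shard_size_gb(expected_index_gb, primary_shards):
    """Expected size of each primary shard once the index is full."""
    return expected_index_gb / primary_shards

# A 200 GB monthly index, as in the example above:
print(round(shard_size_gb(200, 6), 1))  # 33.3 -> good for a static index
print(shard_size_gb(200, 8))            # 25.0 -> leaves headroom for updates
```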

In your case, with high indexing rates, it may be useful to have many shards (up to a limit, of course), since each primary shard gets its own indexing thread (as long as there are threads in the thread pool), so you could look at the hot-warm architecture and see if that fits your use case.
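The hot-warm split is driven by node attributes plus index-level allocation filtering. A minimal sketch, assuming 6.x-style settings (the attribute name box_type is a common convention, not a requirement):

```yaml
# elasticsearch.yml on the hot nodes (fast SSDs, receive all new writes):
node.attr.box_type: hot

# elasticsearch.yml on the warm nodes (larger, slower storage):
# node.attr.box_type: warm
```

New indices are then created with the index setting index.routing.allocation.require.box_type set to hot, and that setting is flipped to warm (e.g. via Curator) once an index ages out of the write phase, which moves its shards to the warm nodes.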


Hi Bernt,

Thank you for your reply. Makes perfect sense - I've been trying out reindexing with Curator into weekly indices (we do daily indices at the moment) and hitting 40 to 50 GB per shard. However, it is not implemented correctly yet.

We have looked at the hot/warm architecture, but mostly to extend our data retention. At the moment we keep 6 months of data.

It seems like I should get the shard count reduced ASAP and see how this impacts performance.


Would still like input to master/data node setup.

For small clusters, like your 3- or 5-node setup, I would probably not go for dedicated master nodes but let all of the data nodes also be master eligible (node.master: true in elasticsearch.yml). I use this setup for a couple of smaller clusters.

If that turns out to be a bad choice, e.g. if indexing and searches take so much resources that the master functions suffer (with cluster state timeouts in the logs), you'll need dedicated masters in the cluster. You can go for 1, but that's not very fault tolerant since the cluster will go down if you lose the master. The next level is 3 dedicated masters, which is what I use, as that will prevent split brain issues if you set discovery.zen.minimum_master_nodes: 2. You don't need very powerful servers to run the masters, just good connectivity for frequent cluster state updates and to quickly elect a new master if the current master goes down.
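If you do end up needing dedicated masters, the configuration is the inverse of a data node. A sketch, again assuming 6.x-style settings:

```yaml
# elasticsearch.yml for a dedicated master node: master-eligible,
# but holds no data and does no ingest.
node.master: true
node.data: false
node.ingest: false

# With 3 master-eligible nodes, require a quorum of 2 to elect a master
# (set this on every master-eligible node to prevent split brain):
discovery.zen.minimum_master_nodes: 2
```

Note that minimum_master_nodes applies to 6.x and earlier; from 7.0 onward Elasticsearch manages the voting quorum automatically.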

Excellent - I'll just add the new nodes as both master and data nodes and see how performance is impacted. I don't get any cluster state timeouts, but I do get a lot of node_stats timeouts when collecting data.

This also signals a busy cluster, but I hope you can reduce the problem by having fewer shards (as that should reduce the time for collecting node stats too) and by adding extra nodes to spread the shards and workload.


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.