Elastic I/O Optimised Scaling

Omair_Malik · September 11, 2020, 4:31pm

Hi All,

We currently have the following setup at Elastic:

x1 Data Configuration in 3 zones = 3 nodes in total
x1 Kibana node
x1 APM node

Because we expect to run out of shards in the coming months, we want to add another node, i.e. go to 4 Data Configuration nodes.

Playing around with the online Elastic Calculator, I am not sure how to do this, as we are already using 3 zones. What is the most cost-effective method? If I add just one more node to the current setup, this will entail first increasing the RAM to 60GB, i.e. a fourfold increase in price, and then adding one more node, which makes it an eightfold increase in price, as the Data Configuration then goes up to x6 nodes plus x3 Master nodes, which are now required across the 3 zones.

Is there an easy way to do this? All we want to do is add one more data node. Could we even use just 2 zones instead of the current 3?

Thanks in advance
Omair

Christian_Dahlqvist · September 11, 2020, 8:46pm

Are you approaching the limit in terms of number of shards per node? Is that why you need more nodes rather than more capacity in terms of larger more powerful nodes?

If that is the case, I recommend you read this blog post, as you have far too many small shards, which is very inefficient. I would recommend you reconsider your sharding scheme and look to reduce the shard count substantially.

Omair_Malik · September 14, 2020, 1:34pm

Hi Christian,

Yes, shards per node. We previously received a Logstash error saying "this cluster currently has [2000]/[2000] maximum shards open". We have since increased our node capacity, which has given us more room in terms of shards, but we are now at 2815, and I assume our limit will be 3000.

I read the blog; it is a lot of information. What exactly would you recommend we do?

Omair
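The figures in this post are consistent with a per-node shard limit. As a rough sketch, assuming the cluster uses the Elasticsearch 7.x default of `cluster.max_shards_per_node: 1000` (the thread does not state the setting), the cluster-wide limit is that value multiplied by the number of data nodes, counting primaries and replicas alike. Under that assumption a 2000 limit corresponds to 2 data nodes and 3000 to 3, which leaves about 185 shards of headroom at 2815:

```python
# Sketch: cluster-wide open-shard limit and remaining headroom.
# Assumption: cluster.max_shards_per_node is left at its 7.x default of 1000,
# and the limit is multiplied by the number of data nodes.
DEFAULT_MAX_SHARDS_PER_NODE = 1000


def shard_limit(data_nodes: int,
                max_shards_per_node: int = DEFAULT_MAX_SHARDS_PER_NODE) -> int:
    """Maximum open shards (primaries + replicas) the cluster will allow."""
    return data_nodes * max_shards_per_node


def headroom(open_shards: int, data_nodes: int,
             max_shards_per_node: int = DEFAULT_MAX_SHARDS_PER_NODE) -> int:
    """Shards that can still be opened before the limit is reached."""
    return shard_limit(data_nodes, max_shards_per_node) - open_shards


print(shard_limit(2))     # 2000, the limit in the Logstash error
print(shard_limit(3))     # 3000
print(headroom(2815, 3))  # 185
```

Adding nodes or raising the setting only moves this ceiling; it does not reduce the number of small shards the cluster has to manage.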

Christian_Dahlqvist · September 14, 2020, 1:43pm

As I said in my previous post - change how you shard your data in order to reduce the number of shards.

Omair_Malik · September 14, 2020, 3:18pm

Could you elaborate on what to change?

I have no visibility of the performance metrics in our cluster, as monitoring is not enabled. Could you please advise how to see things like the heap memory used by shards, retention periods, shard sizes, etc.?

GET /_cluster/allocation/explain?pretty doesn't give me much.

From what I can extrapolate, the blog is suggesting:

Use time-based indexing: I believe we already do.
Would increasing RAM raise the number of shards we can hold, since as I understand it 50% of RAM goes to the heap?
Are you suggesting we use the shrink index API to reduce the number of shards?

Thanks
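Shard counts and sizes can be inspected without monitoring enabled through the cat APIs. A minimal sketch, assuming the cluster is reachable at a hypothetical `ES_URL` without authentication (an Elastic Cloud deployment would also need credentials, which are left out here), that calls `GET _cat/shards?format=json&bytes=b` and summarizes shard copies per node and store size per index:

```python
import json
import urllib.request
from collections import Counter, defaultdict

# Hypothetical endpoint; a hosted deployment also needs authentication.
ES_URL = "http://localhost:9200"


def fetch_cat_shards(base_url: str = ES_URL) -> list:
    """Return GET _cat/shards as a list of dicts, with store sizes in bytes."""
    url = f"{base_url}/_cat/shards?format=json&bytes=b"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def summarize(rows: list) -> dict:
    """Count shard copies per node and add up store bytes per index."""
    # Unassigned shards have no node, so group them under a placeholder.
    per_node = Counter(row.get("node") or "UNASSIGNED" for row in rows)
    bytes_per_index = defaultdict(int)
    for row in rows:
        # Unassigned shards report no store size; count that as 0 bytes.
        bytes_per_index[row["index"]] += int(row.get("store") or 0)
    return {"shards_per_node": dict(per_node),
            "bytes_per_index": dict(bytes_per_index)}
```

Calling `summarize(fetch_cat_shards())` returns both mappings; sorting `bytes_per_index` by value shows how many indices are very small. `GET _cat/indices?v` and `GET _cluster/health` give similar totals directly in the console.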

Christian_Dahlqvist · September 14, 2020, 8:03pm

If you have more than one primary shard per index you can use the shrink index API to reduce shard count. A common way to reduce shard count is to switch from daily to e.g. weekly or monthly indices. You can also start using ILM and rollover to cut indices based on target size rather than just timestamps. If you have split your data across many small indices, e.g. by application or service, you can reduce shard count by consolidating these.
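To see why the index period matters so much, here is a rough back-of-the-envelope sketch. All inputs are hypothetical (20 separate index series, 90 days retention, 1 primary and 1 replica per index), and the function only approximates the shards held once old indices are deleted on schedule:

```python
import math


def steady_state_shards(retention_days: int, days_per_index: int, series: int,
                        primaries: int = 1, replicas: int = 1) -> int:
    """Approximate shards held once old indices are deleted on schedule.

    retention_days: how long data is kept.
    days_per_index: 1 for daily, 7 for weekly, 30 for (roughly) monthly indices.
    series: number of separate index series, e.g. one per application.
    This is an approximation; retention boundaries can add one extra index.
    """
    indices_per_series = math.ceil(retention_days / days_per_index)
    return series * indices_per_series * primaries * (1 + replicas)


# Hypothetical inputs: 20 series, 90 days retention, 1 primary + 1 replica.
print(steady_state_shards(90, 1, 20))   # daily:   3600
print(steady_state_shards(90, 7, 20))   # weekly:  520
print(steady_state_shards(90, 30, 20))  # monthly: 120
print(steady_state_shards(90, 7, 2))    # weekly, consolidated into 2 series: 52
```

Switching the index period and consolidating series multiply together, which is why they reduce the count far more than adding a node raises the limit.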

If you delete some data to get some headroom, you can change how you index new data so that you generate fewer indices and shards per time period. As data ages out of the system, the shard count will drop. If you are looking to keep your data for a long time, you may need to reindex smaller indices into fewer, larger ones.
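The gradual drop as data ages out can be sketched the same way. Assuming (hypothetically) 90 days of retention, 40 shards per daily index set (20 series with 1 primary and 1 replica each) and a switch to weekly indices on day 0, old daily sets are deleted one per day while a new weekly set is created every 7 days:

```python
def shards_during_transition(day: int, retention_days: int, shards_per_set: int,
                             new_period_days: int) -> int:
    """Shards held `day` days after switching from daily to longer indices.

    Before day 0 the cluster holds `retention_days` daily index sets. From
    day 0, one daily set ages out per day and a new set is created every
    `new_period_days` days. Only valid up to `retention_days`, because it
    does not model deletion of the new, longer-period indices.
    """
    if not 0 <= day <= retention_days:
        raise ValueError("day must be between 0 and retention_days")
    old_daily_sets = retention_days - day
    new_sets = day // new_period_days + 1  # first new set is created on day 0
    return (old_daily_sets + new_sets) * shards_per_set


# Hypothetical: 90 days retention, 40 shards per set, switch to weekly indices.
for day in (0, 30, 90):
    print(day, shards_during_transition(day, 90, 40, 7))
```

Under these assumptions the count goes from 3640 on day 0 to 2600 on day 30 and 520 on day 90, once the last daily index has aged out. Reindexing the old daily indices into larger ones would shorten that wait.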

As far as I can tell you need fewer shards, not more heap or nodes.

system · October 12, 2020, 8:03pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
