Heap usage vs number of shards

We have a 3-node cluster with 128GB of RAM in each node, and the heap allocated on each node is 31GB. Our expected record count is 8 billion per month, so we created daily indices to hold data for up to 2 years. Now, after 9 months, heap usage always stays around 80%, and sometimes all nodes crash at the same time due to an OutOfMemory error. Can heap usage be reduced by reindexing the documents into monthly indices? If so, what kind of heap usage improvement can I expect?
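For context, the reindexing I have in mind would be along these lines, using the `_reindex` API; this is only a sketch, and the index names and endpoint are illustrative, not our real naming scheme:

```python
import requests

ES = "http://localhost:9200"  # placeholder endpoint

# Copy two of the daily indices into a single monthly index; in practice the
# source list would contain every daily index for that month. All names here
# are illustrative only.
body = {
    "source": {"index": ["logs-2023.01.01", "logs-2023.01.02"]},
    "dest": {"index": "logs-2023.01"},
}

resp = requests.post(
    f"{ES}/_reindex",
    params={"wait_for_completion": "false"},  # run as a background task
    json=body,
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # returns a task id when wait_for_completion=false
```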

Is there any other way I can reduce heap usage without reducing the number of shards?

How many indices and shards do you currently have in the cluster? What is the average shard size in the cluster? Have you read this blog post about shards and sharding?
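If you are unsure, a quick way to pull those numbers is via the `_cat` APIs; a rough sketch in Python (the endpoint URL is a placeholder):

```python
import requests

ES = "http://localhost:9200"  # placeholder endpoint

# List all indices and shards; bytes=gb makes the store sizes plain numbers in GB.
indices = requests.get(f"{ES}/_cat/indices", params={"format": "json", "bytes": "gb"}).json()
shards = requests.get(f"{ES}/_cat/shards", params={"format": "json", "bytes": "gb"}).json()

print(f"indices: {len(indices)}, shards: {len(shards)}")

# Average size across shards that report a store size.
sizes = [float(s["store"]) for s in shards if s.get("store")]
print(f"average shard size: {sum(sizes) / len(sizes):.1f} GB")
```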

Thanks, I will read the blog post. Currently we have 285 indices and 1638 shards. Average shard size is around 8GB.

Have you run force merge with max_num_segments set to 1 on older indices that are no longer written to?
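For example, a force merge down to a single segment can be triggered like this (the index name and endpoint are placeholders, and it should only be run on indices that are no longer being written to):

```python
import requests

ES = "http://localhost:9200"        # placeholder endpoint
index = "logs-2023.01.01"           # placeholder: an older, read-only index

# Merge every shard of the index down to a single segment. This is I/O heavy,
# so it is typically run off-peak.
resp = requests.post(
    f"{ES}/{index}/_forcemerge",
    params={"max_num_segments": 1},
    timeout=3600,  # the call blocks until the merge finishes and can take a while
)
resp.raise_for_status()
print(resp.json())
```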

Thanks. I will check.

There is also a change in requirements: the cluster now needs to hold 15 years of data. In that case, I believe it is important to reindex into monthly indices. Do you agree? I am thinking about 18 nodes with 30GB heap each, and 6 primary shards + 1 replica per monthly index. Can I have your advice on this?
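Roughly, I would pin that shard layout with an index template along these lines; the template name and index pattern are made up for illustration, and this assumes a version with composable index templates (`_index_template`):

```python
import requests

ES = "http://localhost:9200"  # placeholder endpoint

# Composable index template giving every monthly index 6 primaries and 1 replica
# (12 shards in total). The template name and index pattern are illustrative.
template = {
    "index_patterns": ["logs-monthly-*"],
    "template": {
        "settings": {
            "number_of_shards": 6,
            "number_of_replicas": 1,
        }
    },
}

resp = requests.put(f"{ES}/_index_template/logs-monthly", json=template)
resp.raise_for_status()
print(resp.json())
```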

Are you going to index 15 years' worth of data now, or just keep the data you are indexing now for that long?

We are going to index 10 years of data now, and will continue indexing for 5 more years after that.

How large do you estimate a shard for a monthly index would be?

Around 300GB for a single shard. I'm planning to have 12 shards per month (6 primaries, each with 1 replica).

I would recommend performing a benchmark to determine the maximum shard size, as described in this Elastic{ON} talk. 300GB is quite large and may result in slow queries and problems during shard recovery.

If I have calculated correctly, you estimate you will generate about 3.6TB of indexed data per month (primaries and replicas), which comes to 648TB over 15 years. To handle that amount of data I suspect you will need considerably more than 18 data nodes.
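For clarity, the back-of-the-envelope arithmetic behind those numbers:

```python
shard_size_gb = 300        # estimated primary shard size
primaries_per_month = 6
replica_copies = 1         # one replica per primary

monthly_tb = shard_size_gb * primaries_per_month * (1 + replica_copies) / 1000
yearly_tb = monthly_tb * 12
total_tb = yearly_tb * 15

print(f"{monthly_tb:.1f} TB/month, {yearly_tb:.1f} TB/year, {total_tb:.0f} TB over 15 years")
# -> 3.6 TB/month, 43.2 TB/year, 648 TB over 15 years
```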

I still have to watch the talk, as I am currently traveling. Just a quick question before I do: when planning the cluster to hold 15 years of data, do you think it is better to have nodes with less RAM than 128GB (e.g. 64GB)?

When holding lots of data you often want to maximize heap. You could do that by using smaller hosts or simply by running 2 Elasticsearch instances on each host. I would recommend spinning up a cluster with a few nodes and running a benchmark to determine exactly how much data you will be able to hold per node, based on your expected indexing and query load, as described in the video I linked to. This will allow you to estimate how many nodes you will need for that amount of data.
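Once you have a per-node figure from the benchmark, the node count estimate is simple division; for example (the per-node capacity below is purely a placeholder to be replaced by your benchmark result):

```python
import math

total_data_tb = 648       # total estimated data over 15 years (primaries + replicas)
tb_per_node = 10          # placeholder: replace with the per-node figure from your benchmark

print(f"data nodes needed: {math.ceil(total_data_tb / tb_per_node)}")  # ~65 with this placeholder
```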
