We have a 3-node cluster with 128GB RAM in each node. The heap allocated for each node is 31GB. Our expected record count is 8 billion per month, so we have created daily indices to hold data for up to 2 years. Now, after 9 months, heap usage always stays around 80% and sometimes all nodes crash at the same time due to an OutOfMemory error. Can the heap usage be reduced by reindexing the documents into monthly indices? If so, what kind of heap usage improvement can I expect?
Is there any other way I can reduce heap usage without reducing the number of shards?
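For rough context, this is how I am thinking about the shard-count difference between daily and monthly indices. The primary and replica counts below are just placeholders for illustration, not our exact settings:

```python
# Rough shard-count comparison for daily vs. monthly indices over a 2-year retention.
# PRIMARIES and REPLICAS are placeholder values; adjust to the actual index settings.
RETENTION_DAYS = 730      # ~2 years of daily indices
RETENTION_MONTHS = 24     # ~2 years of monthly indices
PRIMARIES = 1
REPLICAS = 1

shards_per_index = PRIMARIES * (1 + REPLICAS)
daily_total = RETENTION_DAYS * shards_per_index
monthly_total = RETENTION_MONTHS * shards_per_index

print(f"daily indices:   {daily_total} shards")    # 1460 shards
print(f"monthly indices: {monthly_total} shards")  # 48 shards
```

Since each shard carries some fixed heap overhead, this is why I expect monthly indices to help.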
How many indices and shards do you currently have in the cluster? What is the average shard size in the cluster? Have you read this blog post around shards and sharding?
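You can pull those numbers directly from the _cat APIs. A minimal sketch, assuming Elasticsearch is reachable on localhost:9200 (adjust the host and any authentication to your setup):

```python
# Pull per-index and per-shard statistics via the _cat APIs.
import requests

BASE = "http://localhost:9200"

# One line per index: name, primary count, replica count, total store size.
print(requests.get(f"{BASE}/_cat/indices?v&h=index,pri,rep,store.size").text)

# One line per shard: useful for spotting unusually small or large shards.
print(requests.get(f"{BASE}/_cat/shards?v&h=index,shard,prirep,store").text)
```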
There is also a change in requirements: the cluster now needs to hold 15 years of data. In that case, I believe it is important to reindex into monthly indices. Do you agree? I am thinking of 18 nodes with 30GB heap each and 6 primary shards + 1 replica per index. Can I have your advice on this?
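Something like the following index template is what I have in mind for the monthly indices. The template name and index pattern are just placeholders, and depending on the Elasticsearch version the legacy _template API may apply instead of _index_template:

```python
# Sketch of a composable index template for monthly indices
# with 6 primaries + 1 replica. Pattern "records-*" is a placeholder.
import requests

template = {
    "index_patterns": ["records-*"],   # e.g. records-2024.01, records-2024.02, ...
    "template": {
        "settings": {
            "number_of_shards": 6,
            "number_of_replicas": 1
        }
    }
}

resp = requests.put(
    "http://localhost:9200/_index_template/monthly-records",
    json=template,
)
print(resp.json())
```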
I would recommend performing a benchmark to determine the max shard size, as described in this Elastic{ON} talk. 300GB is quite a large shard, and may result in slow queries and issues when recovering.
If I calculate correctly, you estimate you will generate about 3.6TB of indexed data per month (primaries plus replicas). Over 15 years that is 648TB. To handle that amount of data I suspect you will need considerably more than 18 data nodes.
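As a back-of-the-envelope check of those figures, using the 300GB shard size from your earlier description and the 6 primaries + 1 replica you are planning (the real numbers will depend on your mappings and document size):

```python
# Rough capacity estimate: monthly indexed volume and 15-year total.
primaries_per_month = 6
shard_size_gb = 300
replicas = 1

primary_gb_per_month = primaries_per_month * shard_size_gb    # 1800 GB
total_gb_per_month = primary_gb_per_month * (1 + replicas)    # 3600 GB ~= 3.6 TB
total_tb_15_years = total_gb_per_month * 12 * 15 / 1000       # 648 TB

print(f"{total_gb_per_month / 1000:.1f} TB/month, {total_tb_15_years:.0f} TB over 15 years")
```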
I still have to watch the talk as I am currently traveling. Just a quick question before I watch: do you think it is better to have nodes with less RAM than 128GB (e.g. 64GB) when planning a cluster to hold 15 years of data?
When holding lots of data you often want to maximize the amount of heap available across the cluster. You could do that by using smaller hosts or simply by running 2 Elasticsearch instances on each 128GB host. I would recommend spinning up a cluster with a few nodes and running a benchmark to determine exactly how much data you will be able to hold per node based on your expected indexing and query load, as described in the video I linked to. This will allow you to estimate how many nodes you will need for that amount of data.
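If you go the route of 2 instances per host, a minimal elasticsearch.yml sketch for one of the instances might look like the following. Paths, ports, and the "host_id" attribute name are placeholders; the key points are giving each instance its own data path and ports, and using shard allocation awareness so a primary and its replica do not end up on the same physical host:

```yaml
# Instance-specific settings (second instance on the same host would use
# a different node.name, path.data, http.port and transport.port).
node.name: node-a-instance-1
path.data: /var/lib/elasticsearch/instance-1
http.port: 9200
transport.port: 9300
node.attr.host_id: host-a

# Cluster-wide setting (same on every node) so replicas are spread across hosts.
cluster.routing.allocation.awareness.attributes: host_id
```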