Hi, I have 4 cold Elasticsearch data nodes that sometimes hit an OutOfMemoryError. It seems to be related to moving indices to these nodes. The cold node spec is:
8 cores, 32 GB RAM (16 GB assigned to the JVM heap), 5 spinning disks of 2 TB each.
Each cold node currently holds 3 TB of data and 800 shards.
From the memory dump it seems like there are many threads stuck in:
Based on the general recommendations in this blog post, that sounds like a lot of shards given how much heap you have available on each node. In order to be able to hold more data on the cold nodes, you may want to use the shrink index API to reduce the number of shards before you relocate the indices to the cold nodes. You can also benefit from minimising the number of segments by forcemerging indices that are no longer written to.
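As a sketch of what that could look like (the index and node names here are placeholders, not from your cluster), a shrink followed by a force merge might be:

```
# Prepare the index for shrinking: make it read-only and
# relocate a copy of every shard to a single node
PUT /logs-2019.01.01/_settings
{
  "settings": {
    "index.blocks.write": true,
    "index.routing.allocation.require._name": "cold-node-1"
  }
}

# Shrink from e.g. 5 primaries down to 1 (target shard count
# must be a factor of the source shard count)
POST /logs-2019.01.01/_shrink/logs-2019.01.01-shrunk
{
  "settings": {
    "index.number_of_shards": 1
  }
}

# Once the index will no longer be written to, merge each
# shard down to a single segment to reduce heap overhead
POST /logs-2019.01.01-shrunk/_forcemerge?max_num_segments=1
```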
I understand that the node may be holding more shards than recommended, but as the blog post you linked says, there is no fixed limit enforced by Elasticsearch. I can see why this would affect performance, but it should not cause a node to crash. Receiving an OutOfMemoryError means there is some fault in memory / thread-pool handling. The following is the full stack trace of the thread that received the exception:
Do you have monitoring installed so you can see how the heap used varies over time?
What type of queries/aggregations/dashboards are you running against the data stored on these nodes?
The segments and shards held on a node use up heap space. The more you have, and the less optimised they are, the less heap is left over for querying. Elasticsearch has circuit breakers that try to stop queries that could cause OOM, but they are apparently not able to catch all scenarios. If you want to reduce the chance of nodes going OOM, you may need to tune the breakers to trip earlier. This will, however, cause more queries to fail instead.
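For example, lowering the parent circuit breaker limit (the total heap that all child breakers may use together) makes the breakers trip before the heap is exhausted; the 60% value below is just an illustration, and the right value depends on your workload:

```
PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.total.limit": "60%"
  }
}
```

The trade-off is explicit: requests that would previously have pushed the node toward OOM are now rejected with a CircuitBreakingException instead.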
Looking at all the thread call stacks in the dump, it seems the process wasn't doing anything related to queries (see here).
It seems to be related to the index rerouting that was taking place at the time the process crashed.
Briefly, we create a snapshot of each index to back up the data, and then move the indices to these cold nodes using index allocation filtering by node name. The crashes correlate with the times this script runs.
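For reference, an allocation-filtering move of the kind described above typically looks like the following (index and node names are placeholders). Since the cold nodes use spinning disks, it may also help to throttle how many shard recoveries run concurrently so relocations don't saturate the disks:

```
# Pin the index to the cold nodes by node name
# ("include" matches any of the listed names)
PUT /logs-2019.01.01/_settings
{
  "index.routing.allocation.include._name": "cold-node-1,cold-node-2,cold-node-3,cold-node-4"
}

# Throttle concurrent recoveries and recovery bandwidth per node
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.node_concurrent_recoveries": 2,
    "indices.recovery.max_bytes_per_sec": "40mb"
  }
}
```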
Also, we experimented with the exact same architecture, only with SSD disks in the cold machines, and that setup works perfectly.
Only when moving a lot of shards to these HDD machines does the process crash with an out-of-memory error.