We've been having issues with garbage collection taking nodes offline. For some reason, when one node goes down, the whole cluster becomes unresponsive.
These lines appear in the log and then the node dies:
[2017-08-24T09:43:07,034][INFO ][o.e.m.j.JvmGcMonitorService] [se-prod-logyard5] [gc][247] overhead, spent [325ms] collecting in the last [1s]
[2017-08-24T09:43:08,035][WARN ][o.e.m.j.JvmGcMonitorService] [se-prod-logyard5] [gc][248] overhead, spent [633ms] collecting in the last [1s]
[2017-08-24T09:43:09,035][INFO ][o.e.m.j.JvmGcMonitorService] [se-prod-logyard5] [gc][249] overhead, spent [371ms] collecting in the last [1s]
[2017-08-24T09:43:10,037][INFO ][o.e.m.j.JvmGcMonitorService] [se-prod-logyard5] [gc][250] overhead, spent [460ms] collecting in the last [1s]
Here is heap usage over the past hour. Not sure yet whether this points to the heap being too big, a memory leak, or something we're doing wrong.
Without any indication of what the memory is being used for, this is hard to help with. First, are you sure you need 28 GB of memory for each node? Why not 16 or 8? What is this memory used for? You can use the node stats and node info APIs to find out more.
The point I am getting at is: if the memory is not really needed, it just fills up over time, and then a huge GC takes a long time precisely because there is a lot of memory it can clear out.
If you do need this memory, there may be strategies to reduce consumption (different mappings, fewer shards, etc.).
So describing your general setup would be a good idea, so other people get more context.
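For reference, here is a minimal sketch of pulling a memory breakdown from the node stats API, assuming Python with the requests library and a node reachable on localhost:9200 (adjust the URL and auth for your cluster):

```python
import requests

# Node stats: per-node JVM heap usage plus the main index-level memory
# consumers (segments, fielddata, query cache, request cache).
stats = requests.get("http://localhost:9200/_nodes/stats/jvm,indices").json()

for node_id, node in stats["nodes"].items():
    heap_used = node["jvm"]["mem"]["heap_used_in_bytes"]
    heap_max = node["jvm"]["mem"]["heap_max_in_bytes"]
    idx = node["indices"]
    print(f"{node['name']}: heap {heap_used >> 20} / {heap_max >> 20} MB")
    print(f"  segments:      {idx['segments']['memory_in_bytes'] >> 20} MB")
    print(f"  fielddata:     {idx['fielddata']['memory_size_in_bytes'] >> 20} MB")
    print(f"  query cache:   {idx['query_cache']['memory_size_in_bytes'] >> 20} MB")
    print(f"  request cache: {idx['request_cache']['memory_size_in_bytes'] >> 20} MB")
```

That should at least show whether the heap is going to segment memory, fielddata, or caches, versus transient request overhead.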
Not sure if we need 28 GB per node at this point; I actually reduced it to 18 GB and it's better, but we're still experiencing issues. I'll get some more insight into memory usage in a bit.
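To double-check the resize actually took effect everywhere, a small sketch using the node info API (again assuming Python/requests and localhost:9200):

```python
import requests

# Node info: confirm every node reports the new heap maximum and that
# the initial heap (-Xms) matches the max (-Xmx); unequal values can
# cause extra pauses while the JVM resizes the heap.
info = requests.get("http://localhost:9200/_nodes/jvm").json()

for node_id, node in info["nodes"].items():
    mem = node["jvm"]["mem"]
    init_gb = mem["heap_init_in_bytes"] / 2**30
    max_gb = mem["heap_max_in_bytes"] / 2**30
    warn = "" if mem["heap_init_in_bytes"] == mem["heap_max_in_bytes"] else "  <-- Xms != Xmx"
    print(f"{node['name']}: heap init {init_gb:.1f} GB, max {max_gb:.1f} GB{warn}")
```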
Just had a crash on one of our nodes and got this message after some long garbage collection times:
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [request] Data too large, data for [<reused_arrays>] would be [16605848256/15.4gb], which is larger than the limit of [16001453260/14.9gb]
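The [request] breaker limit is a percentage of the heap (indices.breaker.request.limit), so shrinking the heap also shrinks that ceiling. A sketch for watching the breaker estimates versus their limits, assuming Python/requests and localhost:9200:

```python
import requests

# Breaker stats: estimated size vs. configured limit and trip count for
# each circuit breaker (request, fielddata, parent, ...) on every node.
stats = requests.get("http://localhost:9200/_nodes/stats/breaker").json()

for node_id, node in stats["nodes"].items():
    print(node["name"])
    for name, b in node["breakers"].items():
        est = b["estimated_size_in_bytes"] >> 20
        lim = b["limit_size_in_bytes"] >> 20
        print(f"  {name:<12} estimated {est} MB / limit {lim} MB (tripped {b['tripped']})")
```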