ES 1.5 Master Out of Memory

Hello,

We upgraded ES from 1.4 to 1.5 about 3 weeks ago. Since the upgrade, our master node has been going down about once a week because it runs out of memory. The error message is below:

[DEBUG][action.admin.cluster.node.stats] [es_master_02] failed to execute on node [brirsjseReWgd7nSXaE0DQ]
org.elasticsearch.transport.SendRequestTransportException: [es_data_4][inet[ip-10-140-239-168.ec2.internal/10.140.239.168:9300]][cluster:monitor/nodes/stats[n]]
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:213)
        at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.start(TransportNodesOperationAction.java:165)
        at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.access$300(TransportNodesOperationAction.java:97)
        at org.elasticsearch.action.support.nodes.TransportNodesOperationAction.doExecute(TransportNodesOperationAction.java:70)
        at org.elasticsearch.action.support.nodes.TransportNodesOperationAction.doExecute(TransportNodesOperationAction.java:43)
        at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75)
        at org.elasticsearch.cluster.InternalClusterInfoService$ClusterInfoUpdateJob.run(InternalClusterInfoService.java:260)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space

We have 3 master-eligible nodes, and this one is the active master. The same thing has happened on the other masters when they were the active one. When this error occurs, none of the other master-eligible nodes takes over as master, and the cluster state remains red until we restart the failed node.

Has anyone seen this kind of error before? Is it an issue with 1.5, or are we doing something wrong?

Thanks,

Umutcan

Well, the only ways to address this are to:

  • Add more heap
  • Add more nodes
  • Check the queries going against it to make sure they are not pulling back too much data

Have you installed Marvel, Bigdesk or Paramedic to see how your index and nodes are performing?
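
If none of those is installed, heap usage can also be read straight from the node stats API. Below is a minimal sketch, assuming the cluster answers HTTP on localhost:9200 and that the response carries jvm.mem.heap_used_percent for each node; the function names and the 85% threshold are made up for illustration, not taken from your setup.

```python
import json
from urllib.request import urlopen


def heap_report(stats, threshold=85):
    """Return (node_name, heap_used_percent) pairs at or above threshold, highest first.

    `stats` is the parsed JSON body of a node stats response.
    The threshold of 85 is an arbitrary illustrative default.
    """
    rows = []
    for node in stats.get("nodes", {}).values():
        pct = node.get("jvm", {}).get("mem", {}).get("heap_used_percent")
        if pct is not None and pct >= threshold:
            rows.append((node.get("name", "?"), pct))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def fetch_stats(base_url="http://localhost:9200"):
    """Fetch node stats limited to the JVM section (base_url is an assumed address)."""
    with urlopen(base_url + "/_nodes/stats/jvm") as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Running heap_report(fetch_stats()) every minute or so and logging the result should show whether the active master's heap climbs steadily over the week or jumps right before the crash.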

We are using Kopf as a monitoring tool. We have been using the same configuration for a couple of months and never had this issue before the version upgrade. I even deleted some old indices to lower the load.