Elasticsearch 2.3.4 grinding to a halt

Hi All,

I've got an issue with Elasticsearch, or more specifically, I suspect, with the JVM heap it's using.

Before I go into detail overload, let me set out what I have.

The Setup

4 x Nodes (3 x master-eligible data nodes, 1 x client node).

Focusing on the master/data nodes, they are running CentOS Linux release 7.2.1511 (Core), with java-1.8.0-openjdk-headless-1.8.0.91-1.b14.el7_2.x86_64 and Elasticsearch 2.3.4.

They each have a single core, 4 GB of memory and 500 GB of storage (a partition dedicated to the ES data).

They are configured (correctly) to mlockall, the OS swappiness is 1, and the heap min and max are both set to 2 GB.
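
For reference, this is roughly how each data node is configured (paths are the standard 2.x RPM locations; nothing here beyond what I described above):

# /etc/elasticsearch/elasticsearch.yml
bootstrap.mlockall: true

# /etc/sysconfig/elasticsearch
ES_HEAP_SIZE=2g

# OS level
sysctl -w vm.swappiness=1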

On initial boot they seem to be working fine and for a little while they do.
Cluster shows healthy, and does so throughout.

To avoid filling the root partition, logging is set to INFO rather than DEBUG.
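
If it matters, that's just the root logger level in /etc/elasticsearch/logging.yml on 2.x, roughly:

# /etc/elasticsearch/logging.yml
es.logger.level: INFO
rootLogger: ${es.logger.level}, console, file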

The Issue

The first symptom I notice is Kibana reporting plugin:elasticsearch "Request Timeout after 30000ms".

I then notice slow responses from any API that needs to poll the cluster members, and I believe it may be stopping Elasticsearch from operating correctly as a cluster.

The issue is occurring right now and if I try to run a direct /_cat API against the client node it takes quite some time to load.
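
For reference, the call is roughly this (client node address used as an example):

curl -s 'http://10.0.0.7:9200/_cat/nodes?v'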

When it eventually responds, it gives the following output - all good so far as I can tell:

host     ip       heap.percent ram.percent load node.role master name
10.0.0.7 10.0.0.7            7          69 0.07 -         -      NODE7
10.0.0.2 10.0.0.2           91          96 1.22 d         *      NODE2
10.0.0.1 10.0.0.1           91          91 0.61 d         m      NODE1
10.0.0.3 10.0.0.3           99          94 1.57 d         m      NODE3

Tailed logs from NODE1:

[2016-08-06 03:46:09,951][WARN ][monitor.jvm              ] [NODE1] [gc][young][48296][58732] duration [1.9s], collections [1]/[2.4s], total [1.9s]/[33.1m], memory [1.3gb]->[1.3gb]/[1.9gb], all_pools {[young] [18mb]->[4.8mb]/[66.5mb]}{[survivor] [4.7mb]->[7.9mb]/[8.3mb]}{[old] [1.3gb]->[1.3gb]/[1.9gb]}

Tailed logs from NODE2:

[2016-08-11 09:17:35,415][WARN ][transport                ] [NODE2] Received response for a request that has timed out, sent [81266ms] ago, timed out [66266ms] ago, action [cluster:monitor/nodes/stats[n]], node [{NODE3}{f29G8IbhSYevQcDPrDUqyg}{10.0.0.3}{10.0.0.3:9300}{max_local_storage_nodes=1}], id [4506258]
[2016-08-11 09:17:35,415][WARN ][transport                ] [NODE2] Received response for a request that has timed out, sent [141270ms] ago, timed out [126267ms] ago, action [cluster:monitor/nodes/stats[n]], node [{NODE3}{f29G8IbhSYevQcDPrDUqyg}{10.0.0.3}{10.0.0.3:9300}{max_local_storage_nodes=1}], id [4506107]

Tailed logs from NODE3:

[2016-08-11 09:12:34,800][INFO ][monitor.jvm              ] [NODE3] [gc][old][370335][68699] duration [6.4s], collections [1]/[6.6s], total [6.4s]/[1.7d], memory [1.9gb]->[1.9gb]/[1.9gb], all_pools {[young] [59.3mb]->[56.2mb]/[66.5mb]}{[survivor] [0b]->[0b]/[8.3mb]}{[old] [1.9gb]->[1.9gb]/[1.9gb]}
[2016-08-11 09:12:46,014][INFO ][monitor.jvm              ] [NODE3] [gc][old][370337][68701] duration [6.6s], collections [1]/[6.7s], total [6.6s]/[1.7d], memory [1.9gb]->[1.9gb]/[1.9gb], all_pools {[young] [59mb]->[62.2mb]/[66.5mb]}{[survivor] [0b]->[0b]/[8.3mb]}{[old] [1.9gb]->[1.9gb]/[1.9gb]}

The first thing I notice from the log snippets above is that NODE2 (the current master) is complaining about NODE3.
Looking at NODE3, something doesn't look right with how it's garbage collecting. My knowledge of the JVM is very limited, and I'm trying to learn enough to support it properly for ES.

So... Questions at this stage:

  • What triggers garbage collection?
  • Why only NODE3? All indices are shard balanced over the three nodes. Balancing is enabled.
  • Why did NODE1 log its GC at WARN, while NODE3 only logs at INFO?
  • What can I do to prevent this? Is this a sign that I need to scale up/out?
    • What should I be monitoring to predict that before we get to this stage?
  • Am I even looking at the right thing?

If I were to restart NODE3, I suspect it would restore service.
But only temporarily - the last time this occurred, NODE1 was the master and NODE2 was the node with the issue.

I've already looked through a few existing threads for answers, which helped me at least in framing the questions above.

Any advice and answers to my questions is massively appreciated.

Best regards,

jdmac

Forgot to add: I'm also reading through the following section of the Definitive Guide.

https://www.elastic.co/guide/en/elasticsearch/guide/current/_monitoring_individual_nodes.html

I may have actually answered my questions as to the current issue but any further input as a learning experience is very welcome. :slight_smile:

I think the reason NODE3 is unavailable right now is the pauses caused by JVM GC - looking at the logs, it's running non-stop.

[2016-08-11 09:10:12,922][INFO ][monitor.jvm              ] [NODE3] [gc][old][370309][68673] duration [6.3s], collections [1]/[6.5s], total [6.3s]/[1.7d], memory [1.9gb]->[1.9gb]/[1.9gb], all_pools {[young] [51.8mb]->[51.6mb]/[66.5mb]}{[survivor] [0b]->[0b]/[8.3mb]}{[old] [1.9gb]->[1.9gb]/[1.9gb]}
[2016-08-11 09:10:23,538][INFO ][monitor.jvm              ] [NODE3] [gc][old][370311][68675] duration [6.3s], collections [1]/[6.4s], total [6.3s]/[1.7d], memory [1.9gb]->[1.9gb]/[1.9gb], all_pools {[young] [57.9mb]->[54.1mb]/[66.5mb]}{[survivor] [0b]->[0b]/[8.3mb]}{[old] [1.9gb]->[1.9gb]/[1.9gb]}
[2016-08-11 09:10:34,539][INFO ][monitor.jvm              ] [NODE3] [gc][old][370313][68677] duration [6.3s], collections [1]/[6.6s], total [6.3s]/[1.7d], memory [1.9gb]->[1.9gb]/[1.9gb], all_pools {[young] [52.3mb]->[52.6mb]/[66.5mb]}{[survivor] [0b]->[0b]/[8.3mb]}{[old] [1.9gb]->[1.9gb]/[1.9gb]}
[2016-08-11 09:10:45,390][INFO ][monitor.jvm              ] [NODE3] [gc][old][370315][68679] duration [6.4s], collections [1]/[6.5s], total [6.4s]/[1.7d], memory [1.9gb]->[1.9gb]/[1.9gb], all_pools {[young] [51.3mb]->[52.8mb]/[66.5mb]}{[survivor] [0b]->[0b]/[8.3mb]}{[old] [1.9gb]->[1.9gb]/[1.9gb]}
[2016-08-11 09:10:56,637][INFO ][monitor.jvm              ] [NODE3] [gc][old][370317][68681] duration [6.4s], collections [1]/[7s], total [6.4s]/[1.7d], memory [1.9gb]->[1.9gb]/[1.9gb], all_pools {[young] [49.5mb]->[55.4mb]/[66.5mb]}{[survivor] [0b]->[0b]/[8.3mb]}{[old] [1.9gb]->[1.9gb]/[1.9gb]}
[2016-08-11 09:11:07,394][INFO ][monitor.jvm              ] [NODE3] [gc][old][370319][68683] duration [6.4s], collections [1]/[6.6s], total [6.4s]/[1.7d], memory [1.9gb]->[1.9gb]/[1.9gb], all_pools {[young] [53.7mb]->[61.8mb]/[66.5mb]}{[survivor] [0b]->[0b]/[8.3mb]}{[old] [1.9gb]->[1.9gb]/[1.9gb]}
[2016-08-11 09:11:18,259][INFO ][monitor.jvm              ] [NODE3] [gc][old][370321][68685] duration [6.3s], collections [1]/[6.4s], total [6.3s]/[1.7d], memory [1.9gb]->[1.9gb]/[1.9gb], all_pools {[young] [60.2mb]->[52.2mb]/[66.5mb]}{[survivor] [0b]->[0b]/[8.3mb]}{[old] [1.9gb]->[1.9gb]/[1.9gb]}
[2016-08-11 09:11:28,982][INFO ][monitor.jvm              ] [NODE3] [gc][old][370323][68687] duration [6.5s], collections [1]/[6.6s], total [6.5s]/[1.7d], memory [1.9gb]->[1.9gb]/[1.9gb], all_pools {[young] [52.4mb]->[53.9mb]/[66.5mb]}{[survivor] [0b]->[0b]/[8.3mb]}{[old] [1.9gb]->[1.9gb]/[1.9gb]}
[2016-08-11 09:11:39,874][INFO ][monitor.jvm              ] [NODE3] [gc][old][370325][68689] duration [6.4s], collections [1]/[6.6s], total [6.4s]/[1.7d], memory [1.9gb]->[1.9gb]/[1.9gb], all_pools {[young] [50.6mb]->[56.3mb]/[66.5mb]}{[survivor] [0b]->[0b]/[8.3mb]}{[old] [1.9gb]->[1.9gb]/[1.9gb]}
[2016-08-11 09:11:51,111][INFO ][monitor.jvm              ] [NODE3] [gc][old][370327][68691] duration [6.6s], collections [1]/[6.6s], total [6.6s]/[1.7d], memory [1.9gb]->[1.9gb]/[1.9gb], all_pools {[young] [66.5mb]->[52.9mb]/[66.5mb]}{[survivor] [5.3mb]->[0b]/[8.3mb]}{[old] [1.9gb]->[1.9gb]/[1.9gb]}
[2016-08-11 09:12:02,033][INFO ][monitor.jvm              ] [NODE3] [gc][old][370329][68693] duration [6.4s], collections [1]/[6.5s], total [6.4s]/[1.7d], memory [1.9gb]->[1.9gb]/[1.9gb], all_pools {[young] [53.4mb]->[56.7mb]/[66.5mb]}{[survivor] [0b]->[0b]/[8.3mb]}{[old] [1.9gb]->[1.9gb]/[1.9gb]}
[2016-08-11 09:12:13,029][INFO ][monitor.jvm              ] [NODE3] [gc][old][370331][68695] duration [6.4s], collections [1]/[6.5s], total [6.4s]/[1.7d], memory [1.9gb]->[1.9gb]/[1.9gb], all_pools {[young] [62.4mb]->[59.3mb]/[66.5mb]}{[survivor] [0b]->[0b]/[8.3mb]}{[old] [1.9gb]->[1.9gb]/[1.9gb]}
[2016-08-11 09:12:23,845][INFO ][monitor.jvm              ] [NODE3] [gc][old][370333][68697] duration [6.5s], collections [1]/[6.5s], total [6.5s]/[1.7d], memory [1.9gb]->[1.9gb]/[1.9gb], all_pools {[young] [66.5mb]->[56.3mb]/[66.5mb]}{[survivor] [6.8mb]->[0b]/[8.3mb]}{[old] [1.9gb]->[1.9gb]/[1.9gb]}
[2016-08-11 09:12:34,800][INFO ][monitor.jvm              ] [NODE3] [gc][old][370335][68699] duration [6.4s], collections [1]/[6.6s], total [6.4s]/[1.7d], memory [1.9gb]->[1.9gb]/[1.9gb], all_pools {[young] [59.3mb]->[56.2mb]/[66.5mb]}{[survivor] [0b]->[0b]/[8.3mb]}{[old] [1.9gb]->[1.9gb]/[1.9gb]}

If I'm reading this right...
The old generation is completely full and each collection frees almost nothing, so the JVM keeps running full GCs back to back in the hope the next run will reclaim something, pausing Elasticsearch on this node each time.

I'm guessing I need to provide more memory or an additional node to the cluster.
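
If it helps anyone later, this is roughly the check I should have been running to watch the old generation filling up (any node's HTTP address will do; filter_path just trims the response):

curl -s 'http://10.0.0.1:9200/_nodes/stats/jvm?human&filter_path=nodes.*.name,nodes.*.jvm.mem.heap_used_percent,nodes.*.jvm.mem.pools.old'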

Hi,

The JVM monitor log lines range from DEBUG up to WARN depending on how long the garbage collection took. That's why you see different log levels between NODE1 and NODE3.

It looks like your nodes need more RAM. Are you doing heavy aggregations? What's the output of GET /_nodes/stats?filter_path=nodes.*.indices?
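
From the command line that would be something like (any node's HTTP address works):

curl -s 'http://localhost:9200/_nodes/stats?filter_path=nodes.*.indices'

The fielddata and segments memory figures in there are usually the interesting part for aggregation-heavy workloads.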

Indexing and aggregations can use a fair bit of CPU, so just having a single core per node is likely insufficient.

Hi Tanguy,

A few minutes ago I started to add more RAM to the cluster and increase the heap, starting with NODE3.
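
For the record, the change itself is just the heap variable plus a restart, one node at a time (2.x RPM install; 3g is only an example value):

# /etc/sysconfig/elasticsearch
ES_HEAP_SIZE=3g

sudo systemctl restart elasticsearch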

I think I'd already started to answer my own question. Fortunately, this cluster is really about getting familiar with ES before a heavier production deployment, so I have a little breathing space.

I'm not aware of us running heavy aggregations manually - Grafana/Kibana may well be doing so under the hood, though.

I think I should've spotted on the /_cat/nodes output that the heap percentage was far too high.

I'm guessing that since I've already started adding RAM, the output you asked for may now show different readings than when the issue was occurring?

Thanks,
jdmac

Hi Christian_Dahlqvist,

True, very fair point - it's something I'm bearing in mind.
This is just a proof of concept.

The production environment will be far more 'beefy'.

Many thanks,

jdmac

Kibana uses aggregations, so it would be useful to monitor CPU usage while you are querying to see if this is actually a limiting factor or not.
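
Something like this (interval and host are arbitrary) is enough to keep an eye on it while the Kibana dashboards are refreshing:

watch -n 5 "curl -s 'http://localhost:9200/_nodes/stats/process?filter_path=nodes.*.name,nodes.*.process.cpu'"

Plain top on each host works just as well.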

Cool, good to know. Thanks for the advice.

The issue has now been resolved... It was heap usage after all.

I suppose I should've read this sooner and got my head into the world of the JVM earlier.

Especially this part... :rolling_eyes:

The heap_used_percent metric is a useful number to keep an eye on. Elasticsearch is configured to initiate GCs when the heap reaches 75% full. If your node is consistently >= 75%, your node is experiencing memory pressure. This is a warning sign that slow GCs may be in your near future.

If the heap usage is consistently >=85%, you are in trouble. Heaps over 90–95% are in risk of horrible performance with long 10–30s GCs at best, and out-of-memory (OOM) exceptions at worst.
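
For my own future reference, a quick-and-dirty way to watch that figure from the shell (host is an example; 75 is the threshold from the quote above):

# Print any node whose heap usage is at or above 75%
curl -s 'http://10.0.0.7:9200/_cat/nodes?h=name,heap.percent' \
  | awk '$2 >= 75 { print "heap pressure on " $1 ": " $2 "%" }'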

Hopefully this thread proves helpful to other new users of Elasticsearch - It's a fantastic product and for the most part very intuitive.

Glad you solved it!