Elasticsearch 5.6 very quickly increasing direct buffer pools


(Krzysztof Szarlej) #1

Hi all,

Elastic invo:

version[5.6.4], pid[1], build[8bbedf5/2017-10-31T18:55:38.105Z], OS[Linux/4.4.0-11
2-generic/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_151/25.151-b12]

JVM arguments

-Xms5235M, -Xmx5235M, -Des.path.home=/usr/share/elasticsearch

I have a problem with elasticsearch cluster master nodes.

Currently the cluster is running 6x data nodes 3x master and 2x search. Master nodes are on 7.5GB VMs with only Elastic java process running on them. Cluster state size on disk is 22mb. As for data - mostly logs. Only bunch of templates, rather small. Each data node holds approx 600gigs.

Master nodes are occasionally running out of memory. That seemed strange for me because at the moment when master ran out of memory its heap space was only 30% full. What I noticed is big amount of "Direct buffer pools". At the moment the memory exhaustion happened there were 319 buffer pools which were 5gigabytes in size.

Heap usage when box ran out of mem (21:20)

Direct buffer pools when box ran out of mem (21:20)

As a result the process RSS size was way above 90% of available memory and it started a long (few minutes long) running GC and the new master was elected.

To remedy the memory issues I set:

-Djdk.nio.maxCachedBufferSize=262144 -XX:MaxDirectMemorySize=1500M

This has fixed the problem of memory shortages but now the old gen gc is happening pretty often (every ~4-5 minutes) and the heap never really goes beyond 20%. The buffer pools are cleaned when old gen gc runs so I guess that the GC is itself triggered by the fact that direct memory size went above XX:MaxDirectMemorySize. I am not sure if specifically old gen gc is cleaning direct buffer pools but on the charts I can clearly see that the size of direct buffer pools goes down whenever gc happens:

Each annotation is old gen gc, the chart represents buffer pools.
57

So now the direct buffer pools are raising at the rate of about ~250-300mb per minute which is pretty big and they trigger GC every few minutes. Is that OK? Why does the direct buffer pools increase in size so quickly on master nodes?

Best regards,
Krzysztof!


(Mark Walkom) #2

If you are getting OOMs, then something should be logged, can you show that?


(Krzysztof Szarlej) #3

Thanks for reply,

I am digging through logs and this doesnt seem like OOM kills. The machine runs very low on memory, and gc starts to take like 5 minutes to complete. During that time the cluster switches the master to new one. So these were not OOM kills from OS OOM killer, my original post might be misleading, sorry. Here are some of the the log messages that are preceding the memory exhaustion:

[2018-02-12T19:37:25,430][WARN ][o.e.t.n.Netty4Transport ] [esmaster2-production] write and flush on the network layer failed (channel: [id: 0x093eba2b, L:0.0.0.0/0

[2018-02-12T19:37:24,381][WARN ][o.e.c.s.ClusterService ] [esmaster2-production] failing [zen-disco-node-join[{esdata3-production}{3rRfooL8TuSExOefudB2tA}{7BWpkBrkQS6xKjZoTGVj4g}{172.20.0.44}{172.20.0.44:9300}]]: failed to commit cluster state version [94166]

[2018-02-12T19:37:20,769][WARN ][o.e.t.n.Netty4Transport ] [esmaster2-production] write and flush on the network layer failed (channel: [id: 0x490f3284, L:/172.20.0
.53:9300 ! R:/172.20.0.55:51972])
java.io.IOException: Broken pipe

and then this long GC

[2018-02-12T19:45:25,123][WARN ][o.e.m.j.JvmGcMonitorService] [esmaster2-production] [gc][young][6216][15] duration [4.6m], collections [1]/[6.3m], total [4.6m]/[5.8
m], memory [1.8gb]->[333.9mb]/[4.9gb], all_pools {[young] [1.4gb]->[19.6mb]/[1.4gb]}{[survivor] [89.4mb]->[91mb]/[91.5mb]}{[old] [225.5mb]->[225.5mb]/[3.4gb]}
[2018-02-12T19:45:25,123][WARN ][o.e.m.j.JvmGcMonitorService] [esmaster2-production] [gc][6216] overhead, spent [4.6m] collecting in the last [6.3m]

Btw you can check original message. I edited it a bit with more information before I noticed you have answered.


(Mark Walkom) #4

Sorry, I thought I read OOM there!

That's not ideal!

How many shards, nodes, indices, GBs of data?


(Krzysztof Szarlej) #5

@warkolm

This long gc times are happening only during these memory shortages caused by excessive amount of direct memory buffers. Could you maybe explain what may cause elasticsearch master to fill the direct memory buffers that fast? (250-300mb per minute of direct memory buffers allocation).

Stats:

  • Each data node(we have 6 of them) has between 600-700 gb of data
  • Overall there is 2034 primary indices and each has 1 replica so 4068.
  • Majority of indices (older ones) are mainly single sharded. Most recent (last month) indices are 4 shards per index. Indices are daily, some are monthly but they are small (several hundred mb's per month). Daily indice size vary between 1gb to ~20gb.
  • Overall 6.75 billion documents at that moment. The current growth is about 1.5 million documents per hour.
  • The cluster is primarily a write oriented. There are very few aggregation/filter queries (thats why we have lowered the fielddata cache size to 20% heap).
  • In templates we use not_analyzed strings only.

As far as I understand - we have no ingestion pipelines at all in our cluster so none of the boxes has role 'i' role. Master nodes are only master nodes, they dont hold data, they are not coordinator nodes.

The performance issues are only on master nodes. Direct memory buffers on data nodes is few hundred megabytes and stable. Here is chart of the direct memory buffers on master node during the night:

Each red annotation is GC old

EDIT:
So far I understand that:

But why does the buffers size raise so quickly? Unfortunately I dont have any other cluster of this size to compare


(Krzysztof Szarlej) #6

@warkolm

I found out the root issue. The quick raising of direct memory buffers on master is caused by kibana. When I turn off kibana then the buffer pools stops raising that fast. I cannot find what exactly in kibana causes it.

In tcp dump on elasticsearch server I see thousands of "empty" packets from kibana

22:55:31.651839 IP essearch1-staging.42057 > ip-192-168-31-223.eu-west-1.compute.internal.9300: Flags [.], ack 10443091, win 12315, options [nop,nop,TS val 635171125 ecr 1515861], length 0
E..4.X@.@.....!...!..I$T....P.m...0........
%..5..!U

There is thousands of them. When I turn off kibana then these messages stops appearing.


(Mark Walkom) #7

It might be worth raising an issue, with as much detail as possible, on the Kibana repo and see what they think?


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.