Hello everyone,
I'm having a strange issue with Elasticsearch. It's occasionally causing high server load on one of the three nodes in a cluster. Considering the number of CPUs (2 physical, 16 logical) it's not much, but it certainly shouldn't be at 2 when the node is "idle"!
The cluster consists of the following nodes:
node1 (currently master) - VPS server, 4 vCPU, 4GB RAM (2GB allocated for heap), 100GB HDD
node2 - physical server, 2 physical CPUs, 8 logical, 16GB RAM (4GB allocated for heap), data stored on a ZFS pool without compression (mounted on /var/lib/elasticsearch)
node3 - physical server, 2 physical CPUs, 16 logical, 12GB RAM (4GB allocated for heap), data stored on a ZFS pool without compression (mounted on /var/lib/elasticsearch)
node3 is the one having the issue. As mentioned, it only happens occasionally, but node3's load is always higher than the other nodes'. The setup is quite similar between the two physical machines, so I don't know why node3 is the one having trouble, even though it has the better CPU (a Xeon E5620).
At this moment the load average is 1.49, 1.11, 1.34.
Again, that's not much considering the number of CPUs, but still...
The server isn't doing anything except data backups via rsync after midnight.
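For what it's worth, this is roughly how I plan to grab hot threads from node3 the next time the load climbs, so I can see what it's actually busy with. A minimal Python sketch; localhost:9200, no authentication, and es-backup11 being node3's node name (it's the name in the log below) are just assumptions about my setup:

# Minimal sketch: dump the hottest threads reported by node3 (es-backup11).
# Assumes the node answers on localhost:9200 without authentication.
import urllib.request

URL = "http://localhost:9200/_nodes/es-backup11/hot_threads?threads=5"

with urllib.request.urlopen(URL, timeout=10) as resp:
    print(resp.read().decode("utf-8"))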
The last few lines of elasticsearch.log are:
[2019-05-23T00:12:05,452][INFO ][o.e.m.j.JvmGcMonitorService] [es-backup11] [gc][38642] overhead, spent [309ms] collecting in the last [1s]
[2019-05-23T00:22:07,460][INFO ][o.e.m.j.JvmGcMonitorService] [es-backup11] [gc][39242] overhead, spent [283ms] collecting in the last [1s]
[2019-05-23T01:05:52,126][WARN ][o.e.i.f.SyncedFlushService] [es-backup11] [postfix-2019.05.22][1] can't to issue sync id [KUSSD9WXRRObhNa7XOtNrw] for out of sync replica [[postfix-2019.05.22][1], node[xTQsIj8CQsajIjcWMVeY4A], [R], s[STARTED], a[id=2V2d3MBwSHqWQaXkyE8o4w]] with num docs [5222]; num docs on primary [5223]
[2019-05-23T02:47:55,109][WARN ][o.e.i.f.SyncedFlushService] [es-backup11] [postfix-2019.05.23][2] can't to issue sync id [0R-RCWcGTLGrk6V-6i-naw] for out of sync replica [[postfix-2019.05.23][2], node[LpcI-a41QSW2uOMgyz2hDA], [R], s[STARTED], a[id=rZWN-plaRKaTa3A8eHUfLA]] with num docs [33]; num docs on primary [35]
[2019-05-23T03:57:17,454][WARN ][o.e.i.f.SyncedFlushService] [es-backup11] [postfix-2019.05.23][2] can't to issue sync id [AniftNijTZG3q_bw7FlQLw] for out of sync replica [[postfix-2019.05.23][2], node[LpcI-a41QSW2uOMgyz2hDA], [R], s[STARTED], a[id=rZWN-plaRKaTa3A8eHUfLA]] with num docs [75]; num docs on primary [76]
[2019-05-23T06:55:52,416][WARN ][o.e.i.f.SyncedFlushService] [es-backup11] [postfix-2019.05.23][2] can't to issue sync id [FOzcrf94SriQqFnDAQkLhQ] for out of sync replica [[postfix-2019.05.23][2], node[LpcI-a41QSW2uOMgyz2hDA], [R], s[STARTED], a[id=rZWN-plaRKaTa3A8eHUfLA]] with num docs [276]; num docs on primary [278]
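Since those warnings mention replicas being a document or two behind, I can also compare primary and replica doc counts for the affected index once indexing quiets down, with something like this sketch (again assuming the cluster is reachable on localhost:9200 without auth):

# Small sketch: list shards of the index from the warnings above, with
# primary/replica role and per-copy doc counts, to see if replicas catch up.
# Assumes the cluster is reachable on localhost:9200 without authentication.
import urllib.request

URL = ("http://localhost:9200/_cat/shards/postfix-2019.05.23"
       "?v&h=index,shard,prirep,docs,node")

with urllib.request.urlopen(URL, timeout=10) as resp:
    print(resp.read().decode("utf-8"))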
gc.log:
2019-05-23T13:41:49.899+0200: 87261.863: Total time for which application threads were stopped: 0.0135308 seconds, Stopping threads took: 0.0000932 seconds
2019-05-23T13:41:49.900+0200: 87261.864: Total time for which application threads were stopped: 0.0008304 seconds, Stopping threads took: 0.0001691 seconds
2019-05-23T13:42:03.292+0200: 87275.256: Total time for which application threads were stopped: 0.0010224 seconds, Stopping threads took: 0.0001263 seconds
2019-05-23T13:42:11.529+0200: 87283.493: Total time for which application threads were stopped: 0.0010310 seconds, Stopping threads took: 0.0001883 seconds
2019-05-23T13:42:16.549+0200: 87288.513: [GC (Allocation Failure) 2019-05-23T13:42:16.549+0200: 87288.513: [ParNew
Desired survivor size 56688640 bytes, new threshold 6 (max 6)
- age 1: 9271408 bytes, 9271408 total
- age 2: 1391440 bytes, 10662848 total
- age 3: 506176 bytes, 11169024 total
- age 4: 362848 bytes, 11531872 total
- age 5: 49032 bytes, 11580904 total
- age 6: 11864 bytes, 11592768 total
: 903442K->16266K(996800K), 0.0164781 secs] 2304439K->1417360K(4083584K), 0.0167310 secs] [Times: user=0.17 sys=0.01, real=0.02 secs]
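If heap or GC stats from the other nodes would help, I can pull them with something like this rough sketch (same localhost:9200 / no-auth assumption; the "old" collector name matches the ParNew/CMS setup visible in gc.log above):

# Sketch: print heap usage and old-generation GC totals for every node,
# using the nodes stats API. Assumes localhost:9200, no authentication.
import json
import urllib.request

URL = "http://localhost:9200/_nodes/stats/jvm"

with urllib.request.urlopen(URL, timeout=10) as resp:
    stats = json.load(resp)

for node in stats["nodes"].values():
    jvm = node["jvm"]
    heap_pct = jvm["mem"]["heap_used_percent"]
    old_gc = jvm["gc"]["collectors"]["old"]
    print("{}: heap {}%, old GC {} collections, {} ms total".format(
        node["name"], heap_pct,
        old_gc["collection_count"], old_gc["collection_time_in_millis"]))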
There are 406 shards in the cluster, across 91 indices and about 11GB of data.
Any tuning suggestions are welcome, and if there's any additional info I need to provide, let me know.
Thanks in advance.