My cluster stopped automatically, what's the problem?


(chenjinyuan87) #1

Hi all,
Is there any problem that would cause a cluster to shut down automatically?
All ten nodes shut down this morning.
We first saw this problem on the 16th, and today it happened again.

The logs I could find from around the shutdown include:

The first node:
[There are no log entries before this one today]
[2017-04-19T05:39:45,428][INFO ][o.e.n.Node ] [index188] stopping
...
[2017-04-19T05:39:49,789][DEBUG][o.e.a.a.i.f.TransportForceMergeAction] [index188] [indices:admin/forcemerge] failed to execute operation for shard [[tj_news_recent][2], node[5at14jo6Qn-GVe4nGw_jsw], [R], s[STARTED], a[id=38eb4B-iRmWARSZmOQinkA]]
java.io.IOException: background merge hit exception: _5pej8(6.2.0):C2833847/267934:delGen=13 _5qwoc(6.2.0):c64302/2:delGen=1 _5q4vr(6.2.0):c22749/115:delGen=4 into _5qwp4 [maxNumSegments=1] [ABORTED]
This error seems to be caused by the stop request. After that, the node stopped.

The second:
Initially, there are some events about nodes leaving (since the first node had stopped).
[2017-04-19T06:10:11,774][INFO ][o.e.n.Node ] [index187] stopping
...
[2017-04-19T06:10:11,929][DEBUG][o.e.a.a.i.f.TransportForceMergeAction] [index187] failed to execute [indices:admin/forcemerge] on node [4jXUkkeERDCFf09LADlHBg]
org.elasticsearch.transport.TransportException: transport stopped, action: indices:admin/forcemerge[n]...
[2017-04-19T06:10:11,929][ERROR][n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
java.util.concurrent.RejectedExecutionException: event executor terminated

This was followed by some similar log entries, and finally the node stopped.

The third:
[2017-04-19T01:20:19,079][INFO ][o.e.m.j.JvmGcMonitorService] [index185] [gc][19764] overhead, spent [473ms] collecting in the last [1s]
[2017-04-19T02:47:13,759][INFO ][o.e.m.j.JvmGcMonitorService] [index185] [gc][24975] overhead, spent [251ms] collecting in the last [1s]
...some node-disconnected messages
[2017-04-19T06:10:08,317][INFO ][o.e.n.Node ] [index185] stopping
...
[2017-04-19T06:10:10,472][ERROR][n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
The logs are similar to the previous node's, and then it stopped.

Anyway, I didn't find any useful information in the logs.


(Mark Walkom) #2

What version?


(chenjinyuan87) #3

5.0.0.
BTW, is it worth upgrading from 5.0.0 to the current version?


(Mark Walkom) #4

It's always worth updating :slight_smile:

However, in this case the only way for this to happen is for something to stop the service; there are no APIs in ES 5.X that can shut down the nodes. You should look in your OS logs to see if there is anything that correlates.
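
A minimal sketch of that correlation step, assuming the Elasticsearch log lines look like the ones quoted in the first post. It pulls the node name and timestamp out of each `stopping` line and then lists OS-side events that fall within a short window of each stop. The function names, the two-minute window, and the `(datetime, text)` shape for OS events are illustrative assumptions; how you collect those events depends on your OS and logging setup.

```python
import re
from datetime import datetime, timedelta

# Matches Elasticsearch 5.x lines such as:
# [2017-04-19T05:39:45,428][INFO ][o.e.n.Node ] [index188] stopping
STOP_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}),\d{3}\]"
    r"\[INFO\s*\]\[o\.e\.n\.Node\s*\]\s*\[(?P<node>[^\]]+)\]\s*stopping\b"
)


def find_stops(lines):
    """Return (node, datetime) for every 'stopping' line in the input."""
    stops = []
    for line in lines:
        m = STOP_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S")
            stops.append((m.group("node"), ts))
    return stops


def events_near_stops(stops, os_events, window=timedelta(minutes=2)):
    """List OS events within `window` of each stop.

    `os_events` is a hypothetical list of (datetime, text) pairs that you
    build yourself from syslog, cron logs, audit logs, and so on.
    """
    hits = []
    for node, stop_ts in stops:
        for ev_ts, text in os_events:
            if abs(ev_ts - stop_ts) <= window:
                hits.append((node, stop_ts, text))
    return hits
```

Running `find_stops` over each node's log gives one stop time per node; anything in the OS logs that repeatedly shows up just before those times (a cron job, a deployment script, an operator session) is worth a closer look.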


(chenjinyuan87) #5

Thank you. That's also what I assumed. This morning I added an audit to log all kill commands.
I hope it helps...

