Hi Elasticsearch Community,
One of the nodes in our cluster got restarted recently.
The Elasticsearch log says nothing about the shutdown. These are the first three lines for that day:
[2015-06-21 09:21:27,753][INFO ][node ] [sjc-elasticsearch-data03-si] version[1.3.7], pid[31096], build[3042293/2014-12-16T13:59:32Z]
[2015-06-21 09:21:27,754][INFO ][node ] [sjc-elasticsearch-data03-si] initializing ...
[2015-06-21 09:21:28,101][INFO ][plugins ] [sjc-elasticsearch-data03-si] loaded [action-updatebyquery, analysis-icu], sites [HQ, bigdesk, kopf]
It looks like the node was started at 09:21:27, but there is no info on why it was stopped in the first place.
Other nodes report this:
[2015-06-21 09:02:53,599][DEBUG][action.admin.indices.stats] [sjc-elasticsearch-client01-si] [mdb-pod101-7][9], node[72pxkCzKQ324np0VIXUAkQ], [R], s[STARTED]: failed to executed [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@6c2238d3]
org.elasticsearch.transport.NodeDisconnectedException: [sjc-elasticsearch-data03-si][inet[/10.255.1.213:9300]][indices/stats/s] disconnected
In the messages log I see this:
Jun 21 09:02:53 sjc-elasticsearch-data03 systemd: elasticsearch-sjc-elasticsearch-data03.service: main process exited, code=killed, status=6/ABRT
Jun 21 09:02:53 sjc-elasticsearch-data03 systemd: Unit elasticsearch-sjc-elasticsearch-data03.service entered failed state.
Looks like the node was killed with SIGABRT.
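To double-check my reading of "status=6/ABRT", here is a minimal Python sketch of my own (not output from the node) that looks up the name of signal number 6. It assumes it is run on Linux, where the signal numbering matches the host in question.

```python
import signal

# systemd reported "status=6/ABRT"; look up which signal has number 6.
# Assumption: run on Linux, where signal numbers match the affected host.
sig = signal.Signals(6)
print(sig.name)
```

On Linux this resolves to SIGABRT, which matches what systemd printed.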
We use Puppet for automation, but there was no Puppet run at that time.
There was no GC-related info in the logs, so I do not suspect heap issues (the node had been running for many days before).
The load was standard at that time.
The OS was up the whole time.
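For reference, the gap between the kill and the restart can be worked out from the two timestamps quoted above. This is a small Python sketch of my own. The syslog line carries no year, so I am assuming 2015 to match the Elasticsearch log.

```python
from datetime import datetime

# Timestamps copied from the logs quoted in this post.
# Assumption: the syslog entry ("Jun 21 09:02:53") is from 2015, like the ES log.
killed = datetime.strptime("2015-06-21 09:02:53", "%Y-%m-%d %H:%M:%S")
started = datetime.strptime("2015-06-21 09:21:27,753", "%Y-%m-%d %H:%M:%S,%f")

# Time the Elasticsearch process was down while the OS stayed up.
downtime = started - killed
print(downtime)
```

So the process was down for roughly 18 and a half minutes before it came back.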
Java version:
[ikupczynski@sjc-elasticsearch-data03 ~]$ java -version
java version "1.7.0_79"
OpenJDK Runtime Environment (rhel-2.5.5.1.el7_1-x86_64 u79-b14)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)
ES version:
{
  "status" : 200,
  "name" : "sjc-elasticsearch-client01-si",
  "version" : {
    "number" : "1.3.7",
    "build_hash" : "3042293e4b219dfb855a4e6c64241c530d1abeb0",
    "build_timestamp" : "2014-12-16T13:59:32Z",
    "build_snapshot" : false,
    "lucene_version" : "4.9"
  },
  "tagline" : "You Know, for Search"
}
This is a VM in Google Compute Engine.
Since the restart the node has been working fine.
Can you advise me on what might have caused this, or how I can debug it further?
Thanks, Igor