We have a recurring error in our test environment where Elasticsearch fails to boot up. We have logs from two occurrences, but sadly we can't reproduce the error consistently, which makes it hard to determine what the problem might be. It is a single-node setup.
I will post the logs of the errors in separate posts under this topic because of the character limit.
The two errors are not the same, but their causes look very similar (incoming term X does not match current term Y). I looked online, to no avail (I only found one relevant post, which didn't help). I also looked at the code to figure out what can cause such an error, and it looked to me as if a mismatch between the incoming and current terms would indicate some kind of desynchronization between nodes; but this is a single-node setup, so maybe I misunderstood something.
What could be the problem, and how can we mitigate it? Please ask if any other information (besides the logs below) is needed to tell what's going on here. Thank you.
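To make the "incoming term X does not match current term Y" message concrete, here is a minimal sketch (in Python, not Elasticsearch's actual Java code) of the kind of term check that produces it. The class and method names are hypothetical; the idea is that each election attempt bumps the node's current term, and a join request carrying a vote for an older term is rejected once the node has moved on, which can happen even on a single node if several election rounds race each other during bootstrap:

```python
class CoordinationStateRejectedException(Exception):
    """Stand-in for the Elasticsearch exception of the same name."""


class CoordinationSketch:
    """Hypothetical, simplified model of per-node term bookkeeping."""

    def __init__(self):
        self.current_term = 0

    def start_election(self):
        # Every new election attempt increments the term. On a single node,
        # repeated attempts during startup can bump the term several times
        # before any join succeeds.
        self.current_term += 1
        return self.current_term

    def handle_join(self, join_term):
        # A join is only valid for the term it was created in; a stale join
        # from an earlier election round is rejected.
        if join_term != self.current_term:
            raise CoordinationStateRejectedException(
                f"incoming term {join_term} does not match "
                f"current term {self.current_term}"
            )
        return True
```

Under this (simplified) model, a node whose term has already advanced to 5 would reject a join prepared at term 3, which matches the shape of the first log below: the rejections at terms 3 and 4 are stale election attempts, and the node eventually wins at term 6.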
[2020-10-21T00:58:24,874][INFO ][o.e.t.TransportService ] [scb1] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[2020-10-21T00:58:26,873][INFO ][o.e.c.c.Coordinator ] [scb1] cluster UUID [BJHT2fLHQCynGCURVTaEIQ]
[2020-10-21T00:58:29,264][INFO ][o.e.c.c.JoinHelper ] [scb1] failed to join {scb1}{JU_QmoZER9yA8MDKIxklEA}{ORaU5I2tTJWM3KlyEItHwg}{127.0.0.1}{127.0.0.1:9300}{dimr} with JoinRequest{sourceNode={scb1}{JU_QmoZER9yA8MDKIxklEA}{ORaU5I2tTJWM3KlyEItHwg}{127.0.0.1}{127.0.0.1:9300}{dimr}, minimumTerm=2, optionalJoin=Optional[Join{term=3, lastAcceptedTerm=2, lastAcceptedVersion=17, sourceNode={scb1}{JU_QmoZER9yA8MDKIxklEA}{ORaU5I2tTJWM3KlyEItHwg}{127.0.0.1}{127.0.0.1:9300}{dimr}, targetNode={scb1}{JU_QmoZER9yA8MDKIxklEA}{ORaU5I2tTJWM3KlyEItHwg}{127.0.0.1}{127.0.0.1:9300}{dimr}}]}
org.elasticsearch.transport.RemoteTransportException: [scb1][127.0.0.1:9300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: incoming term 3 does not match current term 5
at org.elasticsearch.cluster.coordination.CoordinationState.handleJoin(CoordinationState.java:225) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.cluster.coordination.Coordinator.handleJoin(Coordinator.java:982) ~[elasticsearch-7.7.0.jar:7.7.0]
at java.util.Optional.ifPresent(Optional.java:159) ~[?:1.8.0_265]
at org.elasticsearch.cluster.coordination.Coordinator.processJoinRequest(Coordinator.java:527) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.cluster.coordination.Coordinator.lambda$handleJoinRequest$7(Coordinator.java:491) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:372) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:359) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.cluster.coordination.Coordinator.handleJoinRequest(Coordinator.java:478) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new$0(JoinHelper.java:129) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) [elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:767) [elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) [elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.7.0.jar:7.7.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_265]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_265]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_265]
[2020-10-21T00:58:29,264][INFO ][o.e.c.c.JoinHelper ] [scb1] failed to join {scb1}{JU_QmoZER9yA8MDKIxklEA}{ORaU5I2tTJWM3KlyEItHwg}{127.0.0.1}{127.0.0.1:9300}{dimr} with JoinRequest{sourceNode={scb1}{JU_QmoZER9yA8MDKIxklEA}{ORaU5I2tTJWM3KlyEItHwg}{127.0.0.1}{127.0.0.1:9300}{dimr}, minimumTerm=3, optionalJoin=Optional[Join{term=4, lastAcceptedTerm=2, lastAcceptedVersion=17, sourceNode={scb1}{JU_QmoZER9yA8MDKIxklEA}{ORaU5I2tTJWM3KlyEItHwg}{127.0.0.1}{127.0.0.1:9300}{dimr}, targetNode={scb1}{JU_QmoZER9yA8MDKIxklEA}{ORaU5I2tTJWM3KlyEItHwg}{127.0.0.1}{127.0.0.1:9300}{dimr}}]}
org.elasticsearch.transport.RemoteTransportException: [scb1][127.0.0.1:9300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: incoming term 4 does not match current term 5
at org.elasticsearch.cluster.coordination.CoordinationState.handleJoin(CoordinationState.java:225) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.cluster.coordination.Coordinator.handleJoin(Coordinator.java:982) ~[elasticsearch-7.7.0.jar:7.7.0]
at java.util.Optional.ifPresent(Optional.java:159) ~[?:1.8.0_265]
at org.elasticsearch.cluster.coordination.Coordinator.processJoinRequest(Coordinator.java:527) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.cluster.coordination.Coordinator.lambda$handleJoinRequest$7(Coordinator.java:491) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:372) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:359) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.cluster.coordination.Coordinator.handleJoinRequest(Coordinator.java:478) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new$0(JoinHelper.java:129) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) [elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:767) [elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) [elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.7.0.jar:7.7.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_265]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_265]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_265]
[2020-10-21T00:58:29,810][INFO ][o.e.c.s.MasterService ] [scb1] elected-as-master ([1] nodes joined)[{scb1}{JU_QmoZER9yA8MDKIxklEA}{ORaU5I2tTJWM3KlyEItHwg}{127.0.0.1}{127.0.0.1:9300}{dimr} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 6, version: 18, delta: master node changed {previous [], current [{scb1}{JU_QmoZER9yA8MDKIxklEA}{ORaU5I2tTJWM3KlyEItHwg}{127.0.0.1}{127.0.0.1:9300}{dimr}]}
[2020-10-21T00:58:30,907][INFO ][o.e.c.s.ClusterApplierService] [scb1] master node changed {previous [], current [{scb1}{JU_QmoZER9yA8MDKIxklEA}{ORaU5I2tTJWM3KlyEItHwg}{127.0.0.1}{127.0.0.1:9300}{dimr}]}, term: 6, version: 18, reason: Publication{term=6, version=18}
[2020-11-09T02:06:31,944][INFO ][o.e.n.Node ] [scb2] stopping ...
[2020-11-09T02:06:32,188][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [scb2] failed to execute on node [ZYiGohE7QBikNcnuPV-TWA]
org.elasticsearch.transport.SendRequestTransportException: [scb2][127.0.0.1:9300][cluster:monitor/nodes/stats[n]]
at org.elasticsearch.transport.TransportService.sendRequestInternal(TransportService.java:719) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:621) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:603) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.start(TransportNodesAction.java:182) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.action.support.nodes.TransportNodesAction.doExecute(TransportNodesAction.java:82) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.action.support.nodes.TransportNodesAction.doExecute(TransportNodesAction.java:51) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:153) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:129) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:64) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:83) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:72) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:399) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.client.support.AbstractClient$ClusterAdmin.execute(AbstractClient.java:688) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.client.support.AbstractClient$ClusterAdmin.nodesStats(AbstractClient.java:784) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.cluster.InternalClusterInfoService.updateNodeStats(InternalClusterInfoService.java:249) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.cluster.InternalClusterInfoService.refresh(InternalClusterInfoService.java:291) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.cluster.InternalClusterInfoService.maybeRefresh(InternalClusterInfoService.java:271) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.cluster.InternalClusterInfoService.access$200(InternalClusterInfoService.java:68) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.cluster.InternalClusterInfoService$SubmitReschedulingClusterInfoUpdatedJob.lambda$run$0(InternalClusterInfoService.java:221) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633) ~[elasticsearch-7.7.0.jar:7.7.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_272]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_272]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_272]
Caused by: org.elasticsearch.node.NodeClosedException: node closed {scb2}{ZYiGohE7QBikNcnuPV-TWA}{netSmENiRS-UYLYgxU4hEQ}{127.0.0.1}{127.0.0.1:9300}{dimr}
at org.elasticsearch.transport.TransportService.sendRequestInternal(TransportService.java:701) ~[elasticsearch-7.7.0.jar:7.7.0]
... 22 more
[2020-11-09T02:06:32,191][DEBUG][o.e.a.a.i.s.TransportIndicesStatsAction] [scb2] failed to execute [indices:monitor/stats] on node [ZYiGohE7QBikNcnuPV-TWA]
org.elasticsearch.transport.SendRequestTransportException: [scb2][127.0.0.1:9300][indices:monitor/stats[n]]
at org.elasticsearch.transport.TransportService.sendRequestInternal(TransportService.java:719) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:621) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:588) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction.sendNodeRequest(TransportBroadcastByNodeAction.java:313) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction.start(TransportBroadcastByNodeAction.java:302) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction.doExecute(TransportBroadcastByNodeAction.java:225) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction.doExecute(TransportBroadcastByNodeAction.java:76) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:153) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:129) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:64) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:83) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:72) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:399) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.client.support.AbstractClient$IndicesAdmin.execute(AbstractClient.java:1234) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.client.support.AbstractClient$IndicesAdmin.stats(AbstractClient.java:1550) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.cluster.InternalClusterInfoService.updateIndicesStats(InternalClusterInfoService.java:264) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.cluster.InternalClusterInfoService.refresh(InternalClusterInfoService.java:321) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.cluster.InternalClusterInfoService.maybeRefresh(InternalClusterInfoService.java:271) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.cluster.InternalClusterInfoService.access$200(InternalClusterInfoService.java:68) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.cluster.InternalClusterInfoService$SubmitReschedulingClusterInfoUpdatedJob.lambda$run$0(InternalClusterInfoService.java:221) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633) ~[elasticsearch-7.7.0.jar:7.7.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_272]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_272]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_272]
Caused by: org.elasticsearch.node.NodeClosedException: node closed {scb2}{ZYiGohE7QBikNcnuPV-TWA}{netSmENiRS-UYLYgxU4hEQ}{127.0.0.1}{127.0.0.1:9300}{dimr}
at org.elasticsearch.transport.TransportService.sendRequestInternal(TransportService.java:701) ~[elasticsearch-7.7.0.jar:7.7.0]
... 23 more
[2020-11-09T02:06:32,236][INFO ][o.e.n.Node ] [scb2] stopped
[2020-11-09T02:06:32,236][INFO ][o.e.n.Node ] [scb2] closing ...
[2020-11-09T02:06:32,311][INFO ][o.e.n.Node ] [scb2] closed
They aren't unhandled either; they're logged (at INFO and DEBUG respectively) as part of the handling process. An unhandled exception would immediately terminate the whole node.
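The distinction can be illustrated with a toy sketch (again Python, not Elasticsearch code): a "handled" exception is caught and logged by the surrounding code and the process carries on, while an unhandled one would propagate out and kill the node. The function names here are made up for illustration:

```python
import logging


def send_stats_request(node_closed: bool) -> str:
    # Hypothetical stand-in for a transport-layer call; fails once the
    # node has begun shutting down.
    if node_closed:
        raise RuntimeError("node closed")
    return "ok"


def refresh_cluster_info(node_closed: bool) -> bool:
    # Mirrors the pattern in the shutdown log above: the failure is
    # expected while the node is "stopping ...", so it is caught, logged
    # at DEBUG, and swallowed rather than allowed to escape the thread.
    try:
        send_stats_request(node_closed)
        return True
    except RuntimeError as exc:
        logging.debug("failed to execute on node: %s", exc)
        return False
```

So the SendRequestTransportException / NodeClosedException entries in the second log are benign side effects of shutdown, not the cause of it.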