Cluster Reroute Retry Failed Null Pointer Exception

Good morning everyone.
Last night the machine hosting one of the nodes of my cluster shut down, so this morning I restarted it and waited for shard allocation. After waiting for a while, the restarted node only had about 10 shards allocated, and over 300 were still unassigned.
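For reference, the unassigned shards and the reason Elasticsearch reports for each of them can be listed with something like this in Dev Tools:

GET /_cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state
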
This was not the first time this had happened, so I used this command in Dev Tools:

POST /_cluster/reroute?retry_failed=true

which usually fixed the problem in the past. This time, though, the response was the following:

{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[elk_node-2][10.100.100.148:9300][cluster:admin/reroute]"
      }
    ],
    "type": "null_pointer_exception",
    "reason": null
  },
  "status": 500
}
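
For a single shard, the allocation explain API usually gives more detail on why it cannot be assigned; a minimal sketch (the index name and shard number here are placeholders):

GET /_cluster/allocation/explain
{
  "index": "my-index",
  "shard": 0,
  "primary": true
}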

I started looking through the logs of the nodes, and I found this on node 10.100.100.150:

org.elasticsearch.transport.RemoteTransportException: [ta_elk_node-2][10.195.194.148:9300][cluster:admin/reroute]
Caused by: java.lang.NullPointerException
at org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDecider.getExpectedShardSize(DiskThresholdDecider.java:421) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.cluster.routing.allocation.allocator.BalancedShardsAllocator$Balancer.allocateUnassigned(BalancedShardsAllocator.java:847) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.cluster.routing.allocation.allocator.BalancedShardsAllocator$Balancer.access$000(BalancedShardsAllocator.java:232) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.cluster.routing.allocation.allocator.BalancedShardsAllocator.allocate(BalancedShardsAllocator.java:123) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:413) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:350) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.action.admin.cluster.reroute.TransportClusterRerouteAction$ClusterRerouteResponseAckedClusterStateUpdateTask.execute(TransportClusterRerouteAction.java:124) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:643) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:272) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:202) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:137) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:660) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207) ~[elasticsearch-6.6.1.jar:6.6.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_171]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
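
Since the trace goes through DiskThresholdDecider, the disk watermark settings seemed worth checking; they can be read (current values plus defaults) with:

GET /_cluster/settings?include_defaults=true&flat_settings=true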

I assumed it was a disk space problem, since we are currently a bit tight on space, so I deleted some old indexes. This allowed the node that died overnight to allocate another 5 shards, bringing its total to 15, while the other two nodes each have over 700, and about 300 still need to be assigned.
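
In case it helps, per-node disk usage and shard counts can be checked with the cat allocation API (nothing specific to my setup):

GET /_cat/allocation?v
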
Just to be clear, the node that died overnight and the node where I found the NullPointerException are two different nodes.
I am at a loss about what to do. There seems to be no information online about this type of problem, or at least I was unable to find any.
