Getting the message "Failed to update shard information for ClusterInfoUpdateJob" in the Elasticsearch master logs

We are using Elasticsearch 7.8.0 on CentOS 7. There are three data pods in the cluster, and we restarted one of them. Suddenly, curl requests to the Elasticsearch service stopped working, and the following errors were observed in the logs of the elected master pod.

{"type":"log","host":"abc-elasticsearch-master-0","level":"WARN","system":"abc","time": "2021-01-15T08:00:00.654Z","logger":"o.e.c.InternalClusterInfoService","timezone":"UTC","marker":"[abc-elasticsearch-master-0] ","log":{"message":"Failed to update shard information for ClusterInfoUpdateJob within 15s timeout"}}
{"type":"log","host":"abc-elasticsearch-master-0","level":"WARN","system":"abc","time": "2021-01-15T08:01:00.656Z","logger":"o.e.c.InternalClusterInfoService","timezone":"UTC","marker":"[abc-elasticsearch-master-0] ","log":{"message":"Failed to update shard information for ClusterInfoUpdateJob within 15s timeout"}}
{"type":"log","host":"abc-elasticsearch-master-0","level":"WARN","system":"abc","time": "2021-01-15T08:02:00.658Z","logger":"o.e.c.InternalClusterInfoService","timezone":"UTC","marker":"[abc-elasticsearch-master-0] ","log":{"message":"Failed to update shard information for ClusterInfoUpdateJob within 15s timeout"}}
{"type":"log","host":"abc-elasticsearch-master-0","level":"WARN","system":"abc","time": "2021-01-15T08:02:45.659Z","logger":"o.e.c.InternalClusterInfoService","timezone":"UTC","marker":"[abc-elasticsearch-master-0] ","log":{"message":"Failed to update node information for ClusterInfoUpdateJob within 15s timeout"}}
{"type":"log","host":"abc-elasticsearch-master-0","level":"WARN","system":"abc","time": "2021-01-15T08:03:00.660Z","logger":"o.e.c.InternalClusterInfoService","timezone":"UTC","marker":"[abc-elasticsearch-master-0] ","log":{"message":"Failed to update shard information for ClusterInfoUpdateJob within 15s timeout"}}
{"type":"log","host":"abc-elasticsearch-master-0","level":"WARN","system":"abc","time": "2021-01-15T08:04:00.662Z","logger":"o.e.c.InternalClusterInfoService","timezone":"UTC","marker":"[abc-elasticsearch-master-0] ","log":{"message":"Failed to update shard information for ClusterInfoUpdateJob within 15s timeout"}}
{"type":"log","host":"abc-elasticsearch-master-0","level":"WARN","system":"abc","time": "2021-01-15T08:05:00.664Z","logger":"o.e.c.InternalClusterInfoService","timezone":"UTC","marker":"[abc-elasticsearch-master-0] ","log":{"message":"Failed to update shard information for ClusterInfoUpdateJob within 15s timeout"}}
{"type":"log","host":"abc-elasticsearch-master-0","level":"WARN","system":"abc","time": "2021-01-15T08:06:00.666Z","logger":"o.e.c.InternalClusterInfoService","timezone":"UTC","marker":"[abc-elasticsearch-master-0] ","log":{"message":"Failed to update shard information for ClusterInfoUpdateJob within 15s timeout"}}

Some of the REST commands that are still working:
curl _cat/health
curl _nodes/hot_threads?threads=9999

Some of the REST commands that are not working (see the fuller curl sketch after this list):
curl _cat/indices
curl _cat/allocation
curl _cat/nodes
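For reference, the commands above are abbreviated. A sketch of how they might be run in full against a hypothetical service name abc-elasticsearch, with a client-side timeout so a hanging call can be told apart from one that returns an error:

# Worked in this case
curl -s --max-time 30 "http://abc-elasticsearch:9200/_cat/health?v"
curl -s --max-time 30 "http://abc-elasticsearch:9200/_nodes/hot_threads?threads=9999"

# Hung in this case
curl -s --max-time 30 "http://abc-elasticsearch:9200/_cat/indices?v"
curl -s --max-time 30 "http://abc-elasticsearch:9200/_cat/allocation?v"
curl -s --max-time 30 "http://abc-elasticsearch:9200/_cat/nodes?v"

# Pending cluster-state tasks can hint at what the master is waiting on
curl -s --max-time 30 "http://abc-elasticsearch:9200/_cluster/pending_tasks?pretty"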

Following is some of the output of _nodes/hot_threads?threads=9999:

::: {abc-elasticsearch-client-768fd6bcc6-68zgd}{cfIWxQEjS_qdtX9e-Z4mxw}{GoUBp8MNS8CLACv7oYItPg}{aa.aa.aa.aa}{aa.aa.aa.aa:9300}{i}
   Hot threads at 2021-01-15T11:01:09.333Z, interval=500ms, busiestThreads=9999, ignoreIdleThreads=true:

    0.0% (24.3micros out of 500ms) cpu usage by thread 'signals/_main_Worker-1'
     10/10 snapshots sharing following 2 elements
       java.base@11.0.9/java.lang.Object.wait(Native Method)
       org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)

    0.0% (24.1micros out of 500ms) cpu usage by thread 'signals/_main_Worker-2'
     10/10 snapshots sharing following 2 elements
       java.base@11.0.9/java.lang.Object.wait(Native Method)
       org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)

    0.0% (13.6micros out of 500ms) cpu usage by thread 'signals/_main_Worker-3'
     10/10 snapshots sharing following 2 elements
       java.base@11.0.9/java.lang.Object.wait(Native Method)
       org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)

    0.0% (0s out of 500ms) cpu usage by thread 'elasticsearch[keepAlive/7.8.0]'
     10/10 snapshots sharing following 8 elements
       java.base@11.0.9/jdk.internal.misc.Unsafe.park(Native Method)
       java.base@11.0.9/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
       java.base@11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
       java.base@11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1039)
       java.base@11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345)
       java.base@11.0.9/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
       app//org.elasticsearch.bootstrap.Bootstrap$1.run(Bootstrap.java:89)
       java.base@11.0.9/java.lang.Thread.run(Thread.java:834)

    0.0% (0s out of 500ms) cpu usage by thread 'Common-Cleaner'
     10/10 snapshots sharing following 5 elements
       java.base@11.0.9/java.lang.Object.wait(Native Method)
       java.base@11.0.9/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:155)
       java.base@11.0.9/jdk.internal.ref.CleanerImpl.run(CleanerImpl.java:148)
       java.base@11.0.9/java.lang.Thread.run(Thread.java:834)
       java.base@11.0.9/jdk.internal.misc.InnocuousThread.run(InnocuousThread.java:134)

Could someone please suggest how to resolve this issue?
Thanks,

Can you share the full hot threads from the master node please?
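A sketch of how that could be collected, assuming the same hypothetical service address abc-elasticsearch:9200:

# Find the elected master's node name
curl -s "http://abc-elasticsearch:9200/_cat/master?v"

# Fetch hot threads from that node only (replace <master-node-name> with the name returned above)
curl -s "http://abc-elasticsearch:9200/_nodes/<master-node-name>/hot_threads?threads=9999" > master_hot_threads.txt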

Please refer to the file below for the full hot_threads output.

Thanks,
Prashant