We are using Elasticsearch 7.8.0 on CentOS 7. The cluster has three data pods, and we restarted one of them. Suddenly, curl requests to the Elasticsearch service stopped working, and the following errors are observed in the logs of the elected master pod:
{"type":"log","host":"abc-elasticsearch-master-0","level":"WARN","system":"abc","time": "2021-01-15T08:00:00.654Z","logger":"o.e.c.InternalClusterInfoService","timezone":"UTC","marker":"[abc-elasticsearch-master-0] ","log":{"message":"Failed to update shard information for ClusterInfoUpdateJob within 15s timeout"}}
{"type":"log","host":"abc-elasticsearch-master-0","level":"WARN","system":"abc","time": "2021-01-15T08:01:00.656Z","logger":"o.e.c.InternalClusterInfoService","timezone":"UTC","marker":"[abc-elasticsearch-master-0] ","log":{"message":"Failed to update shard information for ClusterInfoUpdateJob within 15s timeout"}}
{"type":"log","host":"abc-elasticsearch-master-0","level":"WARN","system":"abc","time": "2021-01-15T08:02:00.658Z","logger":"o.e.c.InternalClusterInfoService","timezone":"UTC","marker":"[abc-elasticsearch-master-0] ","log":{"message":"Failed to update shard information for ClusterInfoUpdateJob within 15s timeout"}}
{"type":"log","host":"abc-elasticsearch-master-0","level":"WARN","system":"abc","time": "2021-01-15T08:02:45.659Z","logger":"o.e.c.InternalClusterInfoService","timezone":"UTC","marker":"[abc-elasticsearch-master-0] ","log":{"message":"Failed to update node information for ClusterInfoUpdateJob within 15s timeout"}}
{"type":"log","host":"abc-elasticsearch-master-0","level":"WARN","system":"abc","time": "2021-01-15T08:03:00.660Z","logger":"o.e.c.InternalClusterInfoService","timezone":"UTC","marker":"[abc-elasticsearch-master-0] ","log":{"message":"Failed to update shard information for ClusterInfoUpdateJob within 15s timeout"}}
{"type":"log","host":"abc-elasticsearch-master-0","level":"WARN","system":"abc","time": "2021-01-15T08:04:00.662Z","logger":"o.e.c.InternalClusterInfoService","timezone":"UTC","marker":"[abc-elasticsearch-master-0] ","log":{"message":"Failed to update shard information for ClusterInfoUpdateJob within 15s timeout"}}
{"type":"log","host":"abc-elasticsearch-master-0","level":"WARN","system":"abc","time": "2021-01-15T08:05:00.664Z","logger":"o.e.c.InternalClusterInfoService","timezone":"UTC","marker":"[abc-elasticsearch-master-0] ","log":{"message":"Failed to update shard information for ClusterInfoUpdateJob within 15s timeout"}}
{"type":"log","host":"abc-elasticsearch-master-0","level":"WARN","system":"abc","time": "2021-01-15T08:06:00.666Z","logger":"o.e.c.InternalClusterInfoService","timezone":"UTC","marker":"[abc-elasticsearch-master-0] ","log":{"message":"Failed to update shard information for ClusterInfoUpdateJob within 15s timeout"}}
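For context, the data pod was restarted with a plain pod delete and then recreated by its controller; the pod name and namespace below are placeholders, not our exact values:
kubectl delete pod abc-elasticsearch-data-1 -n <namespace>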
Some of the REST commands that are working (full curl forms for both lists are shown below):
curl _cat/health
curl _nodes/hot_threads?threads=9999
Some of the REST commands that are not working:
curl _cat/indices
curl _cat/allocation
curl _cat/nodes
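To be concrete, the full curl forms we run look like the following; the service hostname is a placeholder for our Kubernetes HTTP/client service, and the default port 9200 is assumed:
curl -s "http://<es-http-service>:9200/_cat/health?v"                      (works)
curl -s "http://<es-http-service>:9200/_nodes/hot_threads?threads=9999"    (works)
curl -s "http://<es-http-service>:9200/_cat/indices?v"                     (not working)
curl -s "http://<es-http-service>:9200/_cat/allocation?v"                  (not working)
curl -s "http://<es-http-service>:9200/_cat/nodes?v"                       (not working)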
Following is some of the output of _nodes/hot_threads?threads=9999:
::: {abc-elasticsearch-client-768fd6bcc6-68zgd}{cfIWxQEjS_qdtX9e-Z4mxw}{GoUBp8MNS8CLACv7oYItPg}{aa.aa.aa.aa}{aa.aa.aa.aa:9300}{i}
   Hot threads at 2021-01-15T11:01:09.333Z, interval=500ms, busiestThreads=9999, ignoreIdleThreads=true:

    0.0% (24.3micros out of 500ms) cpu usage by thread 'signals/_main_Worker-1'
     10/10 snapshots sharing following 2 elements
       java.base@11.0.9/java.lang.Object.wait(Native Method)
       org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)

    0.0% (24.1micros out of 500ms) cpu usage by thread 'signals/_main_Worker-2'
     10/10 snapshots sharing following 2 elements
       java.base@11.0.9/java.lang.Object.wait(Native Method)
       org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)

    0.0% (13.6micros out of 500ms) cpu usage by thread 'signals/_main_Worker-3'
     10/10 snapshots sharing following 2 elements
       java.base@11.0.9/java.lang.Object.wait(Native Method)
       org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)

    0.0% (0s out of 500ms) cpu usage by thread 'elasticsearch[keepAlive/7.8.0]'
     10/10 snapshots sharing following 8 elements
       java.base@11.0.9/jdk.internal.misc.Unsafe.park(Native Method)
       java.base@11.0.9/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
       java.base@11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
       java.base@11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1039)
       java.base@11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345)
       java.base@11.0.9/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
       app//org.elasticsearch.bootstrap.Bootstrap$1.run(Bootstrap.java:89)
       java.base@11.0.9/java.lang.Thread.run(Thread.java:834)

    0.0% (0s out of 500ms) cpu usage by thread 'Common-Cleaner'
     10/10 snapshots sharing following 5 elements
       java.base@11.0.9/java.lang.Object.wait(Native Method)
       java.base@11.0.9/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:155)
       java.base@11.0.9/jdk.internal.ref.CleanerImpl.run(CleanerImpl.java:148)
       java.base@11.0.9/java.lang.Thread.run(Thread.java:834)
       java.base@11.0.9/jdk.internal.misc.InnocuousThread.run(InnocuousThread.java:134)
Could someone please suggest how to resolve this issue?
Thanks,