Cluster health not reported accurately in ECE

I have clusters whose health is degraded by indices in a read-only state. When I click through to the individual deployment UI for those clusters, the health is often not reported as the same state.
Within the ECE deployments top-level page, and separately within the specific deployment UIs, the health is consistent, so I don't think the status is fluctuating. The ECE top-level page seems to have a snapshot from some time in the past that bears little relation to the current status in Kibana.
Clicking through to Kibana, one cluster that is Unhealthy at the top level and Healthy at the individual level has lifecycle errors against several indices.

As context, the read-only state was probably induced by running out of storage. This has been addressed by deleting excessive uncompressed log files. I then ran the following on the cluster in question:
PUT _all/_settings
{
  "index": {
    "blocks": {
      "read_only_allow_delete": "false"
    }
  }
}
to clear the read-only status. I believe there are no longer any read-only indices on this cluster.

What is going on with cluster health reporting?

What are the criteria that lead you to believe there are no longer read-only indices on the cluster? Did you check via the API?

(I think we get the info we use from _cluster/state)
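
One way to check via the API is to fetch the index settings with GET _all/_settings and look for any index that still has the block set. The following is only a minimal sketch, assuming the JSON response has already been loaded into a Python dict; the function name and the sample index names are hypothetical and not part of ECE or Elasticsearch.

```python
def read_only_indices(settings_response):
    """Return sorted names of indices whose read_only_allow_delete block is still set.

    settings_response is assumed to be the parsed JSON body of
    GET _all/_settings (nested, not flat_settings, form).
    """
    blocked = []
    for index_name, body in settings_response.items():
        blocks = body.get("settings", {}).get("index", {}).get("blocks", {})
        value = blocks.get("read_only_allow_delete")
        # Setting values come back as strings such as "true" or "false".
        if str(value).lower() == "true":
            blocked.append(index_name)
    return sorted(blocked)


# Hypothetical sample shaped like a GET _all/_settings response.
sample = {
    "logs-a": {"settings": {"index": {"blocks": {"read_only_allow_delete": "true"}}}},
    "logs-b": {"settings": {"index": {"blocks": {"read_only_allow_delete": "false"}}}},
    "logs-c": {"settings": {"index": {"number_of_shards": "1"}}},
}
print(read_only_indices(sample))
```

An empty list would suggest that no index still carries the block, which can then be compared with what the ECE top-level page shows.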

Alex,

Thanks for the reply. After freeing up space I used the ECE UI to run the PUT request above, and eventually the top-level UI showed cluster health as good, in line with the detail pages below it. I got the feeling there was some caching effect on the top-level page, as it was not updating in a timely way.

I also noticed that clusters that had been terminated and then deleted were still showing at the top level, but clicking through to them produced an error message.

Perhaps, if the ECE UI relies on the clusters that are built by default, those clusters are also compromised when the disk thresholds are exceeded.
