Cluster Red after system patching forced upgrade 7.16.1 -> 7.17.1

We have an 8-node cluster where all the nodes run CentOS 7. When they were rebooted for patching, the ES version went from 7.16.1 to 7.17.1. The master node is periodically logging these errors:

[2022-03-24T10:59:50,530][WARN ][o.e.x.m.MonitoringService] [es02] monitoring execution failed
org.elasticsearch.xpack.monitoring.exporter.ExportException: failed to flush export bulks
.....
Caused by: org.elasticsearch.xpack.monitoring.exporter.ExportException: bulk [default_local] reports failures when exporting documents
  at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.throwExportException(LocalBulk.java:131) ~[?:?]
  ... 23 more

Kibana will not start, so I'm kind of blind. The ES-head plugin shows cluster health: red (1538 of 3101), and the 1538 is not increasing.

I have attempted to re-enable routing with:

curl -H 'Content-Type: application/json' -XPUT 'xx.yy.zz:9200/_cluster/settings' -d '{ 
"transient": 
    { "cluster.routing.allocation.enable" : "all" 
    }
}'

and

curl -H 'Content-Type: application/json' -XPUT 'xx.yy.zz:9200/_cluster/settings' -d '{ 
"transient": 
    { "cluster.routing.rebalance.enable" : "all" 
    }
}'

both of which are acknowledged. I also tried:

curl --noproxy '*' -X PUT "xxx.yyy.zzz:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}'

which is also acknowledged but seems to make no difference.

The index referred to in the error (UnavailableShardsException[[.monitoring-es-7-2022.03.24][0] primary shard is not active) shows one unassigned shard.
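
For reference, the cluster allocation explain API can report why a particular shard stays unassigned. The following is only a rough sketch: the host is the same placeholder used in this post, and the function names explain_body and explain_shard are made up for illustration.

```shell
# Placeholder host, as elsewhere in this post; override via the environment.
ES_HOST="${ES_HOST:-xxx.yyy.zzz:9200}"

# Build the JSON request body for one primary shard (hypothetical helper).
explain_body() {
  printf '{"index":"%s","shard":%s,"primary":true}' "$1" "$2"
}

# Ask the cluster why the given primary shard is not allocated.
explain_shard() {
  curl -s -H 'Content-Type: application/json' \
    -XGET "$ES_HOST/_cluster/allocation/explain?pretty" \
    -d "$(explain_body "$1" "$2")"
}

# Usage: explain_shard .monitoring-es-7-2022.03.24 0
```

The response lists, per node, the deciders that refused the shard, which should point at things like allocation filters or disk watermarks.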

I would be very grateful for any suggestions to get shards moving again and get out of the Red status.

In case anyone faces a similar situation: the problem was apparently the host names in the "cluster.routing.allocation.include._host" filter, which were not fully qualified domain names. I cleared the filter using this:

curl -H 'Content-Type: application/json' -XPUT 'xxx.yyy.zzz:9200/_cluster/settings' -d '
{
  "transient" : {
    "cluster.routing.allocation.include._host" : null
  }
}'
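
Before clearing anything, it may help to list which allocation filters are actually set and whether they sit under "transient" or "persistent", since the null has to be sent to the same section. A minimal sketch, assuming the same placeholder host; the function names settings_url, filter_lines and show_allocation_filters are hypothetical helpers, not part of any tool.

```shell
# Placeholder host, as elsewhere in this post; override via the environment.
ES_HOST="${ES_HOST:-xxx.yyy.zzz:9200}"

# Build the cluster settings URL; flat_settings prints dotted keys on one line each.
settings_url() {
  printf '%s/_cluster/settings?flat_settings=true&pretty' "$1"
}

# Keep only lines that mention include/exclude/require allocation filters.
filter_lines() {
  grep -E 'cluster\.routing\.allocation\.(include|exclude|require)\.'
}

# Print any cluster-level allocation filters currently in effect.
show_allocation_filters() {
  curl -s "$(settings_url "$ES_HOST")" | filter_lines
}

# Usage: show_allocation_filters
```

The grep drops the "transient"/"persistent" section names, so if a filter line shows up, run the curl without the pipe to see which section it belongs to.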
