@liorg2 - Do you have an active support contract with us? I think it would be more helpful if you could send in a support diagnostic bundle instead of continuing this back-and-forth.
I wonder if there is a communication problem between nodes: the master node can communicate with all the other nodes in the cluster (hence _cat/indices shows no problems), but some of the other nodes cannot communicate with each other.
You talk about availability zones. Are you running in Elastic Cloud? Or have you set up your own cluster in the cloud?
One more thing you could try is to get the allocation explanations for the primary and replica shards of the problematic indices. Maybe that will turn up something interesting:
GET /_cluster/allocation/explain?include_yes_decisions=true
{
  "index": ".ml-config",
  "shard": 0,
  "primary": true
}

GET /_cluster/allocation/explain?include_yes_decisions=true
{
  "index": ".ml-config",
  "shard": 0,
  "primary": false
}

GET /_cluster/allocation/explain?include_yes_decisions=true
{
  "index": ".ml-anomalies-shared",
  "shard": 0,
  "primary": true
}

GET /_cluster/allocation/explain?include_yes_decisions=true
{
  "index": ".ml-anomalies-shared",
  "shard": 0,
  "primary": false
}
One more problem I can see from your _cat/indices output is that .ml-anomalies-.write-test_93 has ended up as a concrete index when it should have been an alias. We have recently added protection to stop this problem from occurring, but since you only have one document in that index it would be interesting to know what it is.
Please could you do a search of .ml-anomalies-.write-test_93 and let us know what document is in it? You can redact anything confidential. Just seeing the field names of the document will help us understand why the index got created.
After that you can delete the index. If the test_93 job still exists, then create .ml-anomalies-.write-test_93 as an alias of .ml-anomalies-shared. If the test_93 job no longer exists, deleting the index is all you need to do.
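For reference, those steps would look roughly like this in Dev Tools. Treat it as a sketch rather than an exact recipe: it assumes the test_93 job still exists and that a plain alias is sufficient. The aliases that ML creates itself may carry extra settings (being hidden, for example), so it is worth comparing against the write aliases of your other jobs before relying on it.

```
# 1. Look at the single document in the stray concrete index
GET /.ml-anomalies-.write-test_93/_search

# 2. Delete the concrete index (only after you have saved the search output)
DELETE /.ml-anomalies-.write-test_93

# 3. Only if the test_93 job still exists: recreate the name as an alias
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": ".ml-anomalies-shared",
        "alias": ".ml-anomalies-.write-test_93"
      }
    }
  ]
}

# Optional: compare with the write aliases of your other jobs
GET /_alias/.ml-anomalies-.write-*
```

Run the requests one at a time and check each response before moving on, since the delete cannot be undone.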
It looks like you're a customer of Elastic Cloud and with that comes support. And, if you're experiencing inter-node communications problems, this is exactly the kind of thing support should help you with.
If you don't know how to access support, DM me the email address you use for your cloud account and we'll get someone to help you.