Error while deleting ML anomaly job

I'm trying to delete an ML job.

I used Kibana to do so, and Kibana shows "deleting" status for this job.

DELETE _ml/anomaly_detectors/jobid?force=true

gives:

"reason" : {
"type" : "node_disconnected_exception",
"reason" : "[instance-0000000057][10.x.x.x:19998][indices:data/read/search[phase/query]] disconnected"
}

(node is up and running)

POST _ml/anomaly_detectors/jobid/_close?force=true

gives:

No known job with id

Any assistance will be appreciated

What does delete without force return?

DELETE _ml/anomaly_detectors/jobid

How about just getting the job?

GET _ml/anomaly_detectors/jobid

Which version of Elasticsearch are you running?

Hi

Delete without force returns the same error.

GET...
returns

{
  "error" : {
    "root_cause" : [ ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [ ]
  },
  "status" : 503
}

It sounds like you might have lost your .ml-config index, or the node that index is on is inaccessible.

Which version of Elasticsearch are you using?

Does this show anything wrong?

GET _cat/indices/.ml*?v
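If the indices themselves look fine there, a shard-level view might still show something interesting (an unassigned or relocating copy, for example) - something like:

# shard-level view of the ML indices (shard state and the node each copy is on)
GET _cat/shards/.ml*?v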

version 7.8.1

@liorg2 - Do you have an active support contract with us? I'm thinking it would be more helpful if you could send in a support diagnostic bundle instead of this back-and-forth.

Unfortunately not...
I tried a restart and it's still the same:

"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : ".ml-anomalies-shared",
"node" : "x4BPgr-OTuSr6CycZY2SuQ",
"reason" : {
"type" : "node_disconnected_exception",
"reason" : "[instance-..........][10........:19662][indices:data/read/search[phase/query]] disconnected"
}
}
],

Now there's a new error:

"reason" : "No shard available for [get [.ml-config][_doc][anomaly_detector-..........-job]: routing [null]]"

I assume this is your answer to my question about having a support contract with us?

Yes.
By the way, what I did today was change the ML node setup from 1 availability zone to 2.

I wonder if there is a problem with communications between nodes where the master node can communicate with all the other nodes in the cluster (hence _cat/indices shows no problems) but some of the other nodes cannot communicate with each other.
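One rough way to check that (not conclusive, but cheap) would be to compare what the master currently sees with the per-node transport statistics, e.g.:

# nodes the cluster currently knows about
GET _cat/nodes?v

# per-node transport stats (open server connections, rx/tx counts)
GET _nodes/stats/transport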

You talk about availability zones. Are you running in Elastic Cloud? Or have you set up your own cluster in the cloud?

One more thing you could try is to get the allocation explanations for the problematic indices - maybe that will show up something interesting:

GET /_cluster/allocation/explain?include_yes_decisions=true
{
  "index": ".ml-config",
  "shard": 0,
  "primary": true
}

GET /_cluster/allocation/explain?include_yes_decisions=true
{
  "index": ".ml-config",
  "shard": 0,
  "primary": false
}

GET /_cluster/allocation/explain?include_yes_decisions=true
{
  "index": ".ml-anomalies-shared",
  "shard": 0,
  "primary": true
}

GET /_cluster/allocation/explain?include_yes_decisions=true
{
  "index": ".ml-anomalies-shared",
  "shard": 0,
  "primary": false
}

One more problem I can see from your _cat/indices output is that you've ended up with a concrete index that should have been an alias: .ml-anomalies-.write-test_93 should have been an alias. We have recently added protection to stop this problem from occurring, but since you only have one document in that index it would be interesting to know what it is.

Please could you do a search of .ml-anomalies-.write-test_93 and let us know what document is in it? You can redact anything confidential - just seeing the field names of the document that's in there will help us understand why it got created.

Then you can delete that index and, if the test_93 job still exists, create .ml-anomalies-.write-test_93 as an alias of .ml-anomalies-shared. (If the test_93 job no longer exists then just delete that index.)
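Roughly, those steps could look like this (a sketch - the exact alias configuration ML normally sets up may differ slightly):

# see what the single document in the concrete index is
GET .ml-anomalies-.write-test_93/_search

# once you've noted the field names, remove the concrete index
DELETE .ml-anomalies-.write-test_93

# only if the test_93 job still exists: recreate the name as an alias of the shared results index
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": ".ml-anomalies-shared",
        "alias": ".ml-anomalies-.write-test_93"
      }
    }
  ]
}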


It looks like you're a customer of Elastic Cloud and with that comes support. And, if you're experiencing inter-node communications problems, this is exactly the kind of thing support should help you with.

If you don't know how to access support, DM me the email address you use for your cloud account and we'll get someone to help you.


Hey, thanks a lot for the help.

I tried to delete the jobs just now using the API and it worked well :slight_smile:

Thanks a lot @droberts195!
Yes, it's an old job. I'll delete this index.
