Error while deleting an ML anomaly job

liorg2 · August 5, 2020, 3:02pm

I'm trying to delete an ML job.

I used Kibana to do so, and Kibana shows "deleting" status for this job.

DELETE _ml/anomaly_detectors/jobid?force=true

gives:

"reason" : {
"type" : "node_disconnected_exception",
"reason" : "[instance-0000000057][10.x.x.x:19998][indices:data/read/search[phase/query]] disconnected"
}

(node is up and running)

POST _ml/anomaly_detectors/jobid/_close?force=true

gives:

No known job with id

Any assistance will be appreciated

droberts195 · August 5, 2020, 3:41pm

What does delete without force return?

DELETE _ml/anomaly_detectors/jobid

How about just getting the job?

GET _ml/anomaly_detectors/jobid

Which version of Elasticsearch are you running?

liorg2 · August 5, 2020, 3:57pm

Hi

Delete without force returns the same error.

GET...
returns

{
  "error" : {
    "root_cause" : [ ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [ ]
  },
  "status" : 503
}

droberts195 · August 5, 2020, 4:15pm

It sounds like you might have lost your .ml-config index, or the node that index is on is inaccessible.

Which version of Elasticsearch are you using?

Does this show anything wrong?

GET _cat/indices/.ml*?v
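
If _cat/indices looks healthy, a shard-level view can show whether individual shards of the ML indices are unassigned or sitting on a node that cannot be reached. A minimal sketch using the cat shards and cluster health APIs (the column list is just one reasonable selection, not something taken from this thread):

```console
# One row per shard: where it lives and, if unassigned, why
GET _cat/shards/.ml*?v&h=index,shard,prirep,state,node,unassigned.reason

# Health broken down to shard level for the two indices involved here
GET _cluster/health/.ml-config,.ml-anomalies-shared?level=shards
```

A shard listed as UNASSIGNED, or assigned to the node named in the node_disconnected_exception, would narrow down where to look next.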

liorg2 · August 5, 2020, 4:35pm

version 7.8.1

richcollier · August 5, 2020, 4:42pm

@liorg2 - Do you have an active support contract with us? I'm thinking it would be more helpful if you could send in a support diagnostic bundle instead of this back-and-forth

liorg2 · August 5, 2020, 7:19pm

Unfortunately not...
I tried a restart and it's still the same:

"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
  {
    "shard" : 0,
    "index" : ".ml-anomalies-shared",
    "node" : "x4BPgr-OTuSr6CycZY2SuQ",
    "reason" : {
      "type" : "node_disconnected_exception",
      "reason" : "[instance-..........][10........:19662][indices:data/read/search[phase/query]] disconnected"
    }
  }
],

liorg2 · August 5, 2020, 7:22pm

Now there's a new error:

"reason" : "No shard available for [get [.ml-config][_doc][anomaly_detector-..........-job]: routing [null]]"

richcollier · August 5, 2020, 8:17pm

I assume this is your answer to my question about having a support contract with us?

liorg2 · August 5, 2020, 8:18pm

Yes.
Btw, what I did today was change the ML node from 1 AZ to 2.

droberts195 · August 6, 2020, 9:25am

I wonder if there is a problem with communications between nodes where the master node can communicate with all the other nodes in the cluster (hence _cat/indices shows no problems) but some of the other nodes cannot communicate with each other.

You talk about availability zones. Are you running in Elastic Cloud? Or have you set up your own cluster in the cloud?

One more thing you could try is to get the allocation explanations for the problematic indices - maybe that will show up something interesting:

GET /_cluster/allocation/explain?include_yes_decisions=true
{
  "index": ".ml-config",
  "shard": 0,
  "primary": true
}

GET /_cluster/allocation/explain?include_yes_decisions=true
{
  "index": ".ml-config",
  "shard": 0,
  "primary": false
}

GET /_cluster/allocation/explain?include_yes_decisions=true
{
  "index": ".ml-anomalies-shared",
  "shard": 0,
  "primary": true
}

GET /_cluster/allocation/explain?include_yes_decisions=true
{
  "index": ".ml-anomalies-shared",
  "shard": 0,
  "primary": false
}

One more problem I can see from your _cat/indices output is that you've ended up with a concrete index that should have been an alias. .ml-anomalies-.write-test_93 should have been an alias. We have recently added protection to stop this problem occurring but since you only have one document in that index it would be interesting to know what it is. Please could you do a search of .ml-anomalies-.write-test_93 and let us know what document is in it? You can redact anything confidential - just seeing the field names of the document that's in there will help us understand why it got created. Then you can delete that index and, if the test_93 job still exists, create .ml-anomalies-.write-test_93 as an alias of .ml-anomalies-shared. (If the test_93 job no longer exists then just delete that index.)
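
The steps described above might look roughly like this in the Kibana console. This is a sketch, assuming the job ID is test_93 and its results are stored in .ml-anomalies-shared. It uses the plain form of the aliases API, and ML itself may set additional alias options depending on the version. Only run the last request if the test_93 job still exists.

```console
# Inspect the single document that ended up in the concrete index
GET .ml-anomalies-.write-test_93/_search

# Remove the concrete index that is occupying the alias name
DELETE .ml-anomalies-.write-test_93

# Only if the test_93 job still exists: recreate the name as an alias
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": ".ml-anomalies-shared",
        "alias": ".ml-anomalies-.write-test_93"
      }
    }
  ]
}
```

The delete has to come before the alias request, because an alias cannot share a name with an existing index.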

richcollier · August 6, 2020, 2:32pm

It looks like you're a customer of Elastic Cloud and with that comes support. And, if you're experiencing inter-node communications problems, this is exactly the kind of thing support should help you with.

If you don't know how to access support, DM me the email address you use for your cloud account and we'll get someone to help you.

liorg2 · August 6, 2020, 4:27pm

Hey, thanks a lot for the help.

I tried to delete the jobs just now using the API and it worked well.

liorg2 · August 6, 2020, 4:29pm

Thanks a lot @droberts195
Yes it's an old job. I'll delete this index

system · September 3, 2020, 4:30pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
