Machine Learning error when creating and opening a new job

Hello,

I've installed Metricbeat, which is gathering information about a Linux system, and I want to try the machine learning forecast feature using the index generated by Metricbeat.

I created a job using the Metricbeat index and a field (for example, disk usage), but after the job is created, opening it fails with the following error:

Unexpected job state [failed] while waiting for job to be opened

Error: "{\"error\":{\"root_cause\":[{\"type\":\"exception\",\"reason\":\"Unexpected job state [failed] while waiting for job to be opened\"}],\"type\":\"exception\",\"reason\":\"Unexpected job state [failed] while waiting for job to be opened\"},\"status\":500}"
    at Object.errorNotify [as error] (https://[ELK_HOST]/bundles/ml.bundle.js:2:61134)
    at https://[ELK_HOST]/bundles/ml.bundle.js:2:107552
    at Array.forEach (<anonymous>)
    at showResults (https://[ELK_HOST]/bundles/ml.bundle.js:2:107485)
    at https://[ELK_HOST]/bundles/ml.bundle.js:2:104924

If I try to start the datafeed, it fails with the same problem because the job can't be opened.

The ELK Stack version is 7.0, and ML is enabled.

Thanks in advance.

Since you are on 7.0, there's an easy way to find out what the problem is, thanks to this recent addition: https://github.com/elastic/elasticsearch/pull/38029

Call _cluster/state?pretty in a web browser (against any node) and look for the "persistent_tasks" section. Find the task whose name includes the job name; it should contain the detailed failure reason.

Thanks for your response, richcollier. I tried what you suggested and got the following:

[Screenshot of the _cluster/state output]

Unfortunately, the tasks section is empty, so I can't find any extra information. Is there anything else I can try? In the meantime, I'll keep investigating.

Thanks!

It seems the best thing to do is to try again, but this time look for the relevant ML-related error message written to elasticsearch.log on the node that attempted to open the job. If you have more than one node, you'll have to check each node's log to find the one that tried.
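
The log search can be sketched in Python as well. This is a minimal sketch, assuming the default Elasticsearch 7.x log line layout that begins with [timestamp][LEVEL ] (level padded to five characters) and assuming the failing job's id appears on the WARN or ERROR line itself. It keeps each matching line plus the unbracketed stack trace lines that follow it. The function names and the log path in the example are hypothetical; run it against each node's log file.

```python
import re
from pathlib import Path

# Assumed layout: "[timestamp][LEVEL][logger] [node] message", with the
# level padded to five characters, e.g. "[WARN ]" or "[ERROR]".
LEVEL_PATTERN = re.compile(r"\]\[(ERROR|WARN)\s*\]")


def ml_errors_for_job(lines, job_id):
    """Yield WARN/ERROR lines mentioning job_id, plus their stack trace lines."""
    in_match = False
    for line in lines:
        # A line starting with "[" begins a new log entry; anything else is
        # treated as a continuation (stack trace) of the previous entry.
        if line.startswith("["):
            in_match = bool(LEVEL_PATTERN.search(line)) and job_id in line
        if in_match:
            yield line.rstrip("\n")


def scan_log(path, job_id):
    """Return the matching lines from one node's log file."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return list(ml_errors_for_job(handle, job_id))
```

For example, scan_log("/var/log/elasticsearch/elasticsearch.log", "my-disk-job") returns the matching lines from one node. If nothing turns up, the exception may name the job only inside the stack trace, so widening the filter (for instance, to all ERROR lines around the time of the open attempt) is a reasonable next step.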
