I am using the machine learning feature of X-Pack. I created some jobs that seemed to work fine. The next day, I tried to create and open a job and got the following error. I also get the error if I try to run one of my existing jobs from before.
```
Could not open job: [status_exception] Could not open job because no suitable nodes were found, allocation explanation [Not opening job [test3_by_flow] on node [{spotlight}{6uOyTVYtRhSGnyTgyInnog}{z5Z3ewJPQgKeDLAlGgQATg}{localhost}{127.0.0.1:9300}{ml.enabled=true}], because node exceeds [2] the maximum number of jobs [2] in opening state]
```
A machine learning job can consume a lot of resources while it is opening, because its model state, which can be large, is restored from an Elasticsearch index. For this reason we limit the number of concurrently opening jobs to 2. A job should normally remain in the opening state for only a short period, but in your case 2 jobs appear to be stuck in that state.
Additionally, there is a limit on the number of jobs that can be open on a node; by default this is 10. If you already have 10 open jobs then you cannot open any more. Could you please paste the output of `GET _xpack/ml/anomaly_detectors/` and `GET _xpack/ml/anomaly_detectors/_stats` in this ticket so I can better understand the problem? Also, which version of X-Pack are you using, and on what operating system? How many nodes do you have in your cluster, and how many of those have ML enabled?
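For reference, you can run those two requests from the Kibana Dev Tools console (or via curl against your cluster) like this; the `_stats` response includes each job's `state` field, which will show which jobs are stuck in `opening`:

```
GET _xpack/ml/anomaly_detectors/
GET _xpack/ml/anomaly_detectors/_stats
```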
The solution may be as simple as closing some of the open jobs or the jobs stuck in the opening state, but I'd like to understand what is happening first, so it would be very useful if you could reply with the information requested above. Thanks.
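If it comes to that, a job can be closed with the close API; for a job stuck in the opening state you may need the `force` option. A sketch using the job name from your error message (substitute your own job IDs):

```
POST _xpack/ml/anomaly_detectors/test3_by_flow/_close
POST _xpack/ml/anomaly_detectors/test3_by_flow/_close?force=true
```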