Unable to open Machine Learning job

Alon_Goldstein · July 31, 2018, 12:43pm

Hi,

I'm using Elasticsearch and Kibana version 6.2.4 with Platinum license.
Every time I create a Machine Learning job (through API or Kibana), I'm getting the following error in ES log:

[2018-07-31T08:36:30,172][WARN ][r.suppressed ] path: /_xpack/ml/anomaly_detectors/aa/_open, params: {job_id=aa}
org.elasticsearch.transport.RemoteTransportException: [zlt23646.vci.att.com][135.68.47.160:9300][cluster:admin/xpack/ml/job/open]
Caused by: org.elasticsearch.ElasticsearchException: Unexpected job state [failed] while waiting for job to be opened
at org.elasticsearch.xpack.core.ml.utils.ExceptionsHelper.serverError(ExceptionsHelper.java:43) ~[?:?]
at org.elasticsearch.xpack.ml.action.TransportOpenJobAction$JobPredicate.test(TransportOpenJobAction.java:351) ~[?:?]
at org.elasticsearch.xpack.ml.action.TransportOpenJobAction$JobPredicate.test(TransportOpenJobAction.java:326) ~[?:?]
at org.elasticsearch.xpack.core.persistent.PersistentTasksService.lambda$waitForPersistentTaskStatus$4(PersistentTasksService.java:157) ~[?:?]
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.clusterChanged(ClusterStateObserver.java:186) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateListeners$7(ClusterApplierService.java:509) ~[elasticsearch-6.2.4.jar:6.2.4]
at java.util.concurrent.ConcurrentHashMap$KeySpliterator.forEachRemaining(ConcurrentHashMap.java:3527) ~[?:1.8.0_91]
at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:743) ~[?:1.8.0_91]
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) ~[?:1.8.0_91]
at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateListeners(ClusterApplierService.java:506) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:489) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:161) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:573) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207) ~[elasticsearch-6.2.4.jar:6.2.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_91]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_91]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]

What am I doing wrong?

Thanks,
Alon

dkyle · July 31, 2018, 2:02pm

Hi Alon,

Can you tell me a little more about your setup, please? Which OS are you using, and have you upgraded Elasticsearch recently? Does this error occur for all jobs you create or just certain ones, and can you share the job configuration?

Since you have a Platinum license, have you raised the issue with support?

droberts195 · August 1, 2018, 6:40am

The trick to solving this will be to look in the log file of the node where the ML job attempted to start. That will contain the underlying error message. How many ML nodes are in your cluster? By that I mean nodes with node.ml: true in elasticsearch.yml or no mention of node.ml (since true is the default). If it’s just a few maybe you could check their logs around the time of the error.
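To make the "node.ml: true or no mention of node.ml" rule concrete, here is a minimal Python sketch that reads the text of an elasticsearch.yml and reports whether the node would count as an ML node. The function name is made up for illustration, and it only understands the flat `node.ml: value` form on a single line (not nested YAML), so treat it as a quick check rather than a real config parser.

```python
def is_ml_node(yml_text: str) -> bool:
    """Return True if this elasticsearch.yml text leaves ML enabled on the node.

    Assumption: the setting, if present, is written in the flat form
    `node.ml: true` / `node.ml: false` on its own line.
    """
    for raw in yml_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if line.startswith("node.ml:"):
            value = line.split(":", 1)[1].strip().strip("'\"").lower()
            return value == "true"
    # No mention of node.ml: the default is true.
    return True
```

For example, `is_ml_node(open("/path/to/elasticsearch.yml").read())` could be run against the config on each node, with the path replaced by wherever your config actually lives.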

Alon_Goldstein · August 1, 2018, 7:39am

I am using a two node cluster, running on Linux machines.
No upgrade with ML jobs was done.
This error occurs for every job I'm trying to create, no matter what the data or job configuration is.

I should probably mention that I changed the configuration for the elastic TMP folder, since previously the ML job error indicated it couldn't find the folder in the /tmp folder on the machine.

Alon_Goldstein · August 1, 2018, 7:41am

What I shared in the original message is the exact error I'm getting in the log files.
No other underlying error messages unfortunately...

There is no mention of node.ml in elasticsearch.yml.
Also, when restarting the node, the log file states that node.ml=true.

droberts195 · August 1, 2018, 8:07am

So it’s a single node cluster?

droberts195 · August 1, 2018, 8:16am

Sorry, I had only seen your second reply.

The temp problem is this: https://github.com/elastic/elasticsearch/issues/31732

It can cause the original error you posted. Did you change the temp directory on both nodes? If you only changed it on one node then try changing it on the second one as well and see if that solves the ML job startup problem.
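A quick way to compare the two nodes is to run the same check on each one. The sketch below is plain Python and does not talk to Elasticsearch at all; it only verifies that a directory exists and that the current user can create a file in it. The function name is hypothetical, and the path you pass in is whatever you configured as the temp directory. Run it as the same user that runs Elasticsearch, otherwise the permission check is not meaningful.

```python
import os
import tempfile


def check_tmp_dir(path):
    """Return a list of problems found with the directory (empty if none)."""
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} does not exist or is not a directory")
        return problems
    try:
        # Create and immediately remove a scratch file to prove write access.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        problems.append(f"cannot create files in {path}: {exc}")
    return problems
```

If the list comes back empty on one node and non-empty on the other, that is a good hint the temp directory change was only applied on one of them.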

The original error you posted was a remote transport exception, which means the underlying problem was on the other node. So I’m pretty sure there will be an exception in the log on that other node. But probably the exception is the missing temp directory, and you’re not connecting it with the failure to open the job. If that’s the case then explicitly setting the temp directory on both nodes will solve it.
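The RemoteTransportException line itself says which node to look at: the bracketed fields after the exception name are the remote node's name, its transport address, and the action that failed. As an illustration, this small Python sketch pulls those three fields out of such a line. The regular expression is written against the line format shown in the original post and is an assumption about that format, not an official log parser.

```python
import re

# Matches: RemoteTransportException: [node][host:port][action]
REMOTE_RE = re.compile(
    r"RemoteTransportException: "
    r"\[(?P<node>[^\]]+)\]\[(?P<address>[^\]]+)\]\[(?P<action>[^\]]+)\]"
)


def remote_node(log_line):
    """Return node, address and action from the line, or None if no match."""
    match = REMOTE_RE.search(log_line)
    return match.groupdict() if match else None


line = (
    "org.elasticsearch.transport.RemoteTransportException: "
    "[zlt23646.vci.att.com][135.68.47.160:9300][cluster:admin/xpack/ml/job/open]"
)
info = remote_node(line)
print(info["node"], info["address"])
```

The node name it prints is the one whose log file should contain the underlying exception from around the same timestamp.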

Alon_Goldstein · August 1, 2018, 11:51am

Works!!!

Thanks a lot!
