Not able to start machine learning job

We are configuring an ML job in our cluster.
After creating the job, I tried to start it and got the error below:

Caused by: java.nio.file.NoSuchFileException: /tmp/elasticsearch-6776145032467540758/limitconfig1253236941500878778.conf
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[?:?]
at java.nio.file.Files.newByteChannel(Files.java:361) ~[?:1.8.0_171]
at java.nio.file.Files.createFile(Files.java:632) ~[?:1.8.0_171]
at java.nio.file.TempFileHelper.create(TempFileHelper.java:138) ~[?:1.8.0_171]
at java.nio.file.TempFileHelper.createTempFile(TempFileHelper.java:161) ~[?:1.8.0_171]
at java.nio.file.Files.createTempFile(Files.java:852) ~[?:1.8.0_171]
at org.elasticsearch.xpack.ml.job.process.autodetect.AutodetectBuilder.buildLimits(AutodetectBuilder.java:274) ~[?:?]
at org.elasticsearch.xpack.ml.job.process.autodetect.AutodetectBuilder.build(AutodetectBuilder.java:182) ~[?:?]
at org.elasticsearch.xpack.ml.job.process.autodetect.NativeAutodetectProcessFactory.createNativeProcess(NativeAutodetectProcessFactory.java:109) ~[?:?]

This looks like a known error. The suggested solution is to set -Djava.io.tmpdir to a constant value, so I made that change and restarted the master nodes for it to take effect.
The JVM process now shows the temp directory as /tmp/elasticsearch-4034599267312088687.
However, when I start the ML job it still complains that a file in a different directory, /tmp/elasticsearch-6776145032467540758/limitconfig1253236941500878778.conf, is not available.

Can someone help me crack this issue?

After setting ES_TMPDIR=/tmp/elasticsearch-4034599267312088687, I managed to get rid of the directory mismatch.
But every time, it looks for a different, random limitconfig file. For example, the first time I started the job it looked in the directory /tmp/elasticsearch-4034599267312088687 for the file limitconfig4206780411356268755.conf.
The second time I started the ML job, it looked for the file limitconfig7936535717952102862.conf inside the same directory, /tmp/elasticsearch-4034599267312088687.
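
For reference, here is a minimal Java sketch of what the stack trace above seems to describe. It is not Elasticsearch's actual code. The trace goes through java.nio.file.Files.createTempFile, which picks a new random file name on every call, so a different limitconfig*.conf name on each start is expected. The same call throws NoSuchFileException when the parent directory does not exist. The class name TempDirCheck, the method failsWithNoSuchFile and the "elasticsearch-demo-" prefix are made up for illustration, and the sketch assumes the "limitconfig" prefix and ".conf" suffix seen in the error message.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

class TempDirCheck {

    // Hypothetical helper: tries to create a temp file inside dir and reports
    // whether that attempt failed with NoSuchFileException.
    static boolean failsWithNoSuchFile(Path dir) throws IOException {
        try {
            Path file = Files.createTempFile(dir, "limitconfig", ".conf");
            Files.delete(file); // clean up the probe file
            return false;
        } catch (NoSuchFileException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the Elasticsearch temp directory (illustrative name).
        Path dir = Files.createTempDirectory("elasticsearch-demo-");

        // Two calls in the same directory yield two different random names.
        Path first = Files.createTempFile(dir, "limitconfig", ".conf");
        Path second = Files.createTempFile(dir, "limitconfig", ".conf");
        System.out.println(first.getFileName() + " / " + second.getFileName());
        Files.delete(first);
        Files.delete(second);

        // While the directory exists, creating a temp file works.
        System.out.println("fails while dir exists: " + failsWithNoSuchFile(dir));

        // Once the directory is gone, the same call throws NoSuchFileException.
        Files.delete(dir);
        System.out.println("fails after dir removed: " + failsWithNoSuchFile(dir));
    }
}
```

If this matches what is happening on the node, the random file name itself is not the problem. The thing to check would be whether the directory in the error message still exists on the node that runs the job and is writable by the Elasticsearch user.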

Can someone please help with this?

Hi Sathishkumar,

Yes, I will pick this query up and ensure you receive a response to it shortly.

In the meantime, could you please obtain, package up and provide the log files from every ML node in the cluster? Sending them to me in a direct message on here is secure and a good way of doing this.

Best wishes,

Ed