ML Job failed: autodetect process stopped unexpectedly: Fatal error


(Robert) #1

Hi, I am trying to use ML, but the job fails with "autodetect process stopped unexpectedly".

The job is the preinstalled nginx job for the filebeat-* index.

The Elasticsearch log:

[2018-06-28T21:59:12,698][INFO ][o.e.x.m.j.p.a.AutodetectProcessManager] [node-1] Opening job [remote_ip_url_count]
[2018-06-28T21:59:12,766][INFO ][o.e.x.m.j.p.a.AutodetectProcessManager] [node-1] [remote_ip_url_count] Loading model snapshot [N/A], job latest_record_timestamp [N/A]
[2018-06-28T21:59:13,123][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [remote_ip_url_count] [autodetect/12793] [CResourceMonitor.cc@67] Setting model memory limit to 1024 MB
[2018-06-28T21:59:13,965][INFO ][o.e.x.m.j.p.a.AutodetectProcessManager] [node-1] Successfully set job state to [opened] for job [remote_ip_url_count]
[2018-06-28T21:59:16,161][INFO ][o.e.x.m.d.DatafeedJob    ] [remote_ip_url_count] Datafeed started (from: 2018-04-19T07:35:55.000Z to: 2018-06-28T23:58:51.001Z) with frequency [600000ms]
[2018-06-28T21:59:17,326][WARN ][o.e.x.m.j.p.a.o.AutoDetectResultProcessor] [remote_ip_url_count] some results not processed due to the termination of autodetect
[2018-06-28T21:59:17,327][ERROR][o.e.x.m.j.p.a.NativeAutodetectProcess] [remote_ip_url_count] autodetect process stopped unexpectedly: Fatal error: 'si_signo 4, si_code: 2, si_errno: 0, address: 0x7f69daa8e227, library: /usr/share/elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlMaths.so, base: 0x7f69da930000, normalized address: 0x15e227'

[2018-06-28T21:59:17,324][INFO ][o.e.x.m.j.p.a.NativeAutodetectProcess] [remote_ip_url_count] State output finished
[2018-06-28T21:59:17,326][ERROR][o.e.x.m.j.p.a.AutodetectCommunicator] [remote_ip_url_count] Unexpected exception writing to process
java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[?:?]
        at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60) ~[?:?]
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[?:?]
        at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[?:?]
        at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211) ~[?:?]
        at java.nio.channels.Channels.writeFullyImpl(Channels.java:78) ~[?:1.8.0_171]
        at java.nio.channels.Channels.writeFully(Channels.java:101) ~[?:1.8.0_171]
        at java.nio.channels.Channels.access$000(Channels.java:61) ~[?:1.8.0_171]
        at java.nio.channels.Channels$1.write(Channels.java:174) ~[?:1.8.0_171]
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[?:1.8.0_171]
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126) ~[?:1.8.0_171]
        at java.io.FilterOutputStream.write(FilterOutputStream.java:97) ~[?:1.8.0_171]
        at org.elasticsearch.xpack.ml.job.process.autodetect.writer.LengthEncodedWriter.writeField(LengthEncodedWriter.java:91) ~[?:?]
        at org.elasticsearch.xpack.ml.job.process.autodetect.writer.LengthEncodedWriter.writeRecord(LengthEncodedWriter.java:51) ~[?:?]
        at org.elasticsearch.xpack.ml.job.process.autodetect.NativeAutodetectProcess.writeRecord(NativeAutodetectProcess.java:144) ~[?:?]
        at org.elasticsearch.xpack.ml.job.process.autodetect.writer.AbstractDataToProcessWriter.transformTimeAndWrite(AbstractDataToProcessWriter.java:196) ~[?:?]
        at org.elasticsearch.xpack.ml.job.process.autodetect.writer.JsonDataToProcessWriter.writeJson(JsonDataToProcessWriter.java:160) ~[?:?]
        at org.elasticsearch.xpack.ml.job.process.autodetect.writer.JsonDataToProcessWriter.writeJsonXContent(JsonDataToProcessWriter.java:84) ~[?:?]
        at org.elasticsearch.xpack.ml.job.process.autodetect.writer.JsonDataToProcessWriter.write(JsonDataToProcessWriter.java:65) ~[?:?]
        at org.elasticsearch.xpack.ml.job.process.autodetect.AutodetectCommunicator.lambda$writeToJob$1(AutodetectCommunicator.java:123) ~[?:?]
        at org.elasticsearch.xpack.ml.job.process.autodetect.AutodetectCommunicator$1.doRun(AutodetectCommunicator.java:363) ~[?:?]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724) ~[elasticsearch-6.3.0.jar:6.3.0]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.3.0.jar:6.3.0]
        at org.elasticsearch.xpack.ml.job.process.autodetect.AutodetectProcessManager$AutodetectWorkerExecutorService.start(AutodetectProcessManager.java:678) ~[?:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_171]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_171]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:625) [elasticsearch-6.3.0.jar:6.3.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]

I installed 6.3 from the apt repositories on Ubuntu 16.04.
It looks like something is wrong with lib/libMlMaths.so.
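To see where in libMlMaths.so the crash happened, you can split the fatal error line into its fields. Here is a minimal sketch that assumes the message format shown in the log above. parse_fatal_error is a hypothetical helper and is not part of Elasticsearch. The numbers in the log fit the normalized address being the faulting address minus the library's load base (0x7f69daa8e227 - 0x7f69da930000 = 0x15e227).

```python
import re

# Hypothetical parser for the autodetect "Fatal error" message format seen in the log.
FATAL_RE = re.compile(
    r"si_signo (?P<signo>\d+), si_code: (?P<code>\d+), si_errno: (?P<errno>\d+), "
    r"address: (?P<address>0x[0-9a-f]+), library: (?P<library>[^,]+), "
    r"base: (?P<base>0x[0-9a-f]+), normalized address: (?P<normalized>0x[0-9a-f]+)"
)


def parse_fatal_error(line):
    """Extract the crash fields and recompute the offset into the library."""
    m = FATAL_RE.search(line)
    if m is None:
        raise ValueError("no autodetect fatal error found in line")
    address = int(m["address"], 16)
    base = int(m["base"], 16)
    return {
        "signo": int(m["signo"]),
        "code": int(m["code"]),
        "library": m["library"],
        # Offset of the faulting instruction inside the shared library.
        "offset": address - base,
        "reported_normalized": int(m["normalized"], 16),
    }


# The fatal error from the log above.
line = ("autodetect process stopped unexpectedly: Fatal error: 'si_signo 4, si_code: 2, "
        "si_errno: 0, address: 0x7f69daa8e227, library: /usr/share/elasticsearch/modules/"
        "x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlMaths.so, "
        "base: 0x7f69da930000, normalized address: 0x15e227'")
info = parse_fatal_error(line)
print(info["library"], hex(info["offset"]))  # offset prints as 0x15e227
```

Because the recomputed offset matches the reported normalized address, the faulting instruction sits at a fixed offset inside libMlMaths.so, regardless of where the library was loaded.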


(David Kyle) #2

Hi Robert,

Fatal error: 'si_signo 4, si_code: 2, si_errno: 0, ..

The signal is SIGILL (signal 4), which means the CPU encountered an instruction it did not understand. The process cannot recover from this signal. Please review the linked comment for details.
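You can decode the si_signo and si_code numbers in the fatal error programmatically instead of looking them up by hand. The Python sketch below assumes Linux signal numbering. describe_signal and cpu_has_flag are hypothetical helpers, and the ILL_* table mirrors the Linux siginfo definitions. The flag being checked (sse4_2) is only an example, because the reply does not say which instruction the CPU rejected.

```python
import signal

# Linux si_code values for SIGILL (as defined in the kernel's siginfo headers).
ILL_CODES = {
    1: "ILL_ILLOPC",  # illegal opcode
    2: "ILL_ILLOPN",  # illegal operand
    3: "ILL_ILLADR",  # illegal addressing mode
    4: "ILL_ILLTRP",  # illegal trap
    5: "ILL_PRVOPC",  # privileged opcode
    6: "ILL_PRVREG",  # privileged register
    7: "ILL_COPROC",  # coprocessor error
    8: "ILL_BADSTK",  # internal stack error
}


def describe_signal(signo, code):
    """Map si_signo/si_code from the fatal error to symbolic names."""
    name = signal.Signals(signo).name
    if signo == signal.SIGILL:
        return name, ILL_CODES.get(code, f"code {code}")
    return name, f"code {code}"


def cpu_has_flag(cpuinfo_text, flag):
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists `flag`."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and flag in value.split():
            return True
    return False


print(describe_signal(4, 2))  # ('SIGILL', 'ILL_ILLOPN')

# Example only: check the local CPU for one instruction-set flag.
try:
    with open("/proc/cpuinfo") as f:
        print("sse4_2 supported:", cpu_has_flag(f.read(), "sse4_2"))
except OSError:
    print("/proc/cpuinfo not available on this system")
```

On x86 Linux, the kernel usually reports an invalid-opcode fault as ILL_ILLOPN. That matches the si_code 2 in the log and is consistent with the CPU rejecting an instruction the ML binary uses.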


(Mark Walkom) #3