Troubleshooting with machine learning

(ChaeYoung) #1

Hello,
I'm a newbie learning the machine learning feature of X-Pack.
I'm having trouble with machine learning, as described below.
My goal is to create a single metric job. (Machine Learning -> Create new job -> Create a single metric job)
My situation

  1. I added the index pattern ".monitoring-es-*" with the time filter field name "timestamp" in Kibana.
    (Management tab -> Index Patterns)
  2. I made a single metric job and got a graph for ".monitoring-es-*". I chose "mean" for the aggregation, "node_stats.os.cpu.load_average.15m" for the field, and "15m" for the bucket span.
  3. However, my job didn't analyze the data, unlike in Elastic's video.
    (URL of the video: https://www.youtube.com/watch?v=DBRISS0UKcA)
    In more detail: no analysis runs. A job is created and nothing else happens. When I click "Machine Learning" again and then "Open count in Anomaly Explorer", it just shows "No results found."

I have googled this hundreds of times but can't find any clue about the issue.
Please help me.


(rich collier) #2

Hello,

It's hard to tell the exact sequence of events from your screenshots. From what I can tell, you created the job properly; however, the screenshot of the jobs page shows that the job has processed "0" records (i.e. it didn't analyze any of your data).

So, when building the job in the Single Metric Wizard, make sure you select a good window of time for historical data. Either:

  1. Set the Kibana time picker to something sensible (like last 7 days) and then click the button with the triangle on it.

  2. Or click the button that says "Use full .monitoring-es-* data".

Once you've done this, give your job a unique name (my screen looks like this):

Then click the "Create Job" button and you should see the data being analyzed with a little animation. Once it has finished, click the "View Results" button:

And then you'll see the results in the Single Metric Viewer:

I hope this helps!
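
The "processed records" figure discussed above can also be read outside the UI. A minimal sketch, assuming the X-Pack 5.x job stats endpoint `_xpack/ml/anomaly_detectors/<job_id>/_stats` and its `data_counts.processed_record_count` field; the helper name, the job id `my-cpu-job`, the host, and the user in the usage note are hypothetical:

```shell
# Extract processed_record_count from a job stats JSON document read on stdin.
# Plain sed keeps the sketch dependency-free; a real JSON tool would be more robust.
processed_records() {
  sed -n 's/.*"processed_record_count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}
```

Possible usage: `curl -s -u elastic 'http://localhost:9200/_xpack/ml/anomaly_detectors/my-cpu-job/_stats' | processed_records`. A value of 0 would match what the jobs page shows in this thread: the job exists but has not analyzed any data.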


(ChaeYoung) #3

First of all, I really appreciate your answer!
I hadn't considered the Kibana time picker you mentioned.
It seems to work, since the graph changes when I change the time picker.
However, when I switch the time picker (last 24 hours, last 4 hours, and so on), the analysis still doesn't run,
and when I check "Job Management", the processed records count is still zero.
I would be really happy if you could give me more help with this.
If you need any information, please tell me.
Have a nice day.

P.S. I followed exactly the steps you showed me and got "stopped" for the Datafeed state in Job Management.
Here is a picture of it.


(rich collier) #4

Thanks, this is helpful information - because now it seems as if the problem you are having is likely not due to any configuration problem. So, what we need now is more detailed information as to what is happening behind the scenes. This information will be in the elasticsearch.log file. Please do the following:

  1. Configure a job as you've done above but before clicking the "Create Job" button, keep track of where the end of the elasticsearch.log is.
  2. Click the "Create Job" button in the UI and note the new messages that appear in the elasticsearch.log file.

Please copy/paste these messages from the log here so that we can diagnose.
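
The two steps above can be sketched with a pair of small helpers. This is only an illustration: the function names are hypothetical, and it assumes the log file is not rotated between the two calls.

```shell
# Print the current number of lines in a log file.
# Call this before clicking "Create Job" and remember the result.
log_mark() {
  wc -l < "$1" | tr -d '[:space:]'
}

# Print only the lines appended after a mark taken earlier with log_mark.
log_since() {
  tail -n "+$(( $2 + 1 ))" "$1"
}
```

Possible usage, assuming the log lives under `$ES_HOME/logs`: run `mark=$(log_mark "$ES_HOME/logs/elasticsearch.log")`, click "Create Job", then run `log_since "$ES_HOME/logs/elasticsearch.log" "$mark"` and copy the output.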


(rich collier) #5

Additionally, I would like to know the specs of the system you are running ML on (size of RAM, # of CPUs, etc.). Is it a single node setup for testing or is this a multi-node production system?


(ChaeYoung) #6

I always appreciate your help.

  1. Specs of the system
    I am running ML on CentOS 7 installed on a server.
    The server specs are:
  • system: 64-bit
  • OS: CentOS 7
  • CPU: Xeon(R) CPU E5-2609 v2 @ 2.5GHz
    number of CPUs: 8
  • memory: 65869812 kB
  2. elasticsearch.log
    P.S. 1: "파이프가 깨어짐" means "broken pipe". (I think I get this message in Korean because my system language is set to Korean.)
    P.S. 2: I pasted the whole passage in case I missed something.

(If you have difficulty reading the code, please tell me your e-mail address so I can send you the .txt file.)

Have a nice day.


(David Roberts) #10

The Machine Learning feature is implemented using native processes that run outside of the JVM that runs Elasticsearch. In your case the native process called autodetect cannot be started. Usually this process would be started by another native process called controller, which is usually started when Elasticsearch is started. I can reproduce your sequence of error messages if I kill this controller process and then try to open a job.

Please can you check what happens if you try and run the relevant native processes at the command prompt:

$ES_HOME/plugins/x-pack/platform/linux-x86_64/bin/controller --version
$ES_HOME/plugins/x-pack/platform/linux-x86_64/bin/autodetect --version

Do either of these commands complain about missing OS libraries or other OS-related problems?

If not, the most likely scenario is that your controller process has been killed. Do you see it in the process list if you run ps -e | grep controller?

How long has this node been running for? It would be helpful if you could check all the logs since the node started for messages containing either controller or CppLogMessageHandler. Something like:

egrep 'controller|CppLogMessageHandler' $ES_HOME/logs/*log

Also, I will raise an issue for friendlier error reporting in the case where the controller process is not running for some reason. The way it's reported at the moment is impossible to understand.
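
The binary check above can be wrapped in a small loop so both results are reported side by side. A minimal sketch: the function name is hypothetical, and only the `--version` flag and the two binary names come from this thread.

```shell
# Run "<dir>/controller --version" and "<dir>/autodetect --version" and report
# whether each binary starts. Returns non-zero if either of them fails.
check_native_binaries() {
  bin_dir="$1"
  status=0
  for name in controller autodetect; do
    if output=$("$bin_dir/$name" --version 2>&1); then
      echo "OK   $name: $output"
    else
      # A missing shared library or similar OS problem would show up here.
      echo "FAIL $name: $output"
      status=1
    fi
  done
  return "$status"
}
```

Possible usage: `check_native_binaries "$ES_HOME/plugins/x-pack/platform/linux-x86_64/bin"`. If both lines say OK, the binaries themselves can start, and the next thing to look at is whether the controller process is still alive.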


(ChaeYoung) #11
  1. I ran

$ES_HOME/plugins/x-pack/platform/linux-x86_64/bin/controller --version
$ES_HOME/plugins/x-pack/platform/linux-x86_64/bin/autodetect --version

in the terminal and got the following output for each of them.

controller (64 bit): Version 5.5.0 (Build 9352b273163d45) Copyright (c) 2017 Elasticsearch BV
autodetect (64 bit): Version 5.5.0 (Build 9352b273163d45) Copyright (c) 2017 Elasticsearch BV

So, as you say, I think there is no OS-related problem.

  2. "ps -ef | grep controller"
    I ran that command in the terminal and got:

root 139969 55537 0 10:33 -pts/6 00:00:00 grep controller

Doesn't this output tell me that controller is in the process list?

P.S. 1: I created a new machine learning job again to check whether it would work, but the same thing happened (no analysis).
P.S. 2: I tried to restart controller, but it didn't work. I used the "kill -9" command, but it had no effect at all.
P.S. 3: I tried to start controller and autodetect manually. However, controller didn't respond at all and autodetect reported a license problem. This is what I got in the terminal.

  3. I checked the log files; the entries that contain "controller" start from 2017-07-26.

I also found something strange.

The log says:

[2017-07-26T17:48:46,834][INFO][o.e.x.m.j.p.NativeController] Native controller process has stopped - no new native processes can be started
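
One thing worth noting about the `ps -ef | grep controller` output above: with `-ef` the full command line is listed, so grep can match its own "grep controller" entry. A line like that is the search itself, not the ML controller. A minimal sketch of two helpers for reading such output, assuming the default `ps -ef` column layout (command starting in field 8); both function names are hypothetical:

```shell
# Filter a `ps -ef` listing on stdin for controller processes.
# Lines whose command is grep itself (the "grep controller" self-match) are dropped.
controller_lines() {
  awk '$8 != "grep" && /controller/'
}

# Print the timestamp of every "Native controller process has stopped"
# log line read on stdin.
controller_stopped_at() {
  grep 'Native controller process has stopped' | cut -d']' -f1 | tr -d '['
}
```

Possible usage: `ps -ef | controller_lines` prints nothing when no controller is running, and `controller_stopped_at < "$ES_HOME/logs/elasticsearch.log"` shows when the controller last went away (the log path is an assumption).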


(ChaeYoung) #12

I finally solved it.
The main cause was that Elasticsearch was not actually running. (I had to restart Elasticsearch.)
(When I restarted Elasticsearch, it stopped but did not start again.)
I then found that I had written

xpack.ml.enabled: true
ml.enabled: true

in elasticsearch.yml and had forgotten about it.
So I deleted "ml.enabled: true" and Elasticsearch is now running.

As you told me, the controller is running when Elasticsearch starts.
Without your help (especially about controller and autodetect) I couldn't have solved it.
Again, I really appreciate your help!
Have a nice day.
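
The root cause above was a setting written without the `xpack.` prefix. A minimal sketch for spotting such lines before a restart; the function name is hypothetical, and it assumes flat `key: value` settings like the ones quoted above (nested YAML is not handled):

```shell
# List lines in an elasticsearch.yml-style file whose key starts with "ml."
# instead of "xpack.ml."; prints "line_number:line" for each hit.
find_bare_ml_settings() {
  grep -n -E '^[[:space:]]*ml\.' "$1"
}
```

Possible usage, assuming the default config location: `find_bare_ml_settings "$ES_HOME/config/elasticsearch.yml"`. Any line it prints is a candidate for removal or for renaming to the `xpack.ml.` form.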

