Prebuilt ML jobs can't be activated

david-vazquez · April 21, 2020, 10:37pm

Hello all,

I'm trying to activate all of the prebuilt Machine Learning jobs, but without success. I try to activate them from the SIEM "Anomalies Detection" menu, but most of the jobs don't get activated.
More precisely, I could activate the Packetbeat prebuilt jobs, but most of the Windows jobs are not getting activated, and I don't know why.
Any idea what could be happening? When I try from Machine learning "Start Datafeed", it shows an error.

On the other hand, I'm trying to build a malware laboratory and detect that malware using ML jobs. Can I do it using the prebuilt ML jobs for Windows, or should I create my own?
I could activate the "rare process" ML job, but no anomalies were detected even when I ran malware samples.

Thank you very much.

Regards

Frank_Hassanabad · April 21, 2020, 11:25pm

When I try from Machine learning "Start Datafeed", it shows an error.

What's the error you're seeing?

You can use pre-built jobs as is or you can also clone or modify them as starting points and create your own library of jobs. Here is a reference of them if that helps:

david-vazquez · April 22, 2020, 7:26pm

It says "no node found to start datafeed"...
It seems I cannot run more than a certain number of concurrent jobs. Does the machine need to be provided with a specific amount of RAM and CPU?

I don't see any anomalies, even though I have a server full of malware.

Any idea?

Frank_Hassanabad · April 22, 2020, 10:11pm

It depends on whether you are using an on-prem or a cloud-based setup.

Elastic Cloud has a simple slider to turn ML nodes on and choose how many you want:
https://www.elastic.co/cloud

For on-prem, you have to do the setup and maintenance manually:
https://www.elastic.co/guide/en/machine-learning/current/setup.html#ml-nodes

You have to enable at least one ML node though in either case:

david-vazquez · April 23, 2020, 2:15pm

I have it on premises.
I'm definitely using a user with all possible privileges, so that is not the problem. Also, the ML node is enabled by default; anyway, I configured it in elasticsearch.yml.
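One way to double-check that the node really advertises the ML role is to look at the `roles` array that `GET _nodes` returns for each node. The following is a minimal Python sketch that works on an already-fetched response. The sample response, node ID, and node name are made up for illustration, and the exact fields may differ between versions.

```python
def ml_nodes(nodes_response: dict) -> list:
    """Return the names of nodes whose roles include "ml".

    `nodes_response` is assumed to be the parsed JSON body of `GET _nodes`.
    """
    names = []
    for node in nodes_response.get("nodes", {}).values():
        if "ml" in node.get("roles", []):
            names.append(node.get("name"))
    return names


# Hypothetical, heavily trimmed response for a single-node lab cluster.
SAMPLE_RESPONSE = {
    "nodes": {
        "hypothetical-node-id": {
            "name": "lab-node-1",
            "roles": ["master", "data", "ingest", "ml"],
        }
    }
}

print(ml_nodes(SAMPLE_RESPONSE))
```

If the list comes back empty, no node in the cluster is eligible to run ML jobs, which would be consistent with a "no node found" message.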

The main problem is that I cannot turn the jobs on; they stay "loading" all the time and never reach the "on" status. I mean, I cannot activate them all.

Because of that, I think I'm not able to detect the malware on the server.
The Elastic Stack server has 8 GB of RAM and 4 CPUs... I guess this is not the problem...
And I only have one node.
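For a rough sense of whether 8 GB is enough, it can help to estimate how much memory ML is allowed to use. By default, the `xpack.ml.max_machine_memory_percent` setting limits ML to 30% of the machine's memory. The sketch below is a back-of-the-envelope estimate only: the job names and memory figures are hypothetical, and the simple first-fit loop is an illustration, not how Elasticsearch actually assigns jobs.

```python
def ml_memory_budget_mb(machine_memory_mb: int,
                        max_machine_memory_percent: int = 30) -> int:
    """Memory (MB) that ML may use, given the percentage cap (default 30)."""
    return machine_memory_mb * max_machine_memory_percent // 100


def jobs_that_fit(budget_mb: int, job_memory_mb: dict) -> list:
    """First-fit estimate of which jobs fit into the budget, in dict order."""
    opened, used = [], 0
    for job_id, needed in job_memory_mb.items():
        if used + needed <= budget_mb:
            opened.append(job_id)
            used += needed
    return opened


# Hypothetical per-job memory requirements in MB (not taken from real jobs).
HYPOTHETICAL_JOBS = {"job_a": 256, "job_b": 64, "job_c": 32}

budget = ml_memory_budget_mb(8192)  # the 8 GB server described above
print(budget, jobs_that_fit(budget, HYPOTHETICAL_JOBS))
```

On a single node, this budget also has to coexist with the Elasticsearch heap and everything else running on the machine, so the practical limit may be lower.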

On my "laboratory server" I have installed Metricbeat, Winlogbeat, Auditbeat and Packetbeat. Logs are being sent properly to the ELK Stack, so the prebuilt jobs should work.

david-vazquez · April 23, 2020, 6:26pm

Error: "{\"error\":{\"root_cause\":[{\"type\":\"status_exception\",\"reason\":\"Could not start datafeed, allocation explanation []\"}],\"type\":\"status_exception\",\"reason\":\"Could not start datafeed, allocation explanation []\"},\"status\":429}"
    at Object.errorNotify [as error] (https://192.168.0.29:5601/bundles/49.bundle.js:3:155437)
    at https://192.168.0.29:5601/bundles/49.bundle.js:3:489206
    at Array.forEach (<anonymous>)
    at showResults (https://192.168.0.29:5601/bundles/49.bundle.js:3:489146)
    at https://192.168.0.29:5601/bundles/49.bundle.js:3:486590

This is the error I see, for example, when I try to activate a job.
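The quoted part of that message is a JSON document wrapped in a JSON string, which makes it hard to read. A minimal Python sketch, assuming the text after `Error:` is copied exactly as shown above, can decode it in two passes and pull out the status and reason:

```python
import json

# The quoted payload from the error above, copied verbatim (raw string so
# the backslashes are preserved).
QUOTED_ERROR = r'''"{\"error\":{\"root_cause\":[{\"type\":\"status_exception\",\"reason\":\"Could not start datafeed, allocation explanation []\"}],\"type\":\"status_exception\",\"reason\":\"Could not start datafeed, allocation explanation []\"},\"status\":429}"'''


def decode_kibana_error(quoted: str) -> dict:
    """Decode a doubly encoded error: JSON string first, then JSON object."""
    inner = json.loads(quoted)    # first pass: unescape the outer string
    payload = json.loads(inner)   # second pass: parse the actual error body
    error = payload["error"]
    return {
        "status": payload["status"],
        "type": error["type"],
        "reason": error["reason"],
    }


print(decode_kibana_error(QUOTED_ERROR))
```

Note that the allocation explanation in the reason is an empty list, so the message itself does not say why no node was assigned.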

Frank_Hassanabad · April 23, 2020, 8:34pm

I haven't seen that one before. The interesting part is the status: 429. That usually indicates that too many requests are hitting your Elasticsearch instance.

From: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429

Elasticsearch uses it to do "rate limiting" when something is going haywire and querying too quickly.

david-vazquez · April 23, 2020, 10:00pm

So what do you suggest I do?

Frank_Hassanabad · April 24, 2020, 12:29pm

It kind of depends. You're running a self-managed, on-premises deployment, so I don't know what your hardware, network, and deployment model look like. There are a lot of permutations that can cause something like a 429.

I would look in your Elasticsearch and Kibana log files and see what they are telling you, as that would probably give you a more accurate view into what is going on in your system. If it is because Elasticsearch is rate limiting you, that could be for a lot of reasons. The most common reason is that you need to begin scaling out to more Elasticsearch instances and tune them.

It could also be because you're running a lot of rules and not enough Elasticsearch instances, for example. Maybe you have duplicated your rules across multiple spaces and activated too many as well. Maybe you have a lot of Beats running that are consuming connections, etc... Maybe there is a custom script you deployed that is creating a lot of connections, etc...

It's really hard to say without seeing the contents of log files, knowing your hardware setup, your network setup, your Elasticsearch setup, Beats setup, etc...

My best advice at this point would be to take a look at your Elasticsearch log files and Kibana log files. Be comfortable with setting them to debug mode, restarting them, and then seeing what they're telling you in depth. That's your best bet for getting to a root cause. If you're running a lot of rules or a lot of Beats, or have custom scripts, you might want to turn them all off, see if your machine learning jobs will run, and then slowly turn them on again. You will also want to check whether you are getting the 429 only from ML jobs or also when just using Kibana, etc...

Some helpful documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/logging.html
https://www.elastic.co/guide/en/kibana/current/logs-ui-settings-kb.html

david-vazquez · April 25, 2020, 3:19pm

Thank you very much for your explanation and for taking the time to help me. I will try to debug it.

Regards

david-vazquez · April 25, 2020, 3:58pm

Anyway, I'm also trying with an Elastic Cloud instance, using the free 14-day trial.
I see the following message when I run a prebuilt ML job: "no node found to open job. Persistent task is awaiting node assignment".
I enabled the ML instance as you showed me in your earlier messages (but I can't give it more than 1 GB of RAM)...

I'm also trying to enable all the prebuilt jobs I want to try, and they are not being activated.
I don't understand what is going on.

EDIT: From what I have seen, it must be because of a hardware limitation. ML jobs need sufficient hardware to work properly.

system · May 23, 2020, 3:58pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.