I hope you are safe and well.
I'm running a POC in Elastic Cloud, and I have an issue with ML jobs.
I have some jobs stuck as you see below. That's under SIEM > Detections > ML Job Settings
They have been in this status for a few days.
I tried to find more details about the error and found the messages below:
No node found to open job. Reasons [persistent task is awaiting node assignment.]
No node found to open job. Reasons [Not opening job [rare_process_by_host_windows_ecs] because job memory requirements are stale - refresh requested]
Is that related to resource utilization? The ML node seems to be healthy
Your screenshot shows the SIEM job list. If you jump to the Machine Learning UI tab, I suspect those jobs will be in an "opening" state. Based on the data it is trying to analyse, ML has estimated that the available memory is not quite sufficient to open all the ML jobs on a single 1GB node.
Assuming you have multiple jobs in this "opening" state, I would first suggest taking a look at the Machine Learning job list. Make sure to close any jobs for which you are not yet ingesting data (e.g. if you have no auditbeat data, you won't need the auditbeat jobs).
By closing these, it should free up memory for other jobs to start.
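As a rough sketch of that check, the snippet below scans a response shaped like the output of the job stats API (`GET _ml/anomaly_detectors/_stats`) and lists opened jobs that have processed no records, along with the close-job path for each. The helper names and the sample data are made up for illustration (only `rare_process_by_host_windows_ecs` comes from the error above), and a zero record count is only a hint that a job is idle, so review the list before closing anything.

```python
from typing import Any, Dict, List


def stuck_jobs(stats: Dict[str, Any]) -> List[str]:
    """IDs of jobs still in the "opening" state (not yet assigned to a node)."""
    return [j["job_id"] for j in stats.get("jobs", []) if j.get("state") == "opening"]


def idle_open_jobs(stats: Dict[str, Any]) -> List[str]:
    """IDs of opened jobs that have processed no records so far."""
    idle = []
    for job in stats.get("jobs", []):
        processed = job.get("data_counts", {}).get("processed_record_count", 0)
        if job.get("state") == "opened" and processed == 0:
            idle.append(job["job_id"])
    return idle


def close_path(job_id: str) -> str:
    """REST path of the close-job API for one anomaly detection job."""
    return f"/_ml/anomaly_detectors/{job_id}/_close"


# Hypothetical sample shaped like a job stats response; the values are invented.
sample_stats = {
    "jobs": [
        {"job_id": "rare_process_by_host_windows_ecs", "state": "opening",
         "data_counts": {"processed_record_count": 0}},
        {"job_id": "example_auditbeat_job", "state": "opened",
         "data_counts": {"processed_record_count": 0}},
        {"job_id": "example_winlogbeat_job", "state": "opened",
         "data_counts": {"processed_record_count": 1200}},
    ]
}

# Print the request that would close each idle candidate.
for job_id in idle_open_jobs(sample_stats):
    print("POST", close_path(job_id))
```

With the sample data this prints a single close request for `example_auditbeat_job`; the stuck job is reported separately by `stuck_jobs` because it holds no memory yet.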
If jobs still remain "opening", you can free up memory by closing jobs that are less relevant to you. You could also choose to run a subset of jobs in real time and run the others in batch against a specified date range.
ML jobs model data in real time and hold this model in memory. The model size is determined by the characteristics of the input data. In general, the higher the cardinality, the more memory is required. As such, depending on your data, it may only be possible to run a subset of the SIEM jobs concurrently in real time on the SIEM cluster.
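To make the memory trade-off concrete, here is a minimal sketch that walks a priority-ordered list of jobs and keeps each one that still fits in the remaining memory. The function name, the job names and all the numbers are hypothetical, and the real node assignment logic also accounts for overheads that this simplification ignores, so treat it as a planning aid rather than a prediction of what will open.

```python
from typing import Dict, List


def jobs_that_fit(requirements_mb: Dict[str, int],
                  available_mb: int,
                  priority: List[str]) -> List[str]:
    """Pick jobs in priority order, skipping any that exceed the remaining memory."""
    chosen = []
    remaining = available_mb
    for job_id in priority:
        need = requirements_mb[job_id]
        if need <= remaining:
            chosen.append(job_id)
            remaining -= need
    return chosen


# Hypothetical memory requirements in MB; real values depend on your data.
example_requirements = {"job_a": 128, "job_b": 256, "job_c": 64, "job_d": 128}

# Assumed memory available to ML on the node, also in MB.
example_available_mb = 300

# Most relevant job first.
example_priority = ["job_a", "job_b", "job_c", "job_d"]

selected = jobs_that_fit(example_requirements, example_available_mb, example_priority)
print(selected)
```

Jobs left out of the selection are the natural candidates for running in batch against a date range instead of in real time.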
If you are seeing other errors in the Machine Learning app, please let us know.