Resource Utilization Machine Learning

"partitioned on" means split the analysis "for every". There are two ways to split - with a partition field and with a by field (see below).

Ok cool - so if you're running the "rare process per host" ML job (the one behind a built-in detection rule), then the job configuration shows the following:

    "detectors": [
      {
        "detector_description": "rare process executions on Windows",
        "function": "rare",
        "by_field_name": "process.name",
        "partition_field_name": "host.name"
      }
    ],

This basically means: "find a rare process...considering every process for every host". The by_field is process.name and the partition_field is host.name. From your screenshot, it looks as if you have about 7300 hosts and over 1.6 million (total) process names being tracked by this ML job. So, this is a "double split" because both the by_field and partition_field are being used (they have subtle differences in how they cause a split, and if you were crafting an ML job by hand you might want to dig into the similarities and differences between the two).

But in your case, you've simply enabled a built-in job so you didn't really have much say as to how the job is constructed, you are just seeing the result of deploying the job to cover 7300+ entities (hosts in this case), each host having many processes.

So, you have a couple of options here. If you feel like this job is valuable and it is working fine on a particular ML node...then let it go. A model memory size of 1.6GB is not that obscene. You could, in theory, create several cloned versions of this job and have each one operate on a filtered list of hosts (i.e. jobA is for hosts in the LA Data Center where the hostname begins with "LAXPROD..." and jobB is for the St. Louis Data Center where the hostname begins with "STLPROD...", and so on). In general, having more, smaller jobs is better than having fewer, giant jobs, simply because smaller jobs can be more easily distributed to run across more nodes. But you probably don't need to go to such lengths here.
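To illustrate the cloning approach: each cloned job's datafeed could carry a query that restricts which hosts it sees. This is just a sketch; the `LAXPROD` prefix is the hypothetical hostname pattern from the example above, and your datafeed's existing query would need to be merged with this filter:

```json
{
  "query": {
    "bool": {
      "filter": [
        { "prefix": { "host.name": "LAXPROD" } }
      ]
    }
  }
}
```

With the hosts split this way, each clone models far fewer entities, so its model memory footprint shrinks accordingly.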

If you want to learn more about Elastic ML may I suggest reading my book. Hard copy on Amazon or get a free e-copy here.
