[Bucket max] cardinality estimate required for [influencers] [host.name] but not supplied

I'm currently setting up a machine learning job to detect rare events for host names. However, I get an error on the validation page of the job. This is the full error:

{
  "statusCode": 400,
  "error": "Bad Request",
  "message": "[illegal_argument_exception: [illegal_argument_exception] Reason: [Bucket max] cardinality estimate required for [influencers] [host.name] but not supplied]: [Bucket max] cardinality estimate required for [influencers] [host.name] but not supplied",
  "attributes": {
    "body": {
      "error": {
        "root_cause": [
          {
            "type": "illegal_argument_exception",
            "reason": "[Bucket max] cardinality estimate required for [influencers] [host.name] but not supplied"
          }
        ],
        "type": "illegal_argument_exception",
        "reason": "[Bucket max] cardinality estimate required for [influencers] [host.name] but not supplied"
      },
      "status": 400
    }
  }
}

What causes this error, and how can I mitigate it?
Thanks for the help in advance!

Hi,

It looks like the error occurs when calculating the max bucket cardinality of the field selected as an influencer (host.name).

Could you please supply the payload sent to the calculate model memory limit endpoint? You can find it in the browser's network tab.
The endpoint looks like this:
...api/ml/validate/calculate_model_memory_limit

Thanks,
James

I believe this is it:

{
  "datafeedConfig": {
    "datafeed_id": "datafeed-ml-suspicious-powershell-usage",
    "job_id": "ml-suspicious-powershell-usage",
    "indices": [
      "winlogbeat-*"
    ],
    "query": {
      "bool": {
        "must": [
          {
            "match_all": {}
          }
        ]
      }
    },
    "runtime_mappings": {}
  },
  "analysisConfig": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "rare",
        "by_field_name": "process.name"
      }
    ],
    "influencers": [
      "process.name",
      "related.user",
      "host.name"
    ]
  },
  "indexPattern": "winlogbeat-*",
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  },
  "timeFieldName": "@timestamp",
  "earliestMs": 1593077360180,
  "latestMs": 1651828839735
}

(Sorry for all the edits)

The only way I was able to reproduce this error was by using a non-existent field as an influencer, added by editing the JSON of the job rather than selecting it through the UI.

Can you please confirm that host.name definitely exists in the index?

James

How can I confirm this?

In DevTools, execute:

GET winlogbeat-*/_mapping/

And inspect the output for host.name
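As a possible shortcut, and assuming the field capabilities API is available in your Elasticsearch version, you could ask about the three fields directly instead of reading the full mapping. This is only a sketch: the field list is taken from the job's influencers, and a field that is not mapped in any matching index should simply be absent from the "fields" object in the response.

```
GET winlogbeat-*/_field_caps?fields=host.name,process.name,related.user
```

Keep in mind that this reports what is mapped, not whether any document actually holds a value for the field.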

host.name does not exist. But neither do process.name or related.user.

Something seems very wrong in your setup. I'm curious what a single document in that index looks like. What is the output of the following?

GET winlogbeat-*/_search
{
  "size": 1
}

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 15,
    "successful" : 15,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "winlogbeat-7.16.2-2022.04.25-000001",
        "_id" : "Ck46YoABqSgL8mZZmXzr",
        "_score" : 1.0,
        "_source" : {
          "related" : {
            "user" : "XCH08$"
          },
          "event" : {
            "type" : [
              "admin"
            ],
            "action" : "logged-in-special",
            "created" : "2022-04-22T07:28:36.140Z",
            "outcome" : "success",
            "module" : "security",
            "category" : [
              "iam"
            ],
            "provider" : "Microsoft-Windows-Security-Auditing",
            "kind" : "event",
            "code" : "4672"
          },
          "log" : {
            "level" : "information"
          },
          "winlog" : {
            "event_id" : "4672",
            "channel" : "Security",
            "provider_name" : "Microsoft-Windows-Security-Auditing",
            "api" : "wineventlog",
            "record_id" : 10780874875,
            "provider_guid" : "{54849625-5478-4994-A5BA-3E3B0328C30D}",
            "process" : {
              "pid" : 704,
              "thread" : {
                "id" : 37328
              }
            },
            "computer_name" : "mbx04.test.local",
            "logon" : {
              "id" : "0x10b0d1201"
            },
            "event_data" : {
              "SubjectUserName" : "XCH08$",
              "PrivilegeList" : [
                "SeSecurityPrivilege",
                "SeBackupPrivilege",
                "SeRestorePrivilege",
                "SeTakeOwnershipPrivilege",
                "SeDebugPrivilege",
                "SeSystemEnvironmentPrivilege",
                "SeLoadDriverPrivilege",
                "SeImpersonatePrivilege"
              ],
              "SubjectUserSid" : "S-1-5-21-2155611120-281562227-2711537737-71738",
              "SubjectLogonId" : "0x10b0d1201",
              "SubjectDomainName" : "test"
            }
          },
          "tags" : [
            "forwarded",
            "logstash-beats",
            "logstash-elklog01",
            "beats_input_raw_event"
          ],
          "@version" : "1",
          "message" : "Special privileges assigned to new logon",
          "user" : {
            "name" : "xch08$",
            "domain" : "test",
            "id" : "S-1-5-21-2155611120-281562227-2711537737-71738"
          },
          "agent" : {
            "type" : "winlogbeat",
            "version" : "7.16.2",
            "id" : "8e535403-682e-4836-b9ec-fef7111dc479",
            "name" : "wec02",
            "ephemeral_id" : "31de5a80-2175-47db-ab00-5c5ea59dca93",
            "hostname" : "wec02"
          },
          "@timestamp" : "2022-04-22T07:28:34.276Z",
          "ecs" : {
            "version" : "1.12.0"
          },
          "host" : {
            "name" : "mbx04.test.local"
          }
        }
      }
    ]
  }
}

The "host": { "name": "mbx04.test.local" } section at the end of that document IS the host.name field. You may not realize it because it is expressed as a nested JSON object.

The same holds true for related.user, which appears in the document as "related": { "user": "XCH08$" }.

So it does exist. Then I'm still not clear on why it does not work.

Because you are asking the ML job to also use process.name, but that field does not exist in your data. There is a winlog.process.pid field, but no process.name field.

Therefore, as @James_Gowdy originally said, the problem occurs when a job references a field that doesn't exist in the data.
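To check whether a field referenced by the job actually has values in the data, one option is to count the documents that contain it with an exists query. This is a minimal sketch using the index pattern and field name from this thread; a count of 0 would suggest that no document in the matching indices holds a value for process.name.

```
GET winlogbeat-*/_count
{
  "query": {
    "exists": {
      "field": "process.name"
    }
  }
}
```

The same request can be repeated for host.name and related.user. Those two should return a non-zero count here, since both appear in the sample document above.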