ML multi-metric job: split field only offers .keyword fields

I'm just a beginner with ML, so I don't know why this happened in my ELK setup.
I want to create a new multi-metric job, so I followed these steps:
Pick the "Fail login" search index
Then pick multi metric -> time range "Use full filebeat* data"

Pick the field "high-count(event rate)"
But when I want to pick the split field and influencers, they only offer field.keyword

So when I create the job, I get an error.

How can I fix this?
I know this happened because I used field.keyword, but I didn't have any other choice.
What's happening with my logs?

@quyennguyen, there is an advanced option to give a job its own index.

By default jobs share an index to reduce the overall cluster shard count. But, for situations like this (or for very large jobs), selecting "Use dedicated index" is a good idea.

It worked somehow, but it seems like I didn't get the result.

Do you know why I don't have fields but only have fields.keyword?

@quyennguyen, I don't understand your question. Are you wondering why there are no influencer results?

If that's the case, could you provide the job and datafeed configurations (with sensitive information removed, if any exists in the configuration)?

I'm wondering why the split field only offers fields.keyword and not fields. When I did it the first time, it offered system.auth.hostname rather than system.auth.hostname.keyword, and I got the result I wanted. But I deleted that job for some reason.

About the job and datafeed configs, I don't know where to get them, because I'm really new to this. Can you show me how to get them so I can send them to you?

The field name ends in keyword because that is the field you selected. The .keyword suffix signifies a keyword field type. It means your text field is indexed as a keyword as well as a text type.

I am guessing it worked before because your data has somehow changed. Possibly not all the indices have the keyword field, or that field was added after data was already indexed.

You should be able to get the configs from the machine learning job management UI (see the Anomaly detection page of the Kibana Guide [7.14]).

Select the JSON tab.
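
If the UI is not convenient, the same configurations can also be read from the Elasticsearch ML REST endpoints (GET _ml/anomaly_detectors/<job_id> and GET _ml/datafeeds/<datafeed_id>). The following is only a minimal sketch: the base URL, the helper names, and the guess that the datafeed is called datafeed-<job_id> are assumptions, and authentication is not handled.

```python
import json
from urllib.request import urlopen


def ml_config_urls(base_url, job_id, datafeed_id=None):
    """Build the URLs of the job and datafeed config endpoints.

    Assumption: when no datafeed id is given, the datafeed is named
    "datafeed-<job_id>". Pass the real id if yours differs.
    """
    if datafeed_id is None:
        datafeed_id = f"datafeed-{job_id}"
    root = base_url.rstrip("/")
    return {
        "job": f"{root}/_ml/anomaly_detectors/{job_id}",
        "datafeed": f"{root}/_ml/datafeeds/{datafeed_id}",
    }


def fetch_json(url):
    """GET a URL and decode the JSON body (no authentication, sketch only)."""
    with urlopen(url) as response:
        return json.load(response)


# Hypothetical cluster address and job id, for illustration only.
urls = ml_config_urls("http://localhost:9200", "my_job")
print(urls["job"])
print(urls["datafeed"])
```

Calling fetch_json(urls["job"]) against a reachable cluster would return the job configuration as a Python dictionary. Remove anything sensitive before posting it.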

OK. After creating the job with the keyword fields, I got this:

{
  "job_id": "login_ssh_multi_metric",
  "job_type": "anomaly_detector",
  "job_version": "7.13.3",
  "create_time": 1628622270365,
  "finished_time": 1628622303319,
  "model_snapshot_id": "1628622302",
  "description": "",
  "analysis_config": {
    "bucket_span": "1m",
    "detectors": [
      {
        "detector_description": "high_count partitionfield=\"system.auth.hostname.keyword\"",
        "function": "high_count",
        "partition_field_name": "system.auth.hostname.keyword",
        "detector_index": 0
      }
    ],
    "influencers": [
      "system.auth.hostname.keyword",
      "system.auth.ssh.ip.keyword"
    ]
  },
  "analysis_limits": {
    "model_memory_limit": "11mb",
    "categorization_examples_limit": 4
  },
  "data_description": {
    "time_field": "@timestamp",
    "time_format": "epoch_ms"
  },
  "model_plot_config": {
    "enabled": false,
    "annotations_enabled": false
  },
  "model_snapshot_retention_days": 10,
  "daily_model_snapshot_retention_after_days": 1,
  "results_index_name": "custom-login_ssh_multi_metric",
  "allow_lazy_open": false,
  "data_counts": {
    "job_id": "login_ssh_multi_metric",
    "processed_record_count": 1685,
    "processed_field_count": 958,
    "input_bytes": 90377,
    "input_field_count": 958,
    "invalid_date_count": 0,
    "missing_field_count": 2412,
    "out_of_order_timestamp_count": 0,
    "empty_bucket_count": 32938,
    "sparse_bucket_count": 2,
    "bucket_count": 32968,
    "earliest_record_timestamp": 1626604733000,
    "latest_record_timestamp": 1628582785000,
    "last_data_time": 1628622272689,
    "latest_empty_bucket_timestamp": 1628582640000,
    "latest_sparse_bucket_timestamp": 1628529060000,
    "input_record_count": 1685,
    "log_time": 1628622272689,
    "latest_bucket_timestamp": 1628582760000
  },
  "model_size_stats": {
    "job_id": "login_ssh_multi_metric",
    "result_type": "model_size_stats",
    "model_bytes": 82740,
    "peak_model_bytes": 104736,
    "model_bytes_exceeded": 0,
    "model_bytes_memory_limit": 11534336,
    "total_by_field_count": 4,
    "total_over_field_count": 0,
    "total_partition_field_count": 3,
    "bucket_allocation_failures_count": 0,
    "memory_status": "ok",
    "assignment_memory_basis": "current_model_bytes",
    "categorized_doc_count": 0,
    "total_category_count": 0,
    "frequent_category_count": 0,
    "rare_category_count": 0,
    "dead_category_count": 0,
    "failed_category_count": 0,
    "categorization_status": "ok",
    "log_time": 1628622302104,
    "timestamp": 1628582700000
  },
  "forecasts_stats": {
    "total": 0,
    "forecasted_jobs": 0
  },
  "state": "closed",
  "timing_stats": {
    "job_id": "login_ssh_multi_metric",
    "bucket_count": 32968,
    "total_bucket_processing_time_ms": 18977.00000000005,
    "minimum_bucket_processing_time_ms": 0,
    "maximum_bucket_processing_time_ms": 532,
    "average_bucket_processing_time_ms": 0.5756187818490673,
    "exponential_average_bucket_processing_time_ms": 1.1737351400660319,
    "exponential_average_bucket_processing_time_per_hour_ms": 60.74121520456408
  },
  "datafeed_config": {
    "datafeed_id": "datafeed-login_ssh_multi_metric",
    "job_id": "login_ssh_multi_metric",
    "query_delay": "90505ms",
    "chunking_config": {
      "mode": "auto"
    },
    "indices_options": {
      "expand_wildcards": [
        "open"
      ],
      "ignore_unavailable": false,
      "allow_no_indices": true,
      "ignore_throttled": true
    },
    "query": {
      "bool": {
        "must": [
          {
            "match_all": {}
          }
        ],
        "filter": [
          {
            "match_phrase": {
              "system.auth.ssh.event": "Failed password"
            }
          }
        ],
        "must_not": []
      }
    },
    "indices": [
      "filebeat*"
    ],
    "scroll_size": 1000,
    "delayed_data_check_config": {
      "enabled": true
    },
    "state": "stopped",
    "timing_stats": {
      "job_id": "login_ssh_multi_metric",
      "search_count": 4,
      "bucket_count": 32968,
      "total_search_time_ms": 29,
      "average_search_time_per_bucket_ms": 0.0008796408638679932,
      "exponential_average_search_time_per_hour_ms": 20.478633887048332
    }
  }
}
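
A few numbers in the data_counts section above are worth reading together. The short calculation below is only a sketch of that arithmetic. It assumes the job analyzes exactly two fields per record (the two .keyword fields listed in the configuration), which is an interpretation and not something the output states.

```python
# Values copied from the "analysis_config" and "data_counts" sections above.
bucket_span_ms = 60_000  # "bucket_span": "1m"
data_counts = {
    "processed_record_count": 1685,
    "processed_field_count": 958,
    "missing_field_count": 2412,
    "empty_bucket_count": 32938,
    "bucket_count": 32968,
    "earliest_record_timestamp": 1626604733000,
    "latest_record_timestamp": 1628582785000,
}

# Assumption: two analyzed fields per record
# (system.auth.hostname.keyword and system.auth.ssh.ip.keyword).
analyzed_fields_per_record = 2

# How many 1-minute buckets fit between the first and last record.
span_ms = (data_counts["latest_record_timestamp"]
           - data_counts["earliest_record_timestamp"])
buckets_spanned = span_ms / bucket_span_ms

# Share of buckets that contained no matching documents at all.
empty_bucket_fraction = (data_counts["empty_bucket_count"]
                         / data_counts["bucket_count"])

# Share of expected field values that were missing from the records.
expected_field_values = (data_counts["processed_record_count"]
                         * analyzed_fields_per_record)
missing_field_fraction = (data_counts["missing_field_count"]
                          / expected_field_values)

print(f"buckets spanned by the data: {buckets_spanned:.1f}")
print(f"empty buckets: {empty_bucket_fraction:.2%}")
print(f"missing field values: {missing_field_fraction:.2%}")
```

Under that assumption, the processed and missing field counts add up to exactly two fields per record, so most records seem to arrive without one or both of the .keyword fields.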

The result had no values and no system.auth.ssh.ip.

The document values for system.auth.hostname.keyword and system.auth.ssh.ip.keyword where anomalies are occurring seem to be empty strings ("").

I have just confirmed with my own data that two jobs, one looking at the text field and one looking at the keyword field, resulted in the same anomalies and influencers. This is only the case if all your data has the keyword field.

I still don't get it. Maybe my English is not good enough to follow your idea.

Do you mean this error happens because all my logs have the keyword field?

What can I do to fix this?

I just deleted a few days of logs and that resolved the problem, but then I got an error with the datafeed. Do you know why this happened?

That problem occurs when the datafeed cannot successfully execute a search against the index where the raw data to be analyzed resides. You might have a problem with the health of Elasticsearch itself. Look in elasticsearch.log for errors. If you need help with core Elasticsearch, there's a section of this forum for that: Elasticsearch - Discuss the Elastic Stack.

Let's focus on the underlying issue.

The split feature works on "aggregatable" fields. These include things like integers, IP addresses and a few others. Fields that hold strings can be indexed in two ways:

  • as type text, which means they are analyzed to support the free-text search features of Elasticsearch, but by default can NOT be used for aggregations
  • as type keyword, which means they are always evaluated by their exact value. They are "aggregatable" and thus support aggregation-based queries, but NOT free-text search.

If no index mappings are provided (usually in the form of an index template) to specify the types of all fields in the index, the default mappings are used. The default mappings index a string field as text, and index it a second time as keyword in a sub-field named by appending .keyword to your field name.
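
To make that concrete, here is a small sketch of what the default mapping for a string field looks like, and why only the .keyword name ends up in the split field list. The helper function and its "anything that is not text is aggregatable" rule are simplifications made for illustration; they are not how Kibana itself decides.

```python
# What Elasticsearch's default dynamic mapping produces for a string field:
# a text field plus a keyword sub-field named "keyword".
DEFAULT_STRING_MAPPING = {
    "type": "text",
    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
}

# Simplifying assumption: only text fields are treated as non-aggregatable.
NON_AGGREGATABLE_TYPES = {"text"}


def aggregatable_fields(properties, prefix=""):
    """Walk a mapping's "properties" and list the aggregatable field names."""
    names = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if "properties" in spec:
            # Object field: recurse into its children.
            names.extend(aggregatable_fields(spec["properties"], path + "."))
            continue
        if spec.get("type") not in NON_AGGREGATABLE_TYPES:
            names.append(path)
        # Multi-fields such as "<field>.keyword".
        for sub_name, sub_spec in spec.get("fields", {}).items():
            if sub_spec.get("type") not in NON_AGGREGATABLE_TYPES:
                names.append(f"{path}.{sub_name}")
    return sorted(names)


default_mapping = {
    "system": {"properties": {"auth": {"properties": {
        "hostname": DEFAULT_STRING_MAPPING,
    }}}}
}
explicit_mapping = {
    "system": {"properties": {"auth": {"properties": {
        "hostname": {"type": "keyword"},
    }}}}
}

print(aggregatable_fields(default_mapping))
print(aggregatable_fields(explicit_mapping))
```

With the default mapping only system.auth.hostname.keyword is listed. With an explicit keyword mapping the plain name system.auth.hostname is listed instead.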

NOTE: index mappings are also where you would specify fields containing IP addresses (which I also see in your data) as type ip.

So back to your original question... the reason that you only see the .keyword fields is that those are the only fields that are aggregatable. But this is likely not how you want it to work. Many of those fields do not make sense for free-text search and should be defined as keywords in the index mappings, without appending anything to the field names.

So what do you need to do? The answer is that you need to provide an index mapping, ideally via an index template, which correctly defines the data type of each field. Index templates can also be used to specify other index settings, such as the number of shards and replicas.
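
As a rough sketch of what such a template could contain, the snippet below builds a request body for a composable index template (PUT _index_template/<name>). The field types chosen for system.auth.hostname and system.auth.ssh.ip, the shard and replica counts, and the helper name are assumptions for illustration. Filebeat normally installs its own index template, so treat this as an example of the shape and not something to apply as-is.

```python
import json


def build_index_template(index_patterns, field_types, shards=1, replicas=1):
    """Build an index template body from {"dotted.field.name": "es_type"}."""
    properties = {}
    for dotted_name, es_type in field_types.items():
        node = properties
        parts = dotted_name.split(".")
        # Create the nested object levels ("system" -> "auth" -> ...).
        for part in parts[:-1]:
            node = node.setdefault(part, {"properties": {}})["properties"]
        node[parts[-1]] = {"type": es_type}
    return {
        "index_patterns": list(index_patterns),
        "template": {
            "settings": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            },
            "mappings": {"properties": properties},
        },
    }


# Illustrative choices: hostname as keyword, the SSH source address as ip.
template_body = build_index_template(
    index_patterns=["filebeat*"],
    field_types={
        "system.auth.hostname": "keyword",
        "system.auth.ssh.ip": "ip",
    },
)

print(json.dumps(template_body, indent=2))
```

A template only applies to indices created after it is in place. Data that is already indexed keeps its old mappings until it is reindexed.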

Once your data is indexed correctly, the ML features will work as you expect.

I reinstalled Filebeat on all my clients so that they all run the same version, and that fixed everything.

Thanks for helping me.
