ML multi-metric split field only has .keyword fields

quyennguyen · August 10, 2021, 7:46am

I'm just a beginner with ML, so I don't know why this happened in my ELK stack.
I want to create a new multi-metric job, so I followed these steps:
Pick the "Fail login" saved search

Then pick multi metric -> time range "Use full filebeat* data"

Pick the field "high count (event rate)"
But when I want to pick the split field and influencers, only the field.keyword versions are available

So when I create the job, I get an error

How can I fix this?
I know this happened because I used field.keyword, but I had no other choice.
What's happening with my logs?

BenTrent · August 10, 2021, 11:36am

@quyennguyen, there is an advanced option to give a job its own index.

By default, jobs share an index to reduce the overall cluster shard count. But for situations like this (or for very large jobs), selecting "Use dedicated index" is a good idea.
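A minimal sketch of the same option at the API level, assuming you create the job through the anomaly detection job API instead of the Kibana wizard. The relevant setting is results_index_name, which shows up as "custom-login_ssh_multi_metric" in the job config posted later in this thread. The helper name dedicated_index_job_body is hypothetical; the detector and influencers are copied from that posted config.

```python
import json


def dedicated_index_job_body(partition_field, influencers, results_index_name):
    """Build a hypothetical anomaly detection job body whose results go to
    a dedicated index instead of the shared one."""
    return {
        "analysis_config": {
            "bucket_span": "1m",
            "detectors": [
                {"function": "high_count", "partition_field_name": partition_field}
            ],
            "influencers": list(influencers),
        },
        "data_description": {"time_field": "@timestamp"},
        # Without this setting, results are written to the shared results index.
        "results_index_name": results_index_name,
    }


body = dedicated_index_job_body(
    "system.auth.hostname.keyword",
    ["system.auth.hostname.keyword", "system.auth.ssh.ip.keyword"],
    "login_ssh_multi_metric",
)

# Intended as the body of: PUT _ml/anomaly_detectors/login_ssh_multi_metric
print(json.dumps(body, indent=2))
```

Without results_index_name, results land in the shared .ml-anomalies-shared index. The "custom-" prefix seen in the posted config appears to be added by Elasticsearch itself, so the sketch passes the plain name.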

quyennguyen · August 10, 2021, 2:25pm

It worked somehow, but it seems like I didn't get any results.

Do you know why I don't have the fields, only fields.keyword?

BenTrent · August 10, 2021, 2:46pm

@quyennguyen, I don't understand your question. Are you wondering why there are no influencer results?

If that's the case, could you provide the job and datafeed configurations (with sensitive information removed, if any exists in the configuration)?

quyennguyen · August 10, 2021, 3:13pm

I'm wondering why the split field only offers fields.keyword and not the plain fields. When I did it the first time, it had system.auth.hostname rather than system.auth.hostname.keyword, and I got the result I wanted. But I deleted that job for some reason.

As for the job and datafeed config, I don't know where to get them, because I'm really new to this side. Can you show me how to get them so I can send them to you?

BenTrent · August 10, 2021, 6:21pm

It has .keyword because you selected those fields. The .keyword suffix signifies a keyword field type: your text field is indexed as a keyword as well as a text type.

I am guessing it worked before because your data has somehow changed. Possibly not all the indices have the keyword field, or that field was added after data was already indexed.

You should be able to get the configs from the machine learning job management UI: Anomaly detection | Kibana Guide [8.11] | Elastic

Select the JSON tab.
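The same configs can also be fetched over the REST API; in Kibana Dev Tools the requests are GET _ml/anomaly_detectors/<job_id> and GET _ml/datafeeds/<datafeed_id>. The Python sketch below only builds those two requests with the standard library. The cluster URL is a hypothetical local one without security, and config_requests is an illustrative helper, not part of any Elastic client.

```python
import urllib.request


def config_requests(es_url, job_id, datafeed_id=None):
    """Build GET requests for an anomaly detection job and its datafeed."""
    # Kibana's job wizard names the datafeed "datafeed-<job_id>" (as in the
    # config posted in this thread); pass datafeed_id if yours differs.
    datafeed_id = datafeed_id or f"datafeed-{job_id}"
    base = es_url.rstrip("/")
    return [
        urllib.request.Request(f"{base}/_ml/anomaly_detectors/{job_id}", method="GET"),
        urllib.request.Request(f"{base}/_ml/datafeeds/{datafeed_id}", method="GET"),
    ]


if __name__ == "__main__":
    # Hypothetical unsecured local cluster; this only prints the requests.
    for req in config_requests("http://localhost:9200", "login_ssh_multi_metric"):
        print(req.get_method(), req.full_url)
```

Passing each request to urllib.request.urlopen would actually send it; a secured cluster would also need an Authorization header.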

quyennguyen · August 10, 2021, 7:07pm

OK. After creating the job with the keyword field, I got this:

{
  "job_id": "login_ssh_multi_metric",
  "job_type": "anomaly_detector",
  "job_version": "7.13.3",
  "create_time": 1628622270365,
  "finished_time": 1628622303319,
  "model_snapshot_id": "1628622302",
  "description": "",
  "analysis_config": {
    "bucket_span": "1m",
    "detectors": [
      {
        "detector_description": "high_count partitionfield=\"system.auth.hostname.keyword\"",
        "function": "high_count",
        "partition_field_name": "system.auth.hostname.keyword",
        "detector_index": 0
      }
    ],
    "influencers": [
      "system.auth.hostname.keyword",
      "system.auth.ssh.ip.keyword"
    ]
  },
  "analysis_limits": {
    "model_memory_limit": "11mb",
    "categorization_examples_limit": 4
  },
  "data_description": {
    "time_field": "@timestamp",
    "time_format": "epoch_ms"
  },
  "model_plot_config": {
    "enabled": false,
    "annotations_enabled": false
  },
  "model_snapshot_retention_days": 10,
  "daily_model_snapshot_retention_after_days": 1,
  "results_index_name": "custom-login_ssh_multi_metric",
  "allow_lazy_open": false,
  "data_counts": {
    "job_id": "login_ssh_multi_metric",
    "processed_record_count": 1685,
    "processed_field_count": 958,
    "input_bytes": 90377,
    "input_field_count": 958,
    "invalid_date_count": 0,
    "missing_field_count": 2412,
    "out_of_order_timestamp_count": 0,
    "empty_bucket_count": 32938,
    "sparse_bucket_count": 2,
    "bucket_count": 32968,
    "earliest_record_timestamp": 1626604733000,
    "latest_record_timestamp": 1628582785000,
    "last_data_time": 1628622272689,
    "latest_empty_bucket_timestamp": 1628582640000,
    "latest_sparse_bucket_timestamp": 1628529060000,
    "input_record_count": 1685,
    "log_time": 1628622272689,
    "latest_bucket_timestamp": 1628582760000
  },
  "model_size_stats": {
    "job_id": "login_ssh_multi_metric",
    "result_type": "model_size_stats",
    "model_bytes": 82740,
    "peak_model_bytes": 104736,
    "model_bytes_exceeded": 0,
    "model_bytes_memory_limit": 11534336,
    "total_by_field_count": 4,
    "total_over_field_count": 0,
    "total_partition_field_count": 3,
    "bucket_allocation_failures_count": 0,
    "memory_status": "ok",
    "assignment_memory_basis": "current_model_bytes",
    "categorized_doc_count": 0,
    "total_category_count": 0,
    "frequent_category_count": 0,
    "rare_category_count": 0,
    "dead_category_count": 0,
    "failed_category_count": 0,
    "categorization_status": "ok",
    "log_time": 1628622302104,
    "timestamp": 1628582700000
  },
  "forecasts_stats": {
    "total": 0,
    "forecasted_jobs": 0
  },
  "state": "closed",
  "timing_stats": {
    "job_id": "login_ssh_multi_metric",
    "bucket_count": 32968,
    "total_bucket_processing_time_ms": 18977.00000000005,
    "minimum_bucket_processing_time_ms": 0,
    "maximum_bucket_processing_time_ms": 532,
    "average_bucket_processing_time_ms": 0.5756187818490673,
    "exponential_average_bucket_processing_time_ms": 1.1737351400660319,
    "exponential_average_bucket_processing_time_per_hour_ms": 60.74121520456408
  },
  "datafeed_config": {
    "datafeed_id": "datafeed-login_ssh_multi_metric",
    "job_id": "login_ssh_multi_metric",
    "query_delay": "90505ms",
    "chunking_config": {
      "mode": "auto"
    },
    "indices_options": {
      "expand_wildcards": [
        "open"
      ],
      "ignore_unavailable": false,
      "allow_no_indices": true,
      "ignore_throttled": true
    },
    "query": {
      "bool": {
        "must": [
          {
            "match_all": {}
          }
        ],
        "filter": [
          {
            "match_phrase": {
              "system.auth.ssh.event": "Failed password"
            }
          }
        ],
        "must_not": []
      }
    },
    "indices": [
      "filebeat*"
    ],
    "scroll_size": 1000,
    "delayed_data_check_config": {
      "enabled": true
    },
    "state": "stopped",
    "timing_stats": {
      "job_id": "login_ssh_multi_metric",
      "search_count": 4,
      "bucket_count": 32968,
      "total_search_time_ms": 29,
      "average_search_time_per_bucket_ms": 0.0008796408638679932,
      "exponential_average_search_time_per_hour_ms": 20.478633887048332
    }
  }
}

The results didn't have any values, and system.auth.ssh.ip was missing.

BenTrent · August 10, 2021, 7:34pm

The document values for system.auth.hostname.keyword and system.auth.ssh.ip.keyword where anomalies are occurring seem to be "" (empty strings).

I have just confirmed with my own data that two jobs, one looking at the text field and one looking at the keyword field, resulted in the same anomalies and influencers. This is only the case if all your data has the keyword field.
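The job's own counters seem consistent with this. A small worked check in Python, assuming input_field_count and missing_field_count are counted once per analysed field per record (the totals line up exactly under that reading, but treat it as an interpretation rather than documented behaviour). All numbers are copied from the data_counts section of the config posted above.

```python
# Counters copied from the data_counts section of the posted job config.
processed_records = 1685
input_fields = 958
missing_fields = 2412

# The partition field is also an influencer, so the job reads two distinct fields.
partition_field = "system.auth.hostname.keyword"
influencers = ["system.auth.hostname.keyword", "system.auth.ssh.ip.keyword"]
analysis_fields = {partition_field, *influencers}

# Assumption: one slot per analysed field per record, either present or missing.
expected_slots = processed_records * len(analysis_fields)
observed_slots = input_fields + missing_fields
missing_fraction = missing_fields / expected_slots

print(expected_slots, observed_slots)  # 3370 3370
print(f"{missing_fraction:.1%} of analysed field values were missing")  # 71.6% ...
```

Under that reading, about 71.6% of the analysed .keyword values were absent from the records the datafeed sent, which fits the earlier guess that not all indices have the keyword fields.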

quyennguyen · August 10, 2021, 8:18pm

I still don't get it; my English may not be good enough to follow your idea.

Do you mean this error happens because all my logs have the keyword field?

What can I do to fix this?

quyennguyen · August 11, 2021, 8:12am

I just deleted a few days of logs and that resolved the problem, but then I got an error with the datafeed. Do you know why this happened?

richcollier · August 11, 2021, 11:30am

That problem occurs when the datafeed cannot successfully execute a search against the index where the raw data to be analyzed resides. You might have a problem with the health of Elasticsearch itself. Look in the elasticsearch.log for errors. If you need help with core Elasticsearch, there's a section in this forum for that. See Elasticsearch - Discuss the Elastic Stack

rcowart · August 11, 2021, 2:30pm

Let's focus on the underlying issue.

The split feature works on "aggregatable" fields. These include things like integers, IP addresses and a few others. Fields that hold strings can be indexed in two ways:

as type text, which means they are analyzed to support the free-text search features of Elasticsearch, but can NOT be used for aggregations
as type keyword, which means they are always evaluated by their exact value, so they are "aggregatable" and support aggregation-based queries, but NOT free-text search.

If no index mappings are provided (usually in the form of an index template) that specify the types of all fields in the index, the default mappings will be used. The default mappings index a string field as text, and index it a second time as keyword, appending .keyword to your field name.
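To make the text vs keyword split concrete, here is a simplified Python sketch. default_string_mapping mirrors what Elasticsearch's default dynamic mapping typically produces for a string (a text field plus a keyword sub-field with ignore_above 256), and aggregatable_fields is a hypothetical helper that walks a mapping and lists the paths a split field picker could offer. The type list is deliberately incomplete, and text is treated as non-aggregatable, which is the default unless fielddata is enabled.

```python
# Roughly what default dynamic mapping produces for a string field.
default_string_mapping = {
    "type": "text",
    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
}

# Simplified, incomplete list of aggregatable types; text is excluded.
AGGREGATABLE_TYPES = {"keyword", "ip", "long", "integer", "short", "byte",
                      "double", "float", "date", "boolean"}


def aggregatable_fields(properties, prefix=""):
    """Yield dotted paths of fields whose type supports aggregations."""
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if spec.get("type") in AGGREGATABLE_TYPES:
            yield path
        # Multi-fields such as .keyword live under "fields".
        for sub_name, sub_spec in spec.get("fields", {}).items():
            if sub_spec.get("type") in AGGREGATABLE_TYPES:
                yield f"{path}.{sub_name}"
        # Object fields nest further "properties".
        if "properties" in spec:
            yield from aggregatable_fields(spec["properties"], prefix=f"{path}.")


# Both strings from this thread, mapped dynamically under system.auth.
mapping = {
    "system": {"properties": {"auth": {"properties": {
        "hostname": default_string_mapping,
        "ssh": {"properties": {"ip": default_string_mapping}},
    }}}}
}

print(sorted(aggregatable_fields(mapping)))
# ['system.auth.hostname.keyword', 'system.auth.ssh.ip.keyword']
```

Only the .keyword sub-fields come back, which matches what the job wizard showed.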

NOTE: index mappings are also where you would specify fields containing IP addresses (which I also see in your data) as type ip.

So back to your original question... the reason that you only see the .keyword fields is that those are the only fields that are aggregatable. But this is likely not how you want it to work. Many of those fields do not make sense for free-text search and should be defined as keywords in the index mappings, without appending anything to the field names.

So what do you need to do? The answer is that you need to provide an index mapping, ideally via an index template, which correctly defines the data type of each field. Index templates can also be used to specify other index settings, such as the number of shards and replicas.
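A sketch of such a template body, built in Python, assuming a composable index template (PUT _index_template/<name>, available since 7.8, so in the 7.13.3 version shown in the job config). The template name, the index pattern my-auth-logs-*, the shard and replica counts, and the nested_properties helper are all hypothetical; the two field types follow the advice above (keyword for the hostname, ip for the SSH IP address).

```python
import json


def nested_properties(field_types):
    """Turn {"a.b.c": "keyword"} into nested mapping "properties"."""
    root = {}
    for dotted, field_type in field_types.items():
        node = root
        *parents, leaf = dotted.split(".")
        for part in parents:
            # Create (or reuse) an object field for each path segment.
            node = node.setdefault(part, {"properties": {}})["properties"]
        node[leaf] = {"type": field_type}
    return root


template_body = {
    # Hypothetical index pattern.
    "index_patterns": ["my-auth-logs-*"],
    "template": {
        # Illustrative shard/replica settings.
        "settings": {"number_of_shards": 1, "number_of_replicas": 1},
        "mappings": {
            "properties": nested_properties({
                "system.auth.hostname": "keyword",
                "system.auth.ssh.ip": "ip",
            })
        },
    },
}

# Intended as the body of: PUT _index_template/my-auth-logs (hypothetical name)
print(json.dumps(template_body, indent=2))
```

A template only affects indices created after it exists; existing indices keep their old mappings. For Filebeat data specifically, Filebeat normally loads its own index template (for example via filebeat setup), so checking that the expected template is loaded is usually the first step before writing one by hand.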

After your data is being indexed correctly, the ML features will work as you are expecting.

quyennguyen · August 13, 2021, 9:38am

I reinstalled all my clients with the same version of Filebeat, and everything is fixed.

Thanks for helping me.

system · September 10, 2021, 9:38am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
