"could not load fields from index" in datafeed while creating a ML job

Hi,

I am trying to create an ML job with the Advanced job creation wizard. The thing is, I get an error in the Datafeed tab saying "Could not load fields from index", and I wonder what exactly is going wrong. For instance, if I go to the Data Preview tab, I can see the fields in the indices. I was also able to create an ML job without clicking through to the advanced job configuration.

For your information, I am using 7.4.2 for Elasticsearch, Logstash, and Kibana, and the _source field is enabled for all of these indices.

Does anyone have an idea what I should do here? Oh, and I will also try creating a job via the Python client.

Have a look at the following error messages I got from Kibana.

[screenshots: Kibana error messages]

Thanks,

I'd need to know more about:

  1. What the actual detector configuration is (i.e. what fields do you expect to analyze in the ML job)
  2. What the mapping of the index you want to analyze looks like and perhaps a sample document in that index

I don't want to jump to conclusions here, but explicitly setting the index to test-2020-01.01 is probably NOT what you really want to do, as this appears to be only one day's worth of data. You would normally create an index pattern (such as test-*) that matches all timestamp-named indices; ML will then handle querying the data chronologically over time.
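For reference, the datafeed would then point at the pattern rather than a single index. A minimal sketch (the job and datafeed names here are made up, not from your config):

PUT /_ml/datafeeds/datafeed-example
{
  "job_id": "example-job",
  "indices": [ "test-*" ],
  "query": { "match_all": {} }
}

With "indices": ["test-*"], the datafeed will automatically pick up new daily indices as they are created.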

Hi @richcollier,

Thanks for the reply. Let me go through one by one.

  1. What the actual detector configuration is (i.e. what fields do you expect to analyze in the ML job)
"detectors": [
      {
        "function": "high_count",
        "detector_description": "high_count over request_header_forwarded_for",
        "over_field_name": "request_header_forwarded_for"
      }
    ]
  2. What the mapping of the index you want to analyze looks like and perhaps a sample document in that index

Since I'm using templates for these indices, let me paste index template.

GET /_template/test
{
  "test" : {
    "order" : 0,
    "index_patterns" : [
      "test-*"
    ],
    "settings" : {
      "index" : {
        "number_of_shards" : "2"
      }
    },
    "mappings" : {
      "properties" : {
        "request_header_forwarded_for" : {
          "type" : "keyword"
        },
        "http_status_code" : {
          "type" : "keyword"
        },
        "request_header_user_agent" : {
          "type" : "keyword"
        },
        "http_verb" : {
          "type" : "keyword"
        }
      }
    },
    "aliases" : { }
  }
}

I don't want to jump to conclusions here, but explicitly setting the index to test-2020-01.01 is probably NOT what you really want to do, as this appears to be only one day's worth of data. You would normally create an index pattern (such as test-*) that matches all timestamp-named indices; ML will then handle querying the data chronologically over time.

Oh yeah, I'm not using test-2020-01.01 in Kibana; I'm using a proper index pattern as you mentioned. It was just a mistake while pasting here.

Hope this gives you more context on what I'm trying to do. Let me know if you need more information.

Thanks

Your config looks sensible.

Apparently that error can occur if the index is either empty or if the user who is configuring the job doesn't have permissions to read that index.

Validate that this is not the case. Also, what, if anything, do you see if you select the "Data Preview" tab? What errors, if any, get logged to elasticsearch.log at that time?

To answer your questions one by one,

Apparently that error can occur if the index is either empty

Indices in test-* index pattern are test-2020.01.01, test-2020.01.02 and so on.

GET /test-*/_count
{
  "count" : 8462636,
  "_shards" : {
    "total" : 14,
    "successful" : 14,
    "skipped" : 0,
    "failed" : 0
  }
}

if the user who is configuring the job doesn't have permissions to read that index.

The user I am using has the following roles attached to it:

  • machine_learning_user
  • machine_learning_admin
  • kibana_user
  • log_reader
  • test (custom role we added)
    • Cluster privileges : monitor_ml, manage_index_templates, manage
    • Run As privileges : None
    • Index privileges
      • indices : test-*
      • privileges : all

Validate that this is not the case

From the details above, I think I can conclude that the indices are not empty and that I have the right permissions.
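One way to double-check this explicitly would be the has-privileges API, which reports what the current user can actually do against the pattern (the privilege names here are just examples of what a datafeed would typically need):

GET /_security/user/_has_privileges
{
  "index": [
    {
      "names": [ "test-*" ],
      "privileges": [ "read", "view_index_metadata" ]
    }
  ]
}

If any entry comes back false, that would point at a permissions problem rather than a UI one.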

what if anything do you see if you select the "Data Preview" tab?

Yeah, I see the documents with the fields I need. For your information, I had to scrub the _id and request_header_forwarded_for fields; they do contain legitimate values though.

[
  {
    "_index": "test-2020.02.05",
    "_type": "_doc",
    "_id": "",
    "_score": 2,
    "_source": {
      "request_header_forwarded_for": "",
      "@timestamp": "2020-02-05T00:27:38.381Z"
    }
  },
  {
    "_index": "test-2020.02.05",
    "_type": "_doc",
    "_id": "",
    "_score": 2,
    "_source": {
      "request_header_forwarded_for": "",
      "@timestamp": "2020-02-05T00:27:34.719Z"
    }
  }
⋮
]

What errors, if any get logged to elasticsearch.log at that time?

What I got while reproducing the error is as follows.

Feb 25, 2020, 12:34:35 PM UTC
WARN
i2@eu-west-1c
[instance-0000000002] [GET /_xpack/ml/anomaly_detectors/_stats] is deprecated! Use [GET /_ml/anomaly_detectors/_stats] instead.

Feb 25, 2020, 12:34:35 PM UTC
WARN
i5@eu-west-1a
[instance-0000000005] [GET /_xpack/ml/anomaly_detectors] is deprecated! Use [GET /_ml/anomaly_detectors] instead.

Feb 25, 2020, 12:34:04 PM UTC
WARN
i1@eu-west-1c
[instance-0000000001] [GET /_xpack/ml/anomaly_detectors/_stats] is deprecated! Use [GET /_ml/anomaly_detectors/_stats] instead.

Feb 25, 2020, 12:34:04 PM UTC
WARN
i2@eu-west-1c
[instance-0000000002] [GET /_xpack/ml/info] is deprecated! Use [GET /_ml/info] instead.

Hmm... those are just deprecation warnings in the log and shouldn't impede the operation.

I am a bit curious about the user's roles. Have you, perhaps, tried to repeat what you're doing with a different user, such as the elastic superuser, just for comparison? Perhaps there is some permissions clash going on.

Unfortunately, the superuser didn't help either. I will try creating an ML job with the Python client to see if I can bypass this error. Please do let me know if you have any other ideas.

Thanks

@richcollier I was able to bypass the error above by creating an ML job and plugging in the datafeed via REST API calls, so this is no longer a blocker on my side. Still, I would appreciate it if we could find a way to do it from the Kibana UI.
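For anyone hitting the same thing, the REST-based workaround looks roughly like this (the job/datafeed names and bucket_span below are illustrative, not my exact config):

PUT /_ml/anomaly_detectors/example-job
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "high_count",
        "detector_description": "high_count over request_header_forwarded_for",
        "over_field_name": "request_header_forwarded_for"
      }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}

PUT /_ml/datafeeds/datafeed-example-job
{
  "job_id": "example-job",
  "indices": [ "test-*" ]
}

POST /_ml/anomaly_detectors/example-job/_open
POST /_ml/datafeeds/datafeed-example-job/_start

Once the datafeed is started, the job shows up and runs in the Kibana ML app as usual, even though it wasn't created through the wizard.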

Thanks,

Gee

The advanced wizard uses field caps to determine the fields.
Would it be possible to call GET /test-*/_field_caps and paste the results here?
Also, are there any errors in the browser console when using that page?

Hi @James_Gowdy, I'll answer your questions one by one.

Would it be possible to call GET /test-*/_field_caps and paste the results here?

GET /test-*/_field_caps
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "specified fields can't be null or empty"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "specified fields can't be null or empty"
  },
  "status": 400
}

If I specify fields, however, I get a valid response:

GET /test-*/_field_caps?fields=request_header_forwarded_for,request_haproxy_acl,http_status_code,http_verb,request_header_user_agent
{
  "indices" : [
    "test-2020.01.01",
    "test-2020.01.02",
    "test-2020.01.03",
    "test-2020.01.04",
    "test-2020.01.05",
    "test-2020.01.06",
            ⋮
  ],
  "fields" : {
    "request_header_forwarded_for" : {
      "keyword" : {
        "type" : "keyword",
        "searchable" : true,
        "aggregatable" : true
      }
    },
    "http_status_code" : {
      "keyword" : {
        "type" : "keyword",
        "searchable" : true,
        "aggregatable" : true
      }
    },
    "request_header_user_agent" : {
      "keyword" : {
        "type" : "keyword",
        "searchable" : true,
        "aggregatable" : true
      }
    },
    "http_verb" : {
      "keyword" : {
        "type" : "keyword",
        "searchable" : true,
        "aggregatable" : true
      }
    }
  }
}
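For what it's worth, the same call with a wildcard also returns all fields, which I assume is closer to what the wizard itself requests (that's just my guess, though):

GET /test-*/_field_caps?fields=*

So the 400 above seems expected when the fields parameter is omitted entirely.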

Also, are there any errors in the browser console when using that page?

One thing I found was

Refused to execute inline script because it violates the following 
Content Security Policy directive: "script-src 'unsafe-eval' 'self'". 
Either the 'unsafe-inline' keyword, a hash ('sha256-...'), or a 
nonce ('nonce-...') is required to enable inline execution.

Hope this helps.

Thanks

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.