I am getting the following error when trying to start the datafeed of an ML Anomaly Detection Job:
mydatafeed failed to start
datafeed [mydatafeed] cannot retrieve data because no index matches datafeed's indices [foo.bar*, *:foo.bar*]
Clicking 'See the full error' shows this:
{
  "error": {
    "root_cause": [
      {
        "type": "status_exception",
        "reason": "datafeed [mydatafeed] cannot retrieve data because no index matches datafeed's indices [foo.bar*, *:foo.bar*]"
      }
    ],
    "type": "status_exception",
    "reason": "datafeed [mydatafeed] cannot retrieve data because no index matches datafeed's indices [foo.bar*, *:foo.bar*]"
  },
  "status": 400
}
The setup is as follows, on v8.7.0:
- one Elasticsearch cluster, which we'll call the "primary", with Kibana and ML node(s)
- other trusted Elasticsearch clusters linked to it for Cross-Cluster Search (CCS)

I have several machine learning jobs running fine, but I'm trying to add a new one for a different set of indices. The crucial difference with this new one seems to be that there are currently no indices/datastreams matching foo.bar on the "primary" cluster.
These are hopefully the relevant bits of the datafeed_config in the ML job:
"indices_options": {
"expand_wildcards": [
"open"
],
"ignore_unavailable": false,
"allow_no_indices": true,
"ignore_throttled": true
},
"query": {
...
},
"indices": [
"foo.bar*"
"*:foo.bar*"
],
"scroll_size": 1000,
"delayed_data_check_config": {
"enabled": true
}
I've tried removing foo.bar* from the list of indices (although I want to keep it there, because new data might appear on the primary in future), so that the job config is just like this:
"indices": [
"*:foo.bar*"
],
but it fails in the same way.
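For reference, I applied that change with the datafeed update API (a minimal sketch, using the datafeed id from above; the datafeed is stopped anyway since it never started):

POST _ml/datafeeds/mydatafeed/_update
{
  "indices": [
    "*:foo.bar*"
  ]
}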
My issue seems similar to [ML] Datafeed fails on missing indices, even with allow_no_indices set to true · Issue #62404 · elastic/elasticsearch · GitHub (and I do have allow_no_indices set to true), but the difference here is that we have remote clusters involved, so there are actually indices available which the datafeed could start consuming.
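A plain search honours those options as I'd expect: a quick sanity check like this against the primary (where nothing matches the local pattern) returns an empty result rather than an error, which makes the datafeed's stricter behaviour more surprising:

GET foo.bar*/_search?allow_no_indices=true&expand_wildcards=open
{
  "size": 0
}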
The error I'm receiving seems to come from this code: elasticsearch/x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/datafeed/extractor/scroll/ScrollDataExtractorFactory.java at v8.7.0 · elastic/elasticsearch · GitHub
I found an interesting comment here: elasticsearch/x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/datafeed/DatafeedNodeSelector.java at v8.7.0 · elastic/elasticsearch · GitHub, which suggests there is an intention to succeed in the case of remote indices. However, in my case it seems to have got past this point, because a node has been allocated: if I look in Kibana at /app/ml/jobs, expand the job row, and look at 'Job messages', it says:

Opening job on node [instance-0000000090]

so it is failing at a later stage, when it tries to actually consume the datafeed.
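The job stats show the same thing; querying them (with a hypothetical job id of myjob) reports the node the job is assigned to:

GET _ml/anomaly_detectors/myjob/_stats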
Do you think this is a bug that I should report at GitHub - elastic/elasticsearch: Free and Open, Distributed, RESTful Search Engine?
I noticed the code that produces the error is looking at a FieldCapabilitiesResponse, so perhaps there is in fact an (implicit) requirement that the cluster where the ML job runs has at least one index matching the datafeed's indices (so that the Field capabilities API | Elasticsearch Guide [8.12] | Elastic can return relevant data). If so, is this (or should it be) documented somewhere?
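If that theory is right it should be reproducible directly: a field caps request like this one (same patterns, and roughly the same options as the datafeed) would show what the ML code sees:

GET foo.bar*,*:foo.bar*/_field_caps?fields=*&expand_wildcards=open&ignore_unavailable=false&allow_no_indices=true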
Is there any way around this given the setup described above? I don't want ML nodes on the remote clusters - even if I did run the job on one remote where foo.bar indices are present, it would only see its own indices, not those from the other remotes.
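The only workaround I can think of is creating an empty placeholder index on the primary so that foo.bar* matches something locally, something like the below (the index name is just illustrative) - but I haven't tested whether it satisfies the check, and it feels like a hack:

PUT foo.bar-placeholder
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}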