Datafeed fails if time_field is not mapped in all indices

cosruss · November 15, 2019, 8:37pm

Version 6.8 (Hoping it is fixed in later versions, but haven't found anything in the release notes)

Working with a new team that has just started sending new data in, stored in daily indices. The new data includes a custom alternate date field that is only mapped in the later indices. Created an ML datafeed using this alternate date field as the time_field. Used a wildcard for the indices pattern that included all daily indices. Datafeed failed with the following message:

Datafeed is encountering errors extracting data: [] Search request returned shard failures; first failure: shard [[Wxg34k7CQRCsfRGM-7q5zg][][0]], reason [RemoteTransportException[[][][indices:data/read/search[phase/query]]]; nested: QueryShardException[No mapping found for [alternate.date] in order to sort on]; ];

The datafeed works if I specify only the newer indices in the config, but doing this would entail continuously cloning and rerunning the job until the field is mapped in all indices.

richcollier · November 20, 2019, 7:42pm

Perhaps you can introduce a field alias in the older indices that has the same name as the time field in the new indices...
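
A minimal sketch of what such an alias mapping could look like. It only builds the request body in Python and does not contact a cluster. The target field `@timestamp` is an assumption, since the thread does not say which date field the older indices already have, and the helper name `alias_mapping_body` is hypothetical. Field aliases require Elasticsearch 6.4 or later, and the alias target must already be mapped in the index.

```python
import json


def alias_mapping_body(alias_field: str, target_field: str) -> dict:
    """Build a put-mapping body that defines alias_field as an alias of target_field.

    A dotted name such as "alternate.date" is expanded into nested
    "properties" objects, which is how object fields are expressed in a mapping.
    """
    body = {"type": "alias", "path": target_field}
    for name in reversed(alias_field.split(".")):
        body = {"properties": {name: body}}
    return body


# "@timestamp" is an assumed existing date field in the older indices.
body = alias_mapping_body("alternate.date", "@timestamp")
print(json.dumps(body, indent=2))
```

The resulting body would be sent to each older index with the put mapping API. On 6.x the request typically also names the index's mapping type, so check the index's existing mapping before applying it.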

cosruss · December 4, 2019, 8:52pm

Either that, or just send a dummy transaction to each of the older indices (they are still open) so that each one has a document with that field. The problem with that strategy is the possibility of future failures if we ever get another day with no documents with that particular field.

We talked with Jimmie Clinton in Elastic Support and are looking at a few other options. Some of the factors that complicate any solution:

- The documents (500+ per day) are buried in daily indices with 25M docs. We do not have the capability to change the ingest and destination index of these few documents.
- We don't have a guarantee that we will get one of the target docs each day.

Could we use Data Frames to separate these target docs into their own index and then run the ML job on that index? Would the Data Frames have the same problem with the missing date field?

dmitri · December 9, 2019, 4:42pm

The datafeed fires a search against Elasticsearch and asks for the results to be sorted on the time field. For such a search, Elasticsearch returns shard failures for any index that has no mapping for the sort field. When the datafeed sees those failures in the search response, it fails and reports the error back.
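
To reproduce this by hand outside the datafeed, a sorted search along these lines should trigger the same shard failure. This is a sketch only: the exact request the datafeed sends is not shown in this thread, and the helper name `sorted_search_body` is hypothetical. The second body sets `unmapped_type`, a standard sort option that tells Elasticsearch how to treat the field on indices where it is not mapped. That helps for manual searches; the thread does not mention any way to set it on the datafeed's own search.

```python
import json
from typing import Optional


def sorted_search_body(time_field: str, unmapped_type: Optional[str] = None) -> dict:
    """Build a search body that sorts ascending on time_field.

    If unmapped_type is given, it is added to the sort options so that
    indices without a mapping for time_field do not fail the sort.
    """
    sort_options = {"order": "asc"}
    if unmapped_type is not None:
        sort_options["unmapped_type"] = unmapped_type
    return {
        "size": 1,
        "query": {"match_all": {}},
        "sort": [{time_field: sort_options}],
    }


# Expected to fail on indices where alternate.date is not mapped.
failing = sorted_search_body("alternate.date")
# Tolerates indices where alternate.date is not mapped.
tolerant = sorted_search_body("alternate.date", unmapped_type="date")

print(json.dumps(failing))
print(json.dumps(tolerant))
```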

One workaround I can think of is to add a mapping for the time field to every index that matches the index pattern and does not have one yet.
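
A rough sketch of that workaround, again only building data in Python rather than calling a cluster. The index names and the `mapped_fields_by_index` input are made up for illustration; in practice that information would come from the get mapping API. The helper names are hypothetical as well.

```python
import json


def date_mapping_body(time_field: str) -> dict:
    """Build a put-mapping body that maps time_field as a date."""
    body = {"type": "date"}
    for name in reversed(time_field.split(".")):
        body = {"properties": {name: body}}
    return body


def indices_missing_field(mapped_fields_by_index: dict, time_field: str) -> list:
    """Return the indices (sorted by name) whose mapped fields lack time_field."""
    return sorted(
        index
        for index, fields in mapped_fields_by_index.items()
        if time_field not in fields
    )


# Hypothetical daily indices and the fields each one has mapped.
mapped_fields_by_index = {
    "data-2019.11.13": {"@timestamp", "message"},
    "data-2019.11.14": {"@timestamp", "message"},
    "data-2019.11.15": {"@timestamp", "message", "alternate.date"},
}

to_update = indices_missing_field(mapped_fields_by_index, "alternate.date")
mapping = date_mapping_body("alternate.date")

for index in to_update:
    # Each of these indices would receive the mapping via the put mapping API.
    print(index, json.dumps(mapping))
```

Adding an explicit `date` mapping does not require any document to contain the field, so it avoids the dummy-document approach. It would still need to be repeated for each new daily index unless the mapping is also added to the index template.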

