Version 6.8 (hoping this is fixed in later versions, but I haven't found anything in the release notes)
I'm working with a new team that has just started sending in new data, stored in daily indices. The new data includes a custom alternate date field that is only mapped in the later indices. I created an ML datafeed using this alternate date field as the time_field, with a wildcard index pattern that matches all the daily indices. The datafeed failed with the following message:
Datafeed is encountering errors extracting data: [] Search request returned shard failures; first failure: shard [[Wxg34k7CQRCsfRGM-7q5zg][][0]], reason [RemoteTransportException[[][][indices:data/read/search[phase/query]]]; nested: QueryShardException[No mapping found for [alternate.date] in order to sort on]; ];
The datafeed works if I specify only the newer indices in the config, but doing this would entail continuously cloning and rerunning the job until the field is mapped in all indices.
Either that, or just send a dummy transaction to each of the older indices (they are still open) so that each one has a document with that field. The problem with that strategy is that the datafeed could fail again if we ever get another day with no documents containing that field.
We talked with Jimmie Clinton in Elastic Support and are looking at a few other options. Some of the factors that complicate any solution:
- The documents (500+ per day) are buried in daily indices with 25M docs. We can't change the ingest or the destination index for these few documents.
- We have no guarantee that we will get one of the target docs each day.
Could we use Data Frames to separate these target docs into their own index and then run the ML job on that index? Would the Data Frames have the same problem with the missing date field?
The datafeed sends a search request to Elasticsearch asking for the data to be sorted on the time field. In that case, Elasticsearch returns shard failures for any index that has no mapping for the field being sorted on. When the datafeed sees failures in the search response, it fails and reports the error back.
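To make the mechanism concrete, here is a rough Python sketch of the kind of search body involved. It is an approximation, not the exact request the datafeed generates, and the function name and parameters are hypothetical. The `sort` clause on the time field is the part that produces `No mapping found for [alternate.date] in order to sort on` on indices without the mapping. For comparison, a hand-written search can set the sort option `unmapped_type`, which makes such indices sort as if the field were mapped with no values; as far as I know, the datafeed config does not expose a way to set this on its own sort.

```python
import json


def time_sorted_search_body(time_field, start_ms, end_ms, unmapped_type=None):
    """Approximate a datafeed-style search: a time-range filter plus an
    ascending sort on the time field (hypothetical helper, not the real request)."""
    sort_options = {"order": "asc"}
    if unmapped_type is not None:
        # In a plain search, unmapped_type stops indices that lack the field
        # from failing the sort; the datafeed builds its own sort clause.
        sort_options["unmapped_type"] = unmapped_type
    return {
        "query": {
            "bool": {
                "filter": [
                    {"range": {time_field: {"gte": start_ms, "lt": end_ms, "format": "epoch_millis"}}}
                ]
            }
        },
        # This clause is what fails on indices with no mapping for time_field.
        "sort": [{time_field: sort_options}],
    }


if __name__ == "__main__":
    print(json.dumps(time_sorted_search_body("alternate.date", 0, 86400000), indent=2))
```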
One workaround I can think of is to add a mapping for the time field to every index that matches the index pattern but doesn't have one yet.
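A minimal Python sketch of that workaround, assuming a 6.x cluster reachable over plain HTTP without authentication and using only the standard library. It reads the mappings for the index pattern (`GET /<pattern>/_mapping`), finds each index/type whose mapping lacks the time field, and adds the field as a `date` with the 6.x form `PUT /<index>/_mapping/<type>`. All function names are hypothetical, and the field type (plus any `format`) should be checked against how the field is mapped in the newer indices before running it.

```python
import json
import urllib.request


def field_mapping(field_path, field_type="date"):
    """Build a 'properties' body for a dotted path such as 'alternate.date',
    expanding each dot into a nested object mapping."""
    parts = field_path.split(".")
    node = {"type": field_type}
    # Wrap the leaf in one 'properties' level per parent object, innermost first.
    for name in reversed(parts[1:]):
        node = {"properties": {name: node}}
    return {"properties": {parts[0]: node}}


def has_field(properties, field_path):
    """Return True if the dotted path exists in a mapping's 'properties' dict."""
    node = {"properties": properties}
    for name in field_path.split("."):
        children = node.get("properties", {})
        if name not in children:
            return False
        node = children[name]
    return True


def missing_targets(mapping_response, field_path):
    """Return (index, type) pairs lacking the field. Assumes the 6.x GET _mapping
    layout: {index: {"mappings": {type: {"properties": {...}}}}}."""
    targets = []
    for index, body in sorted(mapping_response.items()):
        for doc_type, type_mapping in sorted(body.get("mappings", {}).items()):
            if doc_type == "_default_":
                continue  # skip the deprecated default mapping
            if not has_field(type_mapping.get("properties", {}), field_path):
                targets.append((index, doc_type))
    return targets


def get_mappings(base_url, index_pattern):
    """Fetch mappings for every index matching the pattern."""
    with urllib.request.urlopen(f"{base_url}/{index_pattern}/_mapping") as resp:
        return json.loads(resp.read())


def put_field_mapping(base_url, index, doc_type, field_path):
    """Add the field mapping to one existing index and type."""
    request = urllib.request.Request(
        f"{base_url}/{index}/_mapping/{doc_type}",
        data=json.dumps(field_mapping(field_path)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read())


def backfill_time_field(base_url, index_pattern, field_path):
    """Hypothetical driver: add the mapping wherever it is missing."""
    mappings = get_mappings(base_url, index_pattern)
    for index, doc_type in missing_targets(mappings, field_path):
        put_field_mapping(base_url, index, doc_type, field_path)
```

Usage would look like `backfill_time_field("http://localhost:9200", "mydata-*", "alternate.date")`, where the URL and pattern are placeholders. The type name is taken from the mapping response rather than hard-coded, since 6.x indices can use a custom type. The PUT is rejected if an older index already maps `alternate` as something other than an object, and the script only covers indices that exist when it runs.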