Failed to index all results after [N] attempts. failure in bulk execution

Hello All,
I'm seeing the following error, originating from one of the ML jobs.
I couldn't find anything related on the web.
Any ideas?

[2020-07-21T14:33:29,745][WARN ][o.e.x.m.u.p.ResultsPersisterService] [NODE3] [job-name] failed to index after [21] attempts.
[2020-07-21T14:33:29,745][WARN ][o.e.x.m.d.DatafeedTimingStatsReporter] [NODE3] [job-name] failed to report datafeed timing stats
org.elasticsearch.ElasticsearchException: [job-name] failed to index all results after [21] attempts. failure in bulk execution:
[0]: index [.ml-anomalies-.write-job-name], type [_doc], id [job-name_datafeed_timing_stats], message [MapperParsingException[failed to parse field [exponential_average_calculation_context] of type [keyword] in document with id 'job-name_datafeed_timing_stats'. Preview of field's value: '{incremental_metric_value_ms=14.0, previous_exponential_average_ms=15891.0, latest_timestamp=1595222269221}']; nested: IllegalStateException[Can't get text on a START_OBJECT at 1:182];]
	at [... x-pack-ml stack frames truncated; class and method names lost in the original paste ...] ~[x-pack-ml-7.6.2.jar:7.6.2]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun( [elasticsearch-7.6.2.jar:7.6.2]
	at [elasticsearch-7.6.2.jar:7.6.2]
	at java.util.concurrent.ThreadPoolExecutor.runWorker( [?:?]
	at java.util.concurrent.ThreadPoolExecutor$ [?:?]
	at [?:?]

The mappings on that job's results index (the index that the .ml-anomalies-.write-job-name alias points to) are wrong. They do not have the correct mapping for the exponential_average_calculation_context field.
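To see this for yourself, you can ask Elasticsearch for the mapping of just that field on the results index (using `job-name` as a placeholder for your actual job ID):

```
GET .ml-anomalies-.write-job-name/_mapping/field/exponential_average_calculation_context
```

On a correctly mapped index this field should come back as an object with sub-fields such as `incremental_metric_value_ms` and `previous_exponential_average_ms` (the values visible in the error's document preview), not as `keyword`.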

We need to work out how that happened.

We added that field in 7.4.0. Have you recently upgraded from a version before 7.4.0 to 7.6.2? If so, are you running in Elastic Cloud or in your own cluster? If in your own cluster, did you do a rolling upgrade or a full cluster restart?

Is that job using a custom results index or the shared results index? (It's a setting in the job config.)
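You can check this by fetching the job config (again with `job-name` as a placeholder):

```
GET _ml/anomaly_detectors/job-name
```

Look at `results_index_name` in the response: `shared` means the shared results index; any other value means a custom results index.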

The other way this might have happened is if you restored a snapshot that included global metadata from a pre-7.4.0 version into your 7.6.2 cluster. Does that sound likely?

We try to update the mappings when you upgrade, but there must be a path we have missed. If you can help us work out which upgrade path fails to update the mappings then we will fix it in the next version.

The only way to get this job working now is to clone it, choosing a custom results index name that doesn't exist yet. That index should then be created with the correct mappings.
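If you clone via the API rather than the Kibana clone wizard, the key part is setting `results_index_name` to an unused name when creating the new job. A sketch, with `job-name-v2` as a hypothetical new job ID:

```
PUT _ml/anomaly_detectors/job-name-v2
{
  ... same analysis_config, data_description, etc. as the original job ...,
  "results_index_name": "job-name-v2"
}
```

The new job's results then go to a freshly created index rather than the one with the broken mappings.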

Hi @droberts195
I updated from 7.3.x to 7.6.x a few weeks ago.
Instead of chasing bugs, errors, missing settings, etc., how about re-creating these ML jobs?
What are the implications?

If you just want to get up and running again and don't mind deleting and recreating jobs, then here is what you need to do:

  1. Find out which index the .ml-anomalies-.write-job-name alias points to. For example, it might be .ml-anomalies-shared or .ml-anomalies-custom-job-name.
  2. Find out all the other aliases pointing at that results index. You can probably do this using _cat/aliases and grep.
  3. Those aliases will include the job names. Delete all the ML jobs that were using the affected results index.
  4. Delete the affected results index.
  5. Recreate the jobs.
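Steps 1–4 above can be sketched as console requests (using `.ml-anomalies-shared` and `job-name` as placeholders for your actual index and job IDs):

```
# 1. Which concrete index does the write alias point to?
GET _alias/.ml-anomalies-.write-job-name

# 2. All aliases on that index (equivalent to _cat/aliases plus grep)
GET .ml-anomalies-shared/_alias

# 3. Delete each ML job using that index (repeat per job)
DELETE _ml/anomaly_detectors/job-name

# 4. Delete the affected results index itself
DELETE .ml-anomalies-shared
```

Step 5 is then just recreating the jobs through Kibana or the job creation API as usual.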

If it's hard to remember the job definitions you could add a step 2.5 of cloning the affected jobs but editing them to use a different results index during the cloning process.

If the list of jobs is long and you don't want to delete them all after all, there is an alternative solution that involves just closing them and using reindex to correct the mappings.
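Very roughly, and assuming the `.ml-anomalies-*` index template is in place so that a new index matching that pattern picks up the correct 7.6 mappings, that alternative would look something like:

```
# Close the affected jobs first (repeat per job)
POST _ml/anomaly_detectors/job-name/_close

# Create a fresh index; the index template supplies the correct mappings
PUT .ml-anomalies-custom-fixed

# Copy the existing results across
POST _reindex
{
  "source": { "index": ".ml-anomalies-shared" },
  "dest":   { "index": ".ml-anomalies-custom-fixed" }
}
```

The read and write aliases for each job would then need to be moved to the new index with the `_aliases` API before reopening the jobs. Treat this as a sketch only; test it on a non-production cluster first.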

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.