"failed to process cluster event (put-mapping) within 30s" for first few requests to ingest into new index


(Mark Ryall) #1

Hi Everyone!

We are ingesting documents into a new elasticsearch index using 40 separate processes.

We're experiencing an issue that seems to occur only during the first few requests. We receive the following remote error from a bulk insert request:

{
  "type"=>"process_cluster_event_timeout_exception",
  "reason"=>"failed to process cluster event (put-mapping) within 30s"
}

We subsequently recover by retrying the batch insert, but we would like to understand what's going on and, ideally, avoid the problem entirely.
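For reference, the retry described above can be sketched roughly as follows. This is only an illustration: `send_bulk` and `ClusterEventTimeout` are hypothetical names standing in for whatever the client code uses to send a bulk request and to signal a `process_cluster_event_timeout_exception`, and the backoff values are arbitrary.

```python
import time


class ClusterEventTimeout(Exception):
    """Hypothetical error the caller raises when a bulk request fails with
    process_cluster_event_timeout_exception."""


def bulk_with_retry(send_bulk, batch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send_bulk(batch), retrying with exponential backoff on a timeout.

    send_bulk is a caller-supplied function (an assumption of this sketch);
    sleep is injectable so the backoff can be tested without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send_bulk(batch)
        except ClusterEventTimeout:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            # Wait base_delay, then 2x, 4x, ... before retrying the same batch.
            sleep(base_delay * 2 ** (attempt - 1))
```

The retry only hides the symptom; it does not explain why the put-mapping event times out in the first place.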

At this stage, there are only two indexes (the old index and the new), each with 3 shards. The eventual size is around 6 GB, with around 20 million documents.

Each of the 40 processes attempts to create the index and then insert documents. The index creation uses a dynamic template we have configured in the cluster.

I'm assuming the problem is that there are multiple processes triggering the index creation simultaneously and the index has not fully initialized before we immediately attempt to bulk insert documents. Does this sound feasible?
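One way to rule out that race would be to make index creation tolerant of concurrent attempts and to wait for the index's primary shards before the first bulk request. The sketch below assumes two hypothetical wrappers supplied by the caller: `create_index(name)`, wrapping `PUT /<index>` and returning the HTTP status plus the error type, and `wait_for_yellow(name)`, wrapping `GET /_cluster/health/<index>?wait_for_status=yellow`. It also assumes the cluster answers a duplicate create with HTTP 400 and `resource_already_exists_exception`.

```python
def ensure_index(create_index, wait_for_yellow, index_name):
    """Create the index if needed, then block until its primaries are allocated.

    create_index(name) -> (status_code, error_type or None)  # hypothetical wrapper
    wait_for_yellow(name) -> None                            # hypothetical wrapper
    Returns True if this call created the index, False if it already existed.
    """
    status, error_type = create_index(index_name)
    # Another process winning the race is not an error for our purposes.
    already_exists = (
        status == 400 and error_type == "resource_already_exists_exception"
    )
    if status != 200 and not already_exists:
        raise RuntimeError(f"index creation failed: {status} {error_type}")
    # Whether we created it or not, wait until the index is usable.
    wait_for_yellow(index_name)
    return not already_exists
```

Whether this removes the put-mapping timeout is not certain; it only removes the "index not ready yet" variable from the picture.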

Does anyone happen to know how the problem might be avoided?


(Christian Dahlqvist) #2

Are you explicitly creating the index or is it created using an index template when you index into it? How many indices and shards do you have in total in the cluster?


(Mark Ryall) #3

Hi Christian,

Currently we are explicitly creating the index (although the mapping comes from a template) before ingesting. Would this problem be less likely to occur if we just let the bulk request create the indexes?

There are at most 3 indexes, 9 primary shards and 9 replica shards.

It seems this problem occurs less often if we restrict the batch size for the bulk insert to 1,000 (previously some requests would include up to around 5,000 documents).
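For what it's worth, the batch size cap amounts to something like the following sketch; the helper name `batches` is made up for illustration, and 1,000 is simply the limit mentioned above.

```python
from itertools import islice


def batches(documents, size=1000):
    """Yield lists of at most `size` documents from any iterable."""
    iterator = iter(documents)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return  # iterable exhausted
        yield chunk
```

Each yielded list would then be sent as one bulk request.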


(Christian Dahlqvist) #4

That is unusual. I have typically only seen this when a large number of shards in the cluster leads to a large cluster state that is slow to update.

Which version of Elasticsearch are you on?


(Mark Ryall) #5

We are using 6.4, but plan to upgrade to 6.5 soon.


(Christian Dahlqvist) #6

What do the cluster health and cluster state APIs return?
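A minimal sketch for collecting both responses, assuming the cluster is reachable over plain HTTP without authentication at `http://localhost:9200` (adjust the base URL as needed). It calls `GET /_cluster/health` and `GET /_cluster/state`; the function names are illustrative only.

```python
import json
from urllib.request import urlopen


def cluster_url(base_url, api):
    """Build the URL for a _cluster endpoint such as 'health' or 'state'."""
    return f"{base_url.rstrip('/')}/_cluster/{api}"


def fetch_cluster_info(base_url="http://localhost:9200", opener=urlopen):
    """Return the parsed JSON of the cluster health and cluster state APIs.

    opener is injectable so the function can be exercised without a cluster.
    """
    result = {}
    for api in ("health", "state"):
        with opener(cluster_url(base_url, api)) as response:
            result[api] = json.load(response)
    return result
```

The cluster state output can be large, so it may be easier to share as an attachment or a gist than inline.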