Indexing is throttled for all indices while opening a closed index?


I could be entirely wrong here, but my observation is that while an index is being opened, no other index can be actively indexed.

We have a lot of indices in our cluster and keep the majority of them closed, opening an index only when it needs to be updated or when someone wants to search it.

We have noticed that if a lot of open calls happen simultaneously or in quick succession, our indexing jobs start throwing these exceptions:

ProcessClusterEventTimeoutException[failed to process cluster event (put-mapping [response]) within 30s]

But the nodes themselves show no signs of stress: search queries execute quickly throughout, server/JVM metrics are all normal, disk I/O is fine, etc. Yet the indexing jobs still hit 30s timeouts.

We found we can recreate this issue easily by closing a bunch of indices and reopening them in a bash for loop. With a 2-second sleep between each open call, the indexing jobs complete without error; with a 1-second delay or less, they hit 30s timeouts again.
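For reference, the reproduction loop looks roughly like this. It's a sketch, not our exact script: it assumes a cluster reachable at localhost:9200 and closed indices named test-1 through test-20, so adjust the host and index names to your setup.

```shell
# Repro sketch: reopen a batch of closed indices back to back.
# Assumes localhost:9200 and closed indices test-1 .. test-20.
for i in $(seq 1 20); do
  curl -s -XPOST "localhost:9200/test-$i/_open"
  # sleep 2   # with a 2s gap, concurrent indexing jobs finish fine
  sleep 1     # at 1s or less, put-mapping tasks start hitting 30s timeouts
done
```

While this runs, concurrent indexing jobs that trigger dynamic mapping updates are what start timing out.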

Is there anything that I can do configuration-wise to remove this indexing throttle while other indices are being opened on the cluster? Or am I missing something else entirely here?

Thanks for any help!

I did some more investigation and found something that might be related.

  • Cluster state tasks are executed in a single thread.
  • Mapping updates from indexing jobs push tasks to the pending_tasks queue with a priority of "HIGH"
  • Opening a closed index pushes tasks to the pending_tasks queue with a priority of "URGENT"

So my guess is that a flood of "URGENT" tasks in this queue delays the "HIGH" tasks long enough that they exceed the 30s timeout.
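The starvation effect the bullets above describe can be shown with a toy model. This is not the real Elasticsearch executor, just a minimal single-worker priority queue where a lower number means higher priority (URGENT before HIGH), with FIFO ordering within a priority level:

```python
import heapq
from itertools import count

# Toy model of a single-threaded cluster-state executor (not the real
# Elasticsearch code). Lower number = higher priority, so URGENT tasks
# always run before HIGH tasks regardless of arrival order.
URGENT, HIGH = 0, 1

def run(tasks, cost_per_task=1):
    """tasks: list of (arrival_time, priority, name).
    Returns {name: completion_time} under a single worker."""
    queue, seq, now, done = [], count(), 0, {}
    pending = sorted(tasks)  # ordered by arrival time
    i = 0
    while i < len(pending) or queue:
        # Admit everything that has arrived by `now`.
        while i < len(pending) and pending[i][0] <= now:
            _, prio, name = pending[i]
            heapq.heappush(queue, (prio, next(seq), name))
            i += 1
        if not queue:
            now = pending[i][0]  # jump to the next arrival
            continue
        _, _, name = heapq.heappop(queue)
        now += cost_per_task
        done[name] = now
    return done

# One put-mapping (HIGH) queued alongside a burst of 40 open-index
# tasks (URGENT): every URGENT task runs first, so the mapping update
# waits for the entire burst before it even starts.
tasks = [(0, HIGH, "put-mapping")] + [(0, URGENT, f"open-{n}") for n in range(40)]
print(run(tasks)["put-mapping"])  # → 41
```

With a task cost of 1 unit, the put-mapping task finishes at t=41, after all 40 opens. Scale the burst size or task cost up and the HIGH task's wait grows past any fixed timeout, which matches the 30s failures we see.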

Is this accurate? And is any of it configurable? Can/should we make the "put-mapping" tasks equal in priority to the "URGENT" tasks that opening an index generates?