Elasticsearch index creation with a high master_timeout causes internal index creation retries

Hello everyone

I am facing a weird problem with Elasticsearch. I am running ES 1.4.1 on 19 data nodes with 32G (16G heap) each and 3 master nodes with 64G (32G heap) each. My cluster metadata has grown so large that all master operations are slow. Even a 64G master is not able to cope. Because of this, new index creations are failing very frequently.

I am trying to force the creation of these indices by passing a very high master_timeout (10000 sec). My request returns, but I see index creation failure logs on the master. As soon as an attempt to create the index fails, I see another task in the master's pending_tasks to create the same index. This keeps happening until the master_timeout expires.
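For anyone who wants to observe the same behaviour, here is a minimal sketch in Python. It builds the create-index URL with master_timeout and counts queued create-index tasks per index from the body returned by GET /_cluster/pending_tasks. The helper names, the base URL, the index name and the sample response are made up for illustration, and the "create-index [name]" shape of the "source" field is an assumption that should be checked against the real output of your own cluster.

```python
import re
from urllib.parse import quote, urlencode


def create_index_url(base_url, index, master_timeout="30s"):
    """Build the URL for a create-index request (to be sent with HTTP PUT)."""
    query = urlencode({"master_timeout": master_timeout})
    return "{}/{}?{}".format(base_url.rstrip("/"), quote(index, safe=""), query)


# Assumed shape of the "source" field of a queued create-index task.
_CREATE_SOURCE = re.compile(r"^create-index \[(?P<index>[^\]]+)\]")


def pending_create_tasks(pending_tasks_body):
    """Count queued create-index tasks per index name.

    `pending_tasks_body` is the parsed JSON (a dict) returned by
    GET /_cluster/pending_tasks.
    """
    counts = {}
    for task in pending_tasks_body.get("tasks", []):
        match = _CREATE_SOURCE.match(task.get("source", ""))
        if match:
            name = match.group("index")
            counts[name] = counts.get(name, 0) + 1
    return counts


# Made-up sample response, only to show the expected input structure.
sample_body = {
    "tasks": [
        {"insert_order": 101, "priority": "URGENT",
         "source": "create-index [logs-2015.01.01], cause [api]"},
        {"insert_order": 102, "priority": "HIGH",
         "source": "shard-started"},
        {"insert_order": 103, "priority": "URGENT",
         "source": "create-index [logs-2015.01.01], cause [api]"},
    ]
}
```

Calling pending_create_tasks on the pending_tasks output every few seconds while the create request is in flight should show whether a new task for the same index keeps reappearing after each failure.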

Is there an internal retry mechanism in Elasticsearch that is kicking in here? I was under the impression that a task is tried only once and, if it fails, a failure is returned.

This retry mechanism is queueing up tasks on the master and affecting my overall system performance. Is there a way to flush the master's pending tasks or to disable this index creation retry? Has anyone faced such a scenario before?

Any information on this would be very helpful. Looking forward to support from the ES community.

Akshay Goel

  1. Upgrade
  2. Fix the massive cluster state problems

Anything else is just going to waste your time.