Shard recovery blocks updates to cluster state?

(Maxim Kropotov) #1

Hey, everyone!
Recently we needed to make a configuration change to the machines that host our Elasticsearch cluster.

We stopped the elasticsearch service on one of our machines (node1) and changed its config. While the node was down, clients continued to index data and perform cluster state updates normally. After we restarted the elasticsearch service, client requests started timing out while trying to perform a put-mapping request. IIRC, this continued for two or three minutes.

Here is an example of a failed request:

{"error":"RemoteTransportException[[Dreadknight][inet[/]][indices:admin/mapping/put]]; nested: ProcessClusterEventTimeoutException[failed to process cluster event (put-mapping [LoggingEvent]) within 30s]; ","status":503}

At that time the pending tasks queue (http://node2:9200/_cluster/pending_tasks) looked like this. The put-mapping task is the one generated by our client app.

  "tasks" : [ {
    "insert_order" : 3907,
    "priority" : "URGENT",
    "source" : "shard-started ([elbalogs23-07-15][0], node[m_fFX4RBTmSXvobO2rUI1Q], [R], s[INITIALIZING]), reason [after recovery (replica) from node [[Dreadknight][HOwvbK5cS_ewV3fhm691pQ][node3][inet[/]]{master=true}]]",
    "executing" : true,
    "time_in_queue_millis" : 40098,
    "time_in_queue" : "40s"
  }, {
    "insert_order" : 3908,
    "priority" : "URGENT",
    "source" : "shard-started ([elbalogs29-07-15][4], node[m_fFX4RBTmSXvobO2rUI1Q], [R], s[INITIALIZING]), reason [after recovery (replica) from node [[Slapstick][HTcJhWIYRcugOq-ea4k9og][node2][inet[/]]{master=true}]]",
    "executing" : false,
    "time_in_queue_millis" : 40096,
    "time_in_queue" : "40s"
  }, {
    "insert_order" : 3910,
    "priority" : "HIGH",
    "source" : "put-mapping [LoggingEvent]",
    "executing" : false,
    "time_in_queue_millis" : 19077,
    "time_in_queue" : "19s"
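
To make the snapshot above easier to read, here is a minimal Python sketch that separates the task being executed from the ones waiting behind it. It assumes the response has already been fetched and parsed into a dict; the function name and the abbreviated SAMPLE data are mine, and sorting by priority and then insert order is only my assumption about how the master works through the queue.

```python
# Priority names as reported by the pending-tasks API, most urgent first.
PRIORITY_ORDER = ["IMMEDIATE", "URGENT", "HIGH", "NORMAL", "LOW", "LANGUID"]


def summarize_pending_tasks(response):
    """Split a parsed pending_tasks response into executing and waiting tasks.

    Returns (executing_sources, waiting), where waiting is a list of
    (source, priority, seconds_in_queue) tuples sorted by priority and then
    by insert order (an assumption about the master's processing order).
    """
    tasks = response.get("tasks", [])
    executing = [t["source"] for t in tasks if t.get("executing")]
    waiting = sorted(
        (t for t in tasks if not t.get("executing")),
        key=lambda t: (PRIORITY_ORDER.index(t["priority"]), t["insert_order"]),
    )
    return executing, [
        (t["source"], t["priority"], t["time_in_queue_millis"] / 1000.0)
        for t in waiting
    ]


# Abbreviated copy of the snapshot above; the "source" strings are shortened.
SAMPLE = {
    "tasks": [
        {"insert_order": 3907, "priority": "URGENT",
         "source": "shard-started ([elbalogs23-07-15][0])",
         "executing": True, "time_in_queue_millis": 40098},
        {"insert_order": 3908, "priority": "URGENT",
         "source": "shard-started ([elbalogs29-07-15][4])",
         "executing": False, "time_in_queue_millis": 40096},
        {"insert_order": 3910, "priority": "HIGH",
         "source": "put-mapping [LoggingEvent]",
         "executing": False, "time_in_queue_millis": 19077},
    ]
}

if __name__ == "__main__":
    running, queued = summarize_pending_tasks(SAMPLE)
    print("executing:", running)
    for source, priority, seconds in queued:
        print("waiting: %s (%s, %.1fs)" % (source, priority, seconds))
```

Read this way, our put-mapping task (HIGH) sits behind the URGENT shard-started tasks, and one of those had already been in the queue for 40 seconds.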

I realize that we did not follow best practices for a rolling restart; instead we accidentally simulated an ungraceful node restart. Still, it seems strange that a single node failure effectively blocked updates to the cluster state for several minutes.

Could anyone help me understand the behavior we encountered?
What are shard-started tasks and what causes them to remain in the queue for a long time?
Is this the intended behavior?
Is there a way to mitigate this (an unexpected shutdown then restart)?
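
As a client-side stopgap, one option might be to retry the put-mapping request with a backoff whenever it fails with the 503 shown above. The sketch below is only an illustration under assumptions: `send` is a hypothetical callable standing in for whatever our client app uses to issue the request, and the attempt count and delays are arbitrary. It does not address why the queue backs up in the first place.

```python
import time


def is_cluster_event_timeout(status, body):
    """True for a 503 whose body mentions ProcessClusterEventTimeoutException."""
    return status == 503 and "ProcessClusterEventTimeoutException" in body


def put_mapping_with_retry(send, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it stops timing out on the master or attempts run out.

    send is a hypothetical zero-argument callable that performs the
    put-mapping request and returns (http_status, response_body_text).
    The last response is returned either way, so the caller can inspect it.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = send()
        if not is_cluster_event_timeout(status, body):
            return status, body
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return status, body
```

With the defaults this waits 1s, 2s, 4s and 8s between the five attempts, which would not have covered the two or three minutes we saw, so the values would need tuning.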

Thanks in advance:)
