Could we support a timeout mechanism for replica bulk requests?

Assume a shard has 1 primary and 1 replica. A bulk request goes to the primary, which finishes its shard-level bulk write and then forwards the request to the replica node. If the replica node is really slow, for example stuck for several minutes because of a CPU or memory issue while its network ping still succeeds (so the node is not removed from the cluster), then all bulk requests for that shard stay blocked until the slow replica node is done.

In a cluster of 100+ nodes where every node holds a shard of the same index, a single slow node can therefore slow down bulk operations across the whole cluster.

Could we add a timeout mechanism for the replica bulk request? For example, if a replica does not respond within about 30s, fail that replica shard instead of blocking the primary shard's bulk operation forever.
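
To make the proposal concrete, here is a toy model of it in Python. It is not Elasticsearch code: `replica_write`, `replicate` and the replica names are hypothetical, and the delays are scaled down from the 30s and 10min figures in this post. It only shows the difference between waiting for the slowest replica and giving up on it after a deadline.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def replica_write(delay_s):
    """Stand-in for a replica applying a bulk request; delay_s models a slow node."""
    time.sleep(delay_s)
    return "ack"


def replicate(replica_delays, timeout_s=None):
    """Send the write to every replica and wait for the answers.

    With timeout_s=None the caller waits for the slowest replica.
    With a timeout, replicas that have not answered by the deadline are
    reported as failed instead of blocking the caller.
    Returns (acked, failed, elapsed_seconds).
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(replica_delays)))
    start = time.monotonic()
    futures = {name: pool.submit(replica_write, d)
               for name, d in replica_delays.items()}
    deadline = None if timeout_s is None else start + timeout_s

    acked, failed = [], []
    for name, fut in futures.items():
        # All replicas share one deadline, so later waits get less time.
        remaining = None if deadline is None else max(0.0, deadline - time.monotonic())
        try:
            fut.result(timeout=remaining)
            acked.append(name)
        except FutureTimeout:
            failed.append(name)  # the caller would fail this shard copy here

    elapsed = time.monotonic() - start
    pool.shutdown(wait=False)  # do not wait for the stuck replica thread
    return acked, failed, elapsed


if __name__ == "__main__":
    print(replicate({"fast": 0.01, "slow": 0.4}))                 # waits for "slow"
    print(replicate({"fast": 0.01, "slow": 0.4}, timeout_s=0.1))  # gives up on "slow"
```

Note that in this sketch the slow write keeps running in the background after the timeout; only the caller stops waiting for it.
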
I ran a test: I modified the code to sleep for 10 minutes in the replica write operation, and the curl request then took more than 10 minutes:

/_bulk?pretty" -H 'Content-Type: application/json' -d'
> { "index" : { "_index" : "replica_test", "_id" : "1" } }
> { "field1" : "value1" }
> { "index" : { "_index" : "replica_test", "_id" : "2" } }
> { "field1" : "value2" }
> { "index" : { "_index" : "replica_test", "_id" : "3" } }
> { "field1" : "value3" }
> '
{
  "took" : 600051,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "replica_test",
        "_type" : "_doc",
        "_id" : "1",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 2,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 2,
        "status" : 201
      }
    },

Related issue: "Support timeout mechanism for replica bulk request", issue #90981 in elastic/elasticsearch on GitHub.

Could anyone help?

You could have a timeout here but it doesn't really make sense. Timing out individual requests to a bad node is a very weak response. If the node is as unhealthy as you describe, it typically should be removed from the cluster entirely.

Therefore I think a better approach would be to improve the node-level health checks that remove unhealthy nodes from the cluster.

@DavidTurner Thanks for your reply. I agree that we need to improve node-level health checks. Currently a node is removed only if it fails to respond to PING, which is based on the network and is only a node heartbeat. Should we add more asynchronous health check mechanisms?
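
A small sketch of why a heartbeat alone does not catch this kind of node. It assumes a simplified rule where a node is removed only after several consecutive failed checks; `removal_index` and the `retry_count` value are illustrative, not taken from the Elasticsearch source.

```python
def removal_index(check_results, retry_count=3):
    """Return the index of the check at which the node would be removed.

    check_results: iterable of booleans, True meaning the node answered
    the heartbeat in time. The node is removed once retry_count
    consecutive checks have failed; None means it is never removed.
    """
    consecutive_failures = 0
    for i, ok in enumerate(check_results):
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= retry_count:
            return i
    return None


if __name__ == "__main__":
    # A node that is stuck on writes but still answers every ping.
    print(removal_index([True] * 10))
    # A node that stops answering: removed on its third consecutive failure.
    print(removal_index([True, False, False, True, False, False, False]))
```

Under this rule, a node whose writes are stuck but whose heartbeat still succeeds is never removed, which is the situation described above.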

Today's checks also have a timeout, and they also ensure that the data path is writeable. But yes, one option would be to add more mechanisms within Elasticsearch here. There are some things that we can't really detect from within Elasticsearch (e.g. they need root or other privileges that Elasticsearch doesn't have, for instance looking at disk health via SMART metrics). Checks like that need to be built into your operating platform instead.
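
For illustration, a data path writability probe could look roughly like the Python sketch below. This is an assumption about the general technique (write a small file, force it to disk, delete it, and time the whole thing), not the actual Elasticsearch implementation; `data_path_is_writable` and `slow_threshold_s` are hypothetical names.

```python
import os
import tempfile
import time


def data_path_is_writable(data_path, slow_threshold_s=5.0):
    """Probe a data directory by writing, syncing and deleting a small file.

    Returns (healthy, elapsed_seconds). The path counts as unhealthy if any
    step raises an OS error or if the probe takes longer than the threshold.
    """
    start = time.monotonic()
    try:
        fd, tmp_path = tempfile.mkstemp(prefix=".probe_", dir=data_path)
        try:
            os.write(fd, b"probe")
            os.fsync(fd)  # force the write through to the disk
        finally:
            os.close(fd)
            os.remove(tmp_path)
    except OSError:
        return False, time.monotonic() - start
    elapsed = time.monotonic() - start
    return elapsed <= slow_threshold_s, elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(data_path_is_writable(d))
    print(data_path_is_writable("/path/that/does/not/exist"))
```

A probe like this catches a read-only or very slow disk, but it says nothing about CPU or memory pressure.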

Thank you, David. Could you please point out which part of the code on GitHub implements these checks?

The timeout on follower checks is applied here:

The writability check is here:
