Could we support a timeout mechanism for replica bulk requests?

howardhuang · October 19, 2022, 9:31am

Assume a shard has one primary and one replica. A bulk request goes to the primary, which completes the shard bulk write and then forwards the request to the replica node. If the replica node is really slow, for example stuck for several minutes because of a CPU or memory issue (while network pings still succeed, so the node is not removed from the cluster), then every bulk request to that shard stays blocked until the slow replica finishes.

In a cluster of 100+ nodes where every node holds a shard of the same index, a single slow node like this can slow down bulk operations across the whole cluster.

Could we add a timeout mechanism for the replica bulk request? For example, if a replica has not responded after, say, 30s, fail that replica shard instead of blocking the primary shard's bulk operation forever.
I ran a test: I modified the code to sleep for 10 minutes in the replica write operation, and the curl request then took more than 10 minutes:

/_bulk?pretty" -H 'Content-Type: application/json' -d'
> { "index" : { "_index" : "replica_test", "_id" : "1" } }
> { "field1" : "value1" }
> { "index" : { "_index" : "replica_test", "_id" : "2" } }
> { "field1" : "value2" }
> { "index" : { "_index" : "replica_test", "_id" : "3" } }
> { "field1" : "value3" }
> '
{
  "took" : 600051,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "replica_test",
        "_type" : "_doc",
        "_id" : "1",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 2,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 2,
        "status" : 201
      }
    },

Related issue: Support timeout mechanism for replica bulk request. · Issue #90981 · elastic/elasticsearch · GitHub
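
The behaviour proposed in this post (stop waiting for a slow replica after a fixed timeout and count that copy as failed) can be sketched in plain Java. This is only an illustration of the idea, not Elasticsearch code: the class `ReplicaTimeoutSketch`, the `Outcome` enum and the method `awaitReplica` are hypothetical names, and the replica write is modelled as a bare `CompletableFuture`.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: none of these names exist in Elasticsearch.
final class ReplicaTimeoutSketch {

    enum Outcome { ACKED, TIMED_OUT, FAILED }

    /**
     * Waits for a replica write, but never longer than the given timeout.
     * A timeout is reported to the caller instead of blocking it indefinitely.
     */
    static Outcome awaitReplica(CompletableFuture<Void> replicaWrite, Duration timeout) {
        try {
            replicaWrite.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            return Outcome.ACKED;                 // replica acknowledged in time
        } catch (TimeoutException e) {
            replicaWrite.cancel(true);            // stop waiting for the slow copy
            return Outcome.TIMED_OUT;             // caller would fail the replica shard here
        } catch (ExecutionException e) {
            return Outcome.FAILED;                // the replica write itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt flag
            return Outcome.FAILED;
        }
    }

    public static void main(String[] args) {
        // A replica that has already answered: prints ACKED.
        CompletableFuture<Void> fast = CompletableFuture.completedFuture(null);
        System.out.println(awaitReplica(fast, Duration.ofMillis(200)));

        // A replica that never answers: prints TIMED_OUT after about 200 ms.
        CompletableFuture<Void> stuck = new CompletableFuture<>();
        System.out.println(awaitReplica(stuck, Duration.ofMillis(200)));
    }
}
```

What the caller does on `TIMED_OUT` (failing the shard copy, retrying, or only logging) is a separate design decision that the sketch leaves open.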

howardhuang · October 23, 2022, 2:48am

Could anyone help?

DavidTurner · October 24, 2022, 10:04am

You could have a timeout here but it doesn't really make sense. Timing out individual requests to a bad node is a very weak response. If the node is as unhealthy as you describe, it typically should be removed from the cluster entirely.

Therefore I think a better approach would be to improve the node-level health checks that remove unhealthy nodes from the cluster.

howardhuang · October 25, 2022, 3:09am

@DavidTurner Thanks for your reply. I agree that we need to improve the node-level health checks. Currently a node is removed only if it fails to respond to PING, which is network-based and is only a node heartbeat. Should we add more asynchronous health check mechanisms?

DavidTurner · October 25, 2022, 7:50am

Today's checks also have a timeout, and they also ensure that the data path is writeable. But yes, one option would be to add more mechanisms within Elasticsearch here. There are some things that we can't really detect from within Elasticsearch (e.g. they need root or other privileges that Elasticsearch doesn't have, for instance looking at disk health via SMART metrics). Checks like that need to be built into your operating platform instead.

howardhuang · October 25, 2022, 7:54am

Thank you David. Could you please point out which part of the code on GitHub implements these checks?

DavidTurner · October 26, 2022, 1:39am

The timeout on follower checks is applied here:

https://github.com/elastic/elasticsearch/blob/289533ba39933fde932eebf44d458faa2742e035/server/src/main/java/org/elasticsearch/cluster/coordination/FollowersChecker.java#L304-L304

    TransportRequestOptions.of(followerCheckTimeout, Type.PING),
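
To illustrate how a per-check timeout can feed into removing a node, here is a rough sketch that counts consecutive failed checks, where a check that timed out is simply reported as failed. It is not the `FollowersChecker` implementation: the class `FollowerCheckSketch`, its `retryCount` parameter and the method `onCheckResult` are hypothetical names chosen for this example.

```java
// Hypothetical sketch: not taken from the Elasticsearch code base.
final class FollowerCheckSketch {

    private final int retryCount;        // assumed: consecutive failures tolerated before removal
    private int consecutiveFailures = 0;
    private boolean removed = false;

    FollowerCheckSketch(int retryCount) {
        if (retryCount < 1) {
            throw new IllegalArgumentException("retryCount must be at least 1");
        }
        this.retryCount = retryCount;
    }

    /**
     * Records the result of one health check. A check that timed out is passed
     * in as succeeded == false. Returns true once the node should be removed.
     */
    boolean onCheckResult(boolean succeeded) {
        if (removed) {
            return true;                 // removal is final in this sketch
        }
        if (succeeded) {
            consecutiveFailures = 0;     // any success resets the streak
            return false;
        }
        consecutiveFailures++;
        if (consecutiveFailures >= retryCount) {
            removed = true;
        }
        return removed;
    }

    public static void main(String[] args) {
        FollowerCheckSketch node = new FollowerCheckSketch(3);
        System.out.println(node.onCheckResult(false)); // false: first failure
        System.out.println(node.onCheckResult(false)); // false: second failure
        System.out.println(node.onCheckResult(false)); // true: third consecutive failure
    }
}
```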

The writability check is here:

https://github.com/elastic/elasticsearch/blob/26c1d33ca43af75ee41dbefeb21526fe4d496e8e/server/src/main/java/org/elasticsearch/monitor/fs/FsHealthService.java#L151-L193

    private void monitorFSHealth() {
        Set<Path> currentUnhealthyPaths = null;
        final Path[] paths;
        try {
            paths = nodeEnv.nodeDataPaths();
        } catch (IllegalStateException e) {
            logger.error("health check failed", e);
            brokenLock = true;
            return;
        }

        for (Path path : paths) {
            final long executionStartTime = currentTimeMillisSupplier.getAsLong();
            try {
                if (Files.exists(path)) {
                    final Path tempDataPath = path.resolve(TEMP_FILE_NAME);
                    Files.deleteIfExists(tempDataPath);
                    try (OutputStream os = Files.newOutputStream(tempDataPath, StandardOpenOption.CREATE_NEW)) {
                        os.write(bytesToWrite);
                        IOUtils.fsync(tempDataPath, false);

This file has been truncated.
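
The visible part of the excerpt above writes a temporary file under each data path and fsyncs it. A standalone sketch of that pattern, using only the JDK, might look like the following. It is an assumption-laden illustration rather than a reconstruction of the truncated method: the class `WritabilityCheckSketch`, the probe file name, the `SLOW` status and the `slowThresholdMillis` parameter are all hypothetical, and treating a missing directory as unhealthy is a choice made for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: not the FsHealthService implementation.
final class WritabilityCheckSketch {

    enum Status { HEALTHY, SLOW, UNHEALTHY }

    /** Writes and fsyncs a small probe file under dataPath and times the operation. */
    static Status checkPath(Path dataPath, long slowThresholdMillis) {
        if (!Files.isDirectory(dataPath)) {
            return Status.UNHEALTHY;               // assumption: a missing path counts as unhealthy
        }
        Path probe = dataPath.resolve(".writability_probe.tmp"); // hypothetical file name
        long start = System.nanoTime();
        try {
            Files.deleteIfExists(probe);           // clear a leftover from an earlier run
            try (FileChannel channel = FileChannel.open(
                    probe, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
                ByteBuffer buffer = ByteBuffer.wrap("probe".getBytes(StandardCharsets.UTF_8));
                while (buffer.hasRemaining()) {
                    channel.write(buffer);
                }
                channel.force(true);               // flush data and metadata to the device
            }
            Files.delete(probe);
        } catch (IOException e) {
            return Status.UNHEALTHY;               // any I/O error marks the path unhealthy
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return elapsedMillis > slowThresholdMillis ? Status.SLOW : Status.HEALTHY;
    }

    public static void main(String[] args) {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println(checkPath(tmp, 5_000));
    }
}
```

A check like this only detects a data path that cannot be written or is slow to write. It says nothing about CPU or memory pressure, which is the failure mode described at the start of the thread.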

system · November 23, 2022, 1:40am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
