Cluster takes too long to apply cluster state

Hi guys,

We've added a few nodes to spread the disk load. The cluster stayed yellow, but we still had 4 nodes disconnect from the cluster during the index deletion:

insertOrder timeInQueue priority  source
       7316       21.3s IMMEDIATE node-left
       7317       21.3s IMMEDIATE node-left
       7318       21.3s IMMEDIATE node-left
       7319       21.3s IMMEDIATE node-left
       7325       19.3s URGENT    node-join
       7321       21.3s HIGH      shard-failed
       7322       21.3s HIGH      shard-failed
       7323       20.8s HIGH      shard-failed
       7324       20.8s HIGH      shard-failed
       7320       21.3s HIGH      shard-failed

[2023-05-25T00:47:35,421][WARN ][o.e.c.c.LagDetector      ] [esm04] node [{esd02}{nIoZq1ZWRiKgPBz3x6uJAg}{BbJGSC4zRv2ID0hEfXghGw}{x.x.x.x:9300}{cdfhstw}{xpack.installed=true, transform.node=true}] is lagging at cluster state version [13093], although publication of cluster state version [13094] completed [1.5m] ago
[2023-05-25T00:47:35,422][WARN ][o.e.c.c.LagDetector      ] [esm04] node [{esd03}{mDYiwqFkS-Sj7A9YcyLmrA}{L2ZdyuXhTr6Mh9vbk8Acjg}{x.x.x.x:9300}{cdfhstw}{xpack.installed=true, transform.node=true}] is lagging at cluster state version [13093], although publication of cluster state version [13094] completed [1.5m] ago
[2023-05-25T00:47:35,422][WARN ][o.e.c.c.LagDetector      ] [esm04] node [{esd08}{T83ju1TKQhyZUd2LI4Atlw}{R7tciuZQQQadTA3TFIgWCA}{x.x.x.x:9300}{cdfhstw}{xpack.installed=true, transform.node=true}] is lagging at cluster state version [13093], although publication of cluster state version [13094] completed [1.5m] ago
[2023-05-25T00:47:35,423][WARN ][o.e.c.c.LagDetector      ] [esm04] node [{esd06}{ReFWrVXVSf-a1ould6uIEg}{TpJGLw5XQQe3Cen3lIVxIQ}{x.x.x.x:9300}{cdfhstw}{xpack.installed=true, transform.node=true}] is lagging at cluster state version [13093], although publication of cluster state version [13094] completed [1.5m] ago

The nodes rejoined immediately, but in the meantime we got a bunch of UNASSIGNED and INITIALIZING shards and a YELLOW cluster state, which could go RED if the removed nodes took out enough shard copies to cause an outage.

Is it safe to bump node_left.delayed_timeout to ~5 minutes to prevent the master from kicking them out during the deletion operation (something like the sketch below)? I realize that faster drives or more instances could speed up the process, but we might not have that option.
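This is roughly what I had in mind, applied to all existing indices (just a sketch; I'm not sure whether we'd also need an index template so that new indices pick it up):

PUT _all/_settings
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "5m"
  }
}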

One more question: is this SENT_APPLY_COMMIT operation asynchronous? What happens while we wait for a couple of nodes to apply the cluster state? Can they still accept writes, or only reads? Or do they not serve data at all until they have successfully reported the latest cluster state change?

Indeed, it is safe to increase this timeout, but this parameter will not prevent the master from removing nodes that are lagging so badly.
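For context, and assuming current defaults, the removal of a lagging node is governed by a master-side lag timeout rather than by the allocation delay. A minimal illustration only, not a suggestion to change it:

# elasticsearch.yml (illustrative default only)
# A node that has not applied a committed cluster state within this window is
# considered to have failed and is removed from the cluster; it matches the
# 1.5m in the LagDetector warnings above.
cluster.follower_lag.timeout: 90s

The node_left.delayed_timeout setting, by contrast, only delays re-allocation of shards from a node that has already left.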

Yes.

So this timeout has no effect on whether the master will kick a node out of the cluster? I thought that since deletion is asynchronous, other writes/reads to ES shouldn't be affected while it's happening?

What would be the usual suspects for this type of lag? Our SSDs can still handle a lot of reads/writes in parallel, and sometimes the lag is triggered when the deleted index drops only 1-2 15 GB shards from each data node.

Usually it's either an infrastructure problem or a bug. The troubleshooting docs (linked previously) will help you collect the information needed to distinguish these cases.
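For example, this is the sort of information worth capturing while the lag is actually happening (a sketch only, using the node names from the logs above; esm04 is the node logging the warnings, so presumably the elected master at the time):

GET _cat/pending_tasks?v
GET _nodes/esm04/hot_threads?threads=9999
GET _nodes/esd02,esd03,esd06,esd08/hot_threads?threads=9999

That gives the master's pending task queue, hot threads on the master, and hot threads on the lagging data nodes; the server logs from those nodes around the time of the LagDetector warnings are also useful.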

Hi David,

But isn't it also expected that deleting 1 TB+ of data can take ~2 minutes? Can we speed up the deletion somehow by changing some setting? Is it possible that we are missing some throttle setting that actually makes the deletion slower, rather than the disk speed being the limit?

Is it expected for a long index deletion to block cluster state updates at all?

Yes, it might take some time if there's a lot of data to delete, but taking minutes to delete 1-2 small shards (as per your earlier post) seems surprising. I cannot recommend changing any settings without seeing the results of the troubleshooting I linked previously.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.