Thanks, David. It's really appreciated. I had read your previous post but wasn't quite sure if it was a similar situation.
Do you still suggest increasing `cluster.publish.timeout` and/or `cluster.join.timeout` as a temporary workaround until we upgrade?
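To be concrete, this is roughly what I have in mind for `elasticsearch.yml` on the master-eligible nodes. The values are placeholders I picked for illustration, not recommendations, and I am assuming both settings belong in the node configuration file:

```yaml
# Hypothetical values, for illustration only.
# How long the master waits for a cluster state update to be published to all nodes.
cluster.publish.timeout: 60s
# How long a node waits after sending a join request before it retries.
cluster.join.timeout: 120s
```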
I was aware that we have a suboptimal number of shards for several of our indices, but I did not realise that the number of indices has this much of an impact on writing the cluster state to disk.
I am a bit surprised that writing a few kB per index would take this much time, and that my disks would be too slow for it, especially since the documentation says that "cluster state updates are typically published as diffs to the previous cluster state".