Freezing Indices Doesn't Improve Cluster Performance

Currently working with a 4-node ELK cluster (7.6) with 10k indices, 20k primary shards, and 12k replica shards.

To stabilise cluster performance, I have attempted freezing and closing indices, so the number of open, non-frozen indices is now 160 primaries + 160 replicas.
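For reference, this is roughly how I have been freezing and closing the old indices — a minimal sketch assuming a cluster on localhost and the elasticsearch-py 7.x client; the index names are just placeholders:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Freeze an old daily index so it stays searchable but drops most of its
# in-memory overhead (placeholder index name).
es.indices.freeze(index="logs-2019.06.01")

# Close older indices entirely; closed indices cannot be searched at all.
es.indices.close(index="logs-2018.*")
```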

A rolling restart is currently taking 5+ hours per node, and overall search performance is very slow.

Is it expected that indices still have an impact on cluster performance even when they are closed or frozen?

You have far too many shards, which leads to a large cluster state and lots of updates that need to be propagated. The default limit of 1000 shards per node is there for a reason, so you should look to get below that. Searching a lot of small shards can be a lot slower than querying the same amount of data distributed across fewer, larger shards.
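If you want to see how far over that limit each node is, something like this will print the per-node shard counts — a rough sketch assuming the elasticsearch-py 7.x client and a cluster on localhost:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# _cat/allocation reports the number of shards (and disk usage) per node.
for row in es.cat.allocation(format="json"):
    print(row["node"], row["shards"])

# Cluster health shows the totals for the whole cluster.
health = es.cluster.health()
print("active shards:", health["active_shards"])
```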

In recent versions the cluster keeps track of frozen and closed shards so closing or freezing does not reduce the size of the cluster state as much as it used to.


Does the 1000-shard-per-node limit include both open and closed shards?

Yes. It is worth noting that this limit is quite high in my opinion; you should ideally aim for fewer shards per node.

This is why I was confused. Before upgrading from 6.x to 7.x, closing indices improved performance; I only kept a month of data open, and rolling restarts took minutes. Is this a 7.x change?

I do not remember exactly; I think it was late 6.x or 7.x.

How much data do all these shards hold? What is the specification of the nodes?

We started with daily indices with 8 shards per index (1 primary + 1 replica per node) and only 10s-100s of MB per day. Obviously I now know that this is not healthy for the cluster, and in recent times we have been running 1 primary shard + 1 replica with 10s of GB per index. Each node has 32 GB of RAM, and CPU is not under pressure.

I feel I just need to go back and reindex the bad choices of the past into larger indices, and potentially increase the number of nodes.
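Something along these lines is what I have in mind for that reindexing — a rough sketch with the elasticsearch-py 7.x client; the index names are placeholders, and closed indices would need to be reopened before they can be read from:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Copy all the small daily indices for one month into a single monthly index;
# the daily indices can be deleted once the reindex has completed.
es.reindex(
    body={
        "source": {"index": "logs-2019.01.*"},
        "dest": {"index": "logs-2019.01"},
    },
    wait_for_completion=False,  # run as a background task for large volumes
)
```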

Also, why so many shards? Your data is not that large, so why do you need a primary and a replica shard on every node? Why is 1P/1R not enough, as long as shards stay under 50 GB or so?

Separately, I'd think closed indices would not affect updates or performance significantly. Also, for rolling restarts, make sure you set delayed reallocation on node loss (node_left.delayed_timeout) high enough that shards are not moved during the restart; the default is 1m, and you can check whether shards are moving in cluster health. We use 15-60m.
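For example, something like this before the restart — a sketch assuming the elasticsearch-py 7.x client on localhost; adjust the timeout to whatever suits your restart window:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Raise the delayed allocation timeout on all indices so shards are not
# reallocated while a node is briefly down (the default is 1m).
es.indices.put_settings(
    index="_all",
    body={"settings": {"index.unassigned.node_left.delayed_timeout": "30m"}},
)

# Cluster health shows whether anything is actually moving.
health = es.cluster.health()
print("relocating:", health["relocating_shards"],
      "unassigned:", health["unassigned_shards"])
```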

It's only historic data that is like that, so I just need to go back and reindex it.

The indices are frozen and closed, but the overall system performance (i.e. taking several hours to re-add a node) makes it feel as though they are still being treated as open/frozen.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.