Index Lifecycle Management - closing indices

I'm curious why this doesn't mention the number of documents. Is that not a factor in heap usage?

Document count doesn't matter as much as the total mapped field count per node does. That's all we're saying here.
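If it helps, here's a rough way to see how many mapped fields each index carries, using the elasticsearch-py client. The endpoint is a placeholder and I'm assuming 7.x-style plain-dict responses; nothing here comes from the thread itself:

```python
# Sketch: count mapped leaf fields per index (mapped fields, not documents,
# are what drive heap pressure per node). Assumes a 7.x elasticsearch-py
# client that returns plain dicts.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint


def count_leaf_fields(properties: dict) -> int:
    """Recursively count leaf fields, including multi-fields like .keyword."""
    total = 0
    for field_def in properties.values():
        if "properties" in field_def:  # object / nested field
            total += count_leaf_fields(field_def["properties"])
        else:
            total += 1
            total += len(field_def.get("fields", {}))  # multi-fields
    return total


mappings = es.indices.get_mapping(index="*")
for index_name, mapping in mappings.items():
    props = mapping["mappings"].get("properties", {})
    print(index_name, count_leaf_fields(props))
```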

The cluster is currently running 7.17, and the cold data nodes each have 1,275 shards on them.

Yep. That's still a use-case for closing indices if you intend to keep them on disk but not searchable until you want them to be.
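As a minimal sketch (elasticsearch-py; the index name and endpoint are placeholders), closing and later re-opening an index looks like this:

```python
# Close an index to keep its data on disk without holding its shards open,
# then re-open it when you need to search it again.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# Close: data stays on disk, but the shards no longer consume open-shard overhead.
es.indices.close(index="logs-2021.06")

# ...later, when you need to query it again:
es.indices.open(index="logs-2021.06")
es.search(index="logs-2021.06", body={"query": {"match_all": {}}})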

The only document-count limit is really the strict, unalterable limit of roughly 2 billion documents per shard.

Looking at monitoring for that node over an hour, heap usage swings between ~5GB and ~23GB. So apparently the server needs more than 5.7GB of heap.

Not at all! This is an exceptionally healthy-looking JVM pattern. Elasticsearch automatically triggers a garbage collection at 75% heap usage. So long as the amount freed by the garbage collection brings total heap usage to 50% or below, your node is in a healthy state, JVM-wise. In your case, it's triggering at 75% usage and dropping to around 5GB, as you point out. With a 31GB heap, that's dropping well below 50% utilization, to about 16.1%. Your JVM is healthy and doing well.
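If you want to spot-check that pattern outside the monitoring UI, a rough look at the node stats API could be done like this (elasticsearch-py; the endpoint is a placeholder):

```python
# Print heap usage per node via the node stats API. A node looks healthy if
# heap keeps dropping back toward ~50% or lower after GCs that kick in
# around 75%.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

stats = es.nodes.stats(metric="jvm")
for node in stats["nodes"].values():
    heap = node["jvm"]["mem"]
    used_gb = heap["heap_used_in_bytes"] / 1024 ** 3
    max_gb = heap["heap_max_in_bytes"] / 1024 ** 3
    print(f"{node['name']}: {used_gb:.1f}GB / {max_gb:.1f}GB "
          f"({heap['heap_used_percent']}% used)")
```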

That said, even if you had more shards open than the 20-per-GB-of-heap limit, it wouldn't immediately be apparent from a chart like this. By default, only 10% of the heap is set aside for index caching, and if you exceed the recommended number of shards per GB of heap, that 10% of heap memory just gets spread thinner and thinner across the open shards until it affects your indexing and searching operations. If you're self-hosted, you can manually specify the amount of heap set aside for index caching, which can effectively allow you to have more open shards. This is advanced territory and isn't recommended for the faint of heart.
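To put the 20-shards-per-GB-of-heap guideline into practice, a hedged sketch that compares open shard counts against heap size per node might look like this (elasticsearch-py; assumes 7.x-style JSON responses):

```python
# Compare started shards per node against the ~20 shards per GB of heap
# guideline, using the _cat/shards and node stats APIs.
from collections import Counter
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# Count started (i.e. open and allocated) shards per node.
shards = es.cat.shards(format="json", h="node,state")
per_node = Counter(s["node"] for s in shards if s["state"] == "STARTED")

# Compare against each node's heap size.
stats = es.nodes.stats(metric="jvm")
for node in stats["nodes"].values():
    heap_gb = node["jvm"]["mem"]["heap_max_in_bytes"] / 1024 ** 3
    shard_count = per_node.get(node["name"], 0)
    guideline = 20 * heap_gb
    verdict = "OK" if shard_count <= guideline else "over the guideline"
    print(f"{node['name']}: {shard_count} shards, "
          f"heap {heap_gb:.0f}GB -> {verdict}")
```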

As far as my personal recommendation goes, if you intend to keep this stuff on disk and query it as needed by opening closed indices, then yes, I recommend still closing them to keep the number of open shards per GB of heap under 20. You're playing with fire if you try to leave them all open (at least until you upgrade to 8.3+). More shards always means more threads are required to search, and the more shards a search has to go through, the slower the results.

The shard-count-per-node issue is also why we created our searchable snapshot technology. It allows your data to sit on a cold tier (in ILM terms, a fully cached snapshot: a single local copy of each shard, with the snapshot repository acting as the replica) and/or a frozen tier (a partially cached snapshot: effectively a data-caching tier backed by the snapshot repository). This is how we opted to supersede these limitations and allow data to be queried for as long as it's in a snapshot repository, with no need to re-open closed indices or any of that maintenance overhead.
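For illustration only, an ILM policy that mounts searchable snapshots in the cold and frozen phases might be sketched like this. The repository name, timings, rollover thresholds, and the 7.x client method signature are my assumptions, not something stated in this thread:

```python
# Sketch of an ILM policy using searchable snapshots: fully mounted in the
# cold phase, partially mounted (cache-backed) in the frozen phase.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_primary_shard_size": "50gb", "max_age": "30d"}
                }
            },
            "cold": {
                "min_age": "30d",
                "actions": {
                    # Fully cached copy, backed by the snapshot repository.
                    "searchable_snapshot": {"snapshot_repository": "my-snapshot-repo"}
                },
            },
            "frozen": {
                "min_age": "90d",
                "actions": {
                    # Partially cached copy; data is fetched from the repository on demand.
                    "searchable_snapshot": {"snapshot_repository": "my-snapshot-repo"}
                },
            },
            "delete": {"min_age": "365d", "actions": {"delete": {}}},
        }
    }
}

# 7.x client signature shown; in the 8.x client this is
# es.ilm.put_lifecycle(name=..., policy=...).
es.ilm.put_lifecycle(policy="logs-searchable-snapshots", body=policy)
```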
