Have I failed to find the relevant part of the documentation for Elasticsearch 7.17 and 8.4, or is it the case that Index Lifecycle Management does not provide a way to close indices? (And if it really doesn't provide a way to close indices, why doesn't it? I am aware of Curator and also aware of its status.)
The question is why you need to close indices at all anymore.
This was a frequently used feature in order to keep the total open shard count below the "20 per gigabyte of heap" max that was preached as gospel since Elasticsearch 1.7/2.x. However, many of the reasons this was necessary have been reduced or outright fixed/removed since then.
In fact, in 8.3, the "20 shards per gigabyte of heap" guidance was officially replaced. The new guidance is here; the paragraph addressing this particular bit is "Data nodes should have at least 1kB of heap per field per index, plus overheads":
The exact resource usage of each mapped field depends on its type, but a rule of thumb is to allow for approximately 1kB of heap overhead per mapped field per index held by each data node. You must also allow enough heap for Elasticsearch’s baseline usage as well as your workload such as indexing, searches and aggregations. 0.5GB of extra heap will suffice for many reasonable workloads, and you may need even less if your workload is very light while heavy workloads may require more.
For example, if a data node holds shards from 1000 indices, each containing 4000 mapped fields, then you should allow approximately 1000 Ă— 4000 Ă— 1kB = 4GB of heap for the fields and another 0.5GB of heap for its workload and other overheads, and therefore this node will need a heap size of at least 4.5GB.
Note that this rule defines the absolute maximum number of indices that a data node can manage, but does not guarantee the performance of searches or indexing involving this many indices. You must also ensure that your data nodes have adequate resources for your workload and that your overall sharding strategy meets all your performance requirements.
You can confirm the change on this page by pulling up the older documentation in the drop-down.
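If it helps to see that rule of thumb spelled out as code, here is a minimal sketch in Python. The function name and the decimal kB/GB conversion are my own choices; the 1kB-per-field and 0.5GB figures come straight from the guidance quoted above.

```python
# Rule-of-thumb heap estimate from the 8.3+ sizing guidance:
# ~1 kB of heap per mapped field per index held by the node,
# plus ~0.5 GB for baseline usage and workload overheads.
def estimated_heap_gb(indices_on_node: int, fields_per_index: int,
                      workload_overhead_gb: float = 0.5) -> float:
    """Minimum heap (GB) a data node should have under the rule of thumb."""
    field_heap_gb = indices_on_node * fields_per_index * 1_000 / 1_000_000_000
    return field_heap_gb + workload_overhead_gb

# The documentation's example: 1000 indices with 4000 mapped fields each
print(estimated_heap_gb(1000, 4000))  # 4.5
```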
Historically we have closed indices because the cluster, which was originally built with 2.x, was demonstrably incapable of keeping them all open and we didn't have the resources to make the cluster bigger/better. I can't remember when we last saw memory issues though.
I'm curious why this:
For example, if a data node holds shards from 1000 indices, each containing 4000 mapped fields, then you should allow approximately 1000 Ă— 4000 Ă— 1kB = 4GB of heap for the fields and another 0.5GB of heap for its workload and other overheads, and therefore this node will need a heap size of at least 4.5GB.
does not mention the number of documents. Is that not a factor in heap usage? It seems like an index containing 1,000,000,000 documents would require more heap than an index containing 1000 documents. Also, how much does that apply to 7.17? The documentation for 7.17 has the "Aim for 20 shards or fewer per GB of heap memory" advice. But ILM for Elasticsearch 7.17 doesn't provide a way to close indices.
The cluster is currently running 7.17 and the cold data nodes have 1275 shards each on them. (N.B. cold on our cluster means probably not written to, not whatever Elastic thinks Cold phase of ILM means.) Looking at one node I can see the shards belong to 1270 indices. 750 of the shards are part of open indices, so that's still way over the "20 shards or fewer per GB…". The other 575 shards are part of closed indices. (All the indices have primary and replica shards on other nodes.)

I haven't worked out how to get an exact document count for all those shards, but doing crude maths with the totals for the cluster, those shards contain around 27 billion documents, about 16 billion in shards of open indices. I can't be bothered finding out how many fields are in each index; I know it's not the same in every index and I know that in most cases it's much less than 4000. But go with 4000 as in the example and 1270 indices, and 1270 × 4000 × 1kB = ~5.2GB of heap, add 0.5GB and the node needs ~5.7GB of heap, which is an overestimate because there aren't 4000 fields per index. Looking at monitoring for that node for an hour, heap usage goes between ~5GB and ~23GB.
So apparently the server needs more than 5.7GB heap.
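(If I do want exact numbers at some point, something like this rough, untested sketch against the _cat/shards and _mapping APIs ought to produce per-node document and mapped-field counts. The URL and node name are placeholders, and a real cluster would need auth/TLS handling on top.)

```python
# Rough sketch: per-node document count and a crude mapped-field count,
# using only the _cat/shards and _mapping APIs.
import json
from urllib.request import urlopen

ES_URL = "http://localhost:9200"   # placeholder
NODE_NAME = "cold-data-node-01"    # placeholder: the node being inspected

def get(path):
    with urlopen(ES_URL + path) as resp:
        return json.load(resp)

def count_fields(properties):
    """Recursively count mapped fields, including object sub-fields and multi-fields."""
    total = 0
    for field in properties.values():
        total += 1
        total += count_fields(field.get("properties", {}))
        total += count_fields(field.get("fields", {}))
    return total

shards = get("/_cat/shards?format=json&h=index,prirep,docs,node")
on_node = [s for s in shards if s["node"] == NODE_NAME]
doc_count = sum(int(s["docs"] or 0) for s in on_node)  # docs is null for closed indices
indices = sorted({s["index"] for s in on_node})

field_count = 0
for index in indices:
    try:
        mapping = get(f"/{index}/_mapping")
        field_count += count_fields(mapping[index]["mappings"].get("properties", {}))
    except Exception:
        pass  # e.g. closed indices may reject mapping requests on some versions

print(f"{len(indices)} indices, {doc_count:,} docs, {field_count:,} mapped fields")
print(f"rule-of-thumb heap: {field_count * 1_000 / 1e9 + 0.5:.1f} GB")
```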
That 5-23GB heap usage is with about 40% of the shards on the node belonging to closed indices. Would that increase if all the shards belonged to open indices? With Elasticsearch 6 I would have said it definitely would, because closed indices were ignored and invisible in most situations such as monitoring. But Elasticsearch 7.17 keeps track of closed indices.
Do I still need to close indices? I could remove the Curator tasks which close indices and see if nodes start to crash. But it would be preferable to have a good reason to expect that all indices could be kept open before trying to do it.
I'm curious why this does not mention the number of documents. Is that not a factor in heap usage?
Document count doesn't matter as much as the mapped field count total per node does. That's all we're saying here.
The cluster is currently running 7.17 and the cold data nodes have 1275 shards each on them.
Yep. That's still a use-case for closing indices if you intend to keep them on disk but not searchable until you want them to be.
The only document count limit is really the strict, unalterable 2 billion document per shard limit.
Looking at monitoring for that node for an hour, heap usage goes between ~5GB and ~23GB. So apparently the server needs more than 5.7GB heap.
Not at all! This is an exceptionally healthy looking JVM pattern. Elasticsearch automatically triggers a garbage collection at 75%. So long as the amount freed by the garbage collection brings the total heap usage to 50% or below, your node is in a healthy state, JVM-wise. In your case, it's triggering at 75% usage and dropping to around 5GB, as you're pointing out. With a 31G heap, that's dropping well past 50% utilization to about 16.1% utilization. Your JVM is healthy and doing well.
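If you want to spot-check that outside the monitoring UI, a single call to the nodes stats API shows where each node's heap sits at that moment. A minimal sketch, assuming an unauthenticated cluster at ES_URL; it only samples one point in time, so the sawtooth itself still needs the monitoring charts:

```python
# Print current heap usage per node from GET _nodes/stats/jvm.
# Healthy G1 behaviour: GC kicks in around ~75% and drops usage well below 50%.
import json
from urllib.request import urlopen

ES_URL = "http://localhost:9200"  # placeholder: add auth/TLS as needed

with urlopen(ES_URL + "/_nodes/stats/jvm") as resp:
    stats = json.load(resp)

for node in stats["nodes"].values():
    heap = node["jvm"]["mem"]
    print(f'{node["name"]}: {heap["heap_used_percent"]}% '
          f'({heap["heap_used_in_bytes"] / 2**30:.1f} GB of '
          f'{heap["heap_max_in_bytes"] / 2**30:.1f} GB)')
```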
That said, even if you had more shards open than the 20 per GB of heap limit, it wouldn't immediately be apparent from a chart like this. By default, only 10% of the heap is set aside for index caching, and if you exceed the recommended number of shards per GB of heap, it just spreads that 10% of heap memory thinner and thinner between the open shards until it affects your indexing and searching operations. If you're self-hosted, you can manually specify the amount of heap set aside for index caching, which can effectively allow you to have more open shards. This is advanced territory, and isn't recommended for the faint of heart.
As far as my personal recommendation goes, if you intend to keep this stuff on-disk and query as needed by opening closed indices, then yes, I recommend still closing them to keep the number of open shards per GB of heap under 20. You're playing with fire if you try to leave them all open (at least until you upgrade to 8.3+). More shards always means more threads required to search, so that's something to keep in mind: more shards to go through means slower results.
The shard count per node issue is also why we created our searchable snapshot technology. It allows your data to be on a cold tier (in ILM, that means fully cached snapshot, a single shard whose replica is the snapshot repository), and/or a frozen tier (a partially cached snapshot—effectively this is a data caching tier backed by the snapshot repository). This is how we opted to supersede these limitations and allow data to be queried as long as it's in a snapshot repository, with no need to re-open closed shards or any of that maintenance overhead.
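For illustration only, here is a minimal sketch of what an ILM policy using a cold-phase searchable snapshot might look like, created via the API from Python. The policy name, repository name, rollover size, and 30d age are made-up placeholders rather than a drop-in config for the cluster discussed above, and searchable snapshots need a recent version plus the appropriate license:

```python
# Create an ILM policy whose cold phase mounts the index as a fully
# cached searchable snapshot instead of closing it.
import json
from urllib.request import Request, urlopen

ES_URL = "http://localhost:9200"  # placeholder

policy = {
    "policy": {
        "phases": {
            "hot": {"actions": {"rollover": {"max_primary_shard_size": "50gb"}}},
            "cold": {
                "min_age": "30d",
                "actions": {
                    "searchable_snapshot": {"snapshot_repository": "my_repository"}
                },
            },
        }
    }
}

req = Request(
    ES_URL + "/_ilm/policy/cold-searchable-snapshots",
    data=json.dumps(policy).encode(),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
with urlopen(req) as resp:
    print(json.load(resp))
```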
Thanks, that's all very helpful. Hurrah for exceptionally healthy looking JVM patterns! The cluster is self-hosted, and it's working just fine as it is, so for now I think I'll leave heap allocation settings and the number of open indices alone.
We're not ready to upgrade to 8 yet and, according to Elastic Product End of Life Dates | Elastic, we have nearly 2 years to do so. And I've just realised that Curator can work alongside ILM if allow_ilm_indices: true is set, so there's plenty of time to look at moving to ILM bit by bit for things other than closing before an upgrade to 8.