Hi
One of our production clusters has 40 hot data nodes (8 cores, 64 GB RAM, of which 30.5 GB is heap, and 1.5 TB of local SSD storage) and 30 warm data nodes (8 cores, 32 GB RAM, of which 16 GB is heap, and 5 TB of attached HDD storage). Recently, as the warm data nodes filled up and crossed 1.6 TB of used storage, we began seeing an interesting pattern: Java heap usage hovers around 14-16 GB, CPU is around 15%, and a lot of time is spent in JVM GC collection-old (almost none in collection-young).
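For anyone who wants to compare numbers, here is a minimal sketch of how the GC figures above can be derived from two node stats snapshots. It assumes the dictionaries follow the layout of the `jvm` section returned by `GET _nodes/stats/jvm` (`mem.heap_used_percent`, `gc.collectors.young` / `old` with `collection_count` and `collection_time_in_millis`). The helper names and all sample values are made up for illustration; they are not our real measurements.

```python
# Sketch: GC activity accrued between two node-stats snapshots.
# The dict layout mirrors the "jvm" section of GET _nodes/stats/jvm.
# All numbers below are invented sample data, not real measurements.


def gc_delta(before, after):
    """Return GC runs and GC time (ms) per collector between two snapshots."""
    deltas = {}
    for name, now in after["jvm"]["gc"]["collectors"].items():
        prev = before["jvm"]["gc"]["collectors"][name]
        deltas[name] = {
            "runs": now["collection_count"] - prev["collection_count"],
            "time_ms": now["collection_time_in_millis"]
            - prev["collection_time_in_millis"],
        }
    return deltas


def make_snapshot(heap_used_percent, young, old):
    """Hypothetical helper: build one node's stats entry from (count, millis) pairs."""

    def collector(count, millis):
        return {"collection_count": count, "collection_time_in_millis": millis}

    return {
        "jvm": {
            "mem": {"heap_used_percent": heap_used_percent},
            "gc": {
                "collectors": {
                    "young": collector(*young),
                    "old": collector(*old),
                }
            },
        }
    }


# Two snapshots taken some interval apart (sample values only).
before = make_snapshot(92, young=(100, 4_000), old=(50, 60_000))
after = make_snapshot(95, young=(110, 4_400), old=(80, 96_000))

delta = gc_delta(before, after)
for name, d in delta.items():
    print(f"{name}: {d['runs']} runs, {d['time_ms']} ms spent in GC")
```

A pattern where the `old` collector accumulates far more time than `young` between snapshots is what we mean by "a lot of time in collection-old".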
As an experiment, we allocated 24 GB to the heap on one of the warm nodes (75% of its total memory) and let the cluster initialize and rebalance the unassigned shards. The results were remarkable: Java heap usage returned to ~14 GB, CPU dropped to ~2%, time spent in collection-old dropped to 0, and collection-young is now ~1/3 of what collection-old was before we increased the JVM heap.
The following graphs show the behavior described above; the settings were changed at 12:07:
As discussed in "Cold data node search performance", more than 50% of the RAM should be allocated to the warm nodes' JVM heap.
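To make the trade-off concrete, here is a small sketch of the memory split on a warm node before and after the change. It uses only the figures from this post (32 GB RAM, 16 GB vs. 24 GB heap, 1.6 TB of used storage); treating 1.6 TB as 1600 GB and treating everything outside the heap as available to the OS and filesystem cache are simplifying assumptions, and the function name is just for illustration.

```python
# Sketch: how much RAM is left outside the JVM heap on a warm node.
# Assumptions: 1.6 TB is taken as 1600 GB, and all non-heap RAM is
# treated as available to the OS and filesystem cache (a simplification).

TOTAL_RAM_GB = 32
USED_STORAGE_GB = 1600


def memory_split(total_ram_gb, heap_gb):
    """Return (heap share of RAM, GB of RAM left outside the heap)."""
    heap_fraction = heap_gb / total_ram_gb
    off_heap_gb = total_ram_gb - heap_gb
    return heap_fraction, off_heap_gb


for label, heap_gb in (("before", 16), ("after", 24)):
    fraction, off_heap_gb = memory_split(TOTAL_RAM_GB, heap_gb)
    # Rough ratio of on-disk data to the memory that can cache it.
    data_per_cache_gb = USED_STORAGE_GB / off_heap_gb
    print(
        f"{label}: heap {heap_gb} GB = {fraction:.0%} of RAM, "
        f"{off_heap_gb} GB outside the heap, "
        f"~{data_per_cache_gb:.0f} GB of stored data per GB of off-heap RAM"
    )
```

In other words, moving from 16 GB to 24 GB of heap halves the memory left outside the heap, which is the part of the setup we are least sure about.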
Does this setup make sense?
What are the penalties of allocating 75% of the total memory to the JVM heap?
What other concerns should we have with such a setup, e.g. query performance or cluster maintenance (deleting indices, moving shards to and from the warm nodes, changing index settings, etc.)?
Your input is appreciated.