Realistic Sizing for Elasticsearch

Hi,

I am new to this forum and have been playing around a bit with ES. What confuses me is how to size an ES setup.
For example, if we go through the official channels (the SEs), they recommend that each hot node hold only 6 TB of storage. Hence, for large amounts of data we would have to maintain a huge number of nodes.
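The "huge number of nodes" concern comes down to simple division. Here is a minimal sketch, assuming the 6 TB figure is a per-node cap on hot data and that each replica is a full extra copy on another node. The replica count and the 100 TB example are hypothetical.

```python
import math


def hot_nodes_needed(hot_data_tb: float, per_node_cap_tb: float = 6.0, replicas: int = 1) -> int:
    """Estimate how many hot nodes are needed for hot_data_tb of primary data.

    Hypothetical model: every replica is a full copy stored on a different node,
    and per_node_cap_tb is the most data one node should hold.
    """
    total_tb = hot_data_tb * (1 + replicas)  # primaries plus replica copies
    return math.ceil(total_tb / per_node_cap_tb)


# 100 TB of primary hot data with one replica is 200 TB on disk, or 34 nodes at 6 TB each.
print(hot_nodes_needed(100))
```

The cap per node is the lever. If your data lets a node hold more before heap becomes the bottleneck, the node count drops proportionally.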

But when I talk with others who run the open-source version, they seem to manage a lot more data with far fewer nodes.

Any ideas on how sizing should actually be handled?

BR
Rukmal

How much data you can store on a node is limited by heap. All indices that are not closed or frozen consume heap; that's where the 6 TB figure comes from, but the real number varies depending on your data. We have a hot/cold (more like warm/cold) setup: data is ingested to hot nodes and stays there for 30 days, then moves to cold. We then freeze indices after 60-90 days.
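The schedule described above can be written down as a function of index age. This is a minimal sketch, not the ILM API: it only encodes the 30-day hot window and a freeze point (60 days here, anywhere in the 60-90 day range in practice). The function name and the tier labels are hypothetical.

```python
def tier_for_age(age_days: int, hot_days: int = 30, freeze_after_days: int = 60) -> str:
    """Return where an index of the given age lives under a hot/cold/freeze schedule.

    hot_days and freeze_after_days mirror the 30-day hot window and the
    60-90 day freeze point; adjust them to your own retention policy.
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if freeze_after_days < hot_days:
        raise ValueError("freeze_after_days must not be earlier than hot_days")
    if age_days < hot_days:
        return "hot"       # receives ingest and most searches; uses heap
    if age_days < freeze_after_days:
        return "cold"      # still open (thawed) on cold nodes; still uses heap
    return "frozen"        # frozen on cold nodes; very little heap


for age in (5, 45, 120):
    print(age, tier_for_age(age))
```

Only the first two states count against heap, so the freeze point effectively sets how much open data each node carries.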

Start by defining how long data stays in each tier (hot/warm/cold) and how much storage each tier needs. Hot and warm nodes must both hold the data and provide the CPU and heap for indexing and search. Cold is rarely searched; it's basically a data dump. If you can freeze indices, you can store a lot more data.
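Turning those requirements into numbers is mostly daily ingest times retention. The sketch below assumes a constant daily ingest rate and that replicas are full copies in every tier. The 100 GB/day rate, the tier lengths, and the `heap_bound` flag (True for data that stays open) are hypothetical inputs, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    days: int          # how many days of data this tier keeps
    heap_bound: bool   # True if the data stays open (thawed) and therefore uses heap


def storage_per_tier(daily_ingest_gb: float, tiers: list, replicas: int = 1) -> dict:
    """Disk needed per tier in GB, counting each replica as a full copy."""
    copies = 1 + replicas
    return {t.name: daily_ingest_gb * t.days * copies for t in tiers}


def heap_bound_gb(daily_ingest_gb: float, tiers: list, replicas: int = 1) -> float:
    """Total size of the open data, which is what heap has to cover."""
    sizes = storage_per_tier(daily_ingest_gb, tiers, replicas)
    return sum(sizes[t.name] for t in tiers if t.heap_bound)


tiers = [
    Tier("hot", 30, heap_bound=True),       # days 0-30
    Tier("cold", 30, heap_bound=True),      # days 30-60, still thawed
    Tier("frozen", 305, heap_bound=False),  # days 60-365, frozen
]
print(storage_per_tier(100, tiers))  # {'hot': 6000, 'cold': 6000, 'frozen': 61000}
print(heap_bound_gb(100, tiers))     # 12000
```

Separating the heap-bound total from the on-disk total shows why freezing stretches a cluster: most of the disk above is frozen data that costs very little heap.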

This graph shows the size of open (thawed) indices in blue. As we approached 10 TB, the heap line flattened, which means frequent GCs (that's bad). Reducing thawed data to 9 TB returned the heap to a saw-tooth line with less frequent GCs. We moved to 7.x during this period, and I think it uses less heap.

Note that the green line includes frozen indices in the total, but they don't use much heap. I made this graph in Zabbix; it's not from Kibana.
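The number worth watching is the thawed total (blue), not the grand total that includes frozen indices (green). Below is a minimal sketch of that distinction with hypothetical index records; in a real cluster you would fill them from your own monitoring. The 9 TB ceiling is the level that kept GC healthy on this particular cluster, not a general rule, so measure your own.

```python
from typing import NamedTuple


class IndexInfo(NamedTuple):
    name: str
    size_tb: float
    frozen: bool


def thawed_tb(indices) -> float:
    """Size of open (thawed) indices, the part that actually costs heap."""
    return sum(i.size_tb for i in indices if not i.frozen)


def total_tb(indices) -> float:
    """Everything on disk, frozen indices included."""
    return sum(i.size_tb for i in indices)


def over_ceiling(indices, ceiling_tb: float) -> bool:
    """True if thawed data exceeds the ceiling you observed to be safe."""
    return thawed_tb(indices) > ceiling_tb


# Hypothetical indices: two open, one frozen.
indices = [
    IndexInfo("logs-a", 4.0, frozen=False),
    IndexInfo("logs-b", 5.5, frozen=False),
    IndexInfo("logs-c", 20.0, frozen=True),
]
print(thawed_tb(indices), total_tb(indices))  # 9.5 29.5
print(over_ceiling(indices, ceiling_tb=9.0))  # True
```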