We have a question about search performance on our cold data nodes.
Our use case is searching one week of data, which spans 7 indexes (one index per day); each index has 38 shards of 22GB each. The query times out after 60 seconds, and during those 60 seconds the cold nodes reach a load of 13-15.
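For context, here is a back-of-the-envelope calculation of how much data each weekly query fans out to (shard counts and sizes are from our setup above; the search thread pool size is an assumption based on the default formula for 8-core nodes, not something we measured):

```python
# Rough math on the weekly query described above.
indexes_per_week = 7     # one index per day
shards_per_index = 38
shard_size_gb = 22

shards_touched = indexes_per_week * shards_per_index
data_scanned_tb = shards_touched * shard_size_gb / 1024

print(f"{shards_touched} shards, ~{data_scanned_tb:.1f} TB per query")
# -> 266 shards, ~5.7 TB per query
#
# With 8 cold nodes at 8 cores each, the default search thread pool
# size of (cores * 3 / 2) + 1 = 13 threads gives 8 * 13 = 104 concurrent
# shard searches, so 266 shards have to be processed in at least 3 waves.
```

If this arithmetic is right, each timeout window asks the spinning disks to read several terabytes, which alone could explain the load of 13-15.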
I have a few questions regarding best practices:
- The usual best practice is to allocate 50% of memory to the Java heap. Is that also correct for cold data nodes that are not indexing any data?
- Would it be better to use memory-optimized machines, i.e. a higher ratio of RAM to cores?
- Is it better to use many small machines or a few large ones? E.g., 8 machines with 8 cores / 32GB memory each, or 32 machines with 2 cores / 8GB each (or 16GB, depending on the answer to the previous question)?
- What is the maximum ratio between memory and the number of indexes/shards, or between memory and data on disk?
- Any other tips regarding cold search performance would be appreciated.
Our Elasticsearch 6.2.3 cluster consists of:
- 3 master nodes (4 cores / 14gb memory / 7gb heap)
- 3 client nodes (same setup)
- 40 hot data nodes (8 cores / 64gb memory / 30.5gb heap and 1.4tb local ssd disks)
- 8 cold data nodes (8 cores / 32gb memory / 16gb heap and 5tb spinning disks).
The cluster contains ~14,000 primary shards (25,000 total active shards) spread across ~8,900 indexes.
Currently in use:
- 28TB in hot storage (SSD)
- 25TB in cold storage (HDD)
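As one data point for the memory-to-shards question: Elastic's commonly cited rule of thumb is to stay below roughly 20 shards per GB of heap on data nodes. A quick check against our numbers (heap figures from the node list above; the 20/GB threshold is a published guideline, not a hard limit):

```python
# Shards-per-heap check using the cluster numbers from this post.
hot_heap_gb = 40 * 30.5   # 40 hot nodes, 30.5GB heap each
cold_heap_gb = 8 * 16     # 8 cold nodes, 16GB heap each
total_shards = 25_000     # total active shards

ratio = total_shards / (hot_heap_gb + cold_heap_gb)
print(f"~{ratio:.1f} shards per GB of data-node heap")
# -> ~18.5 shards per GB of data-node heap
```

So cluster-wide we sit just under that guideline, though the cold tier on its own carries far more data per GB of heap than the hot tier.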
Thanks for your help!