Hi,
To host Elasticsearch we have come up with the following initial sizing, and we will add more data nodes based on the actual required capacity.
Master nodes:
3x D12v2 (4 vCores, 28 GB RAM)
Coordinator + Kibana:
2x D13v2 (8 vCores, 56 GB RAM)
Data nodes:
10x D13v2 (8 vCores, 56 GB RAM) with 6x 1 TB SSD disks
We have the following options for hosting the ES data nodes:
Option 1: Ds13v2
8 vCores (no Hyper-Threading), Xeon E5-2673 v3
RAM 56 GB
VM total IOPS uncached limitation: 25600
VM total IO throughput limitation: 384 MBps
Option 2: E8s_v3
8 vCores (4 physical cores with Hyper-Threading enabled), Xeon E5-2673 v4
RAM 64 GB
VM total IOPS uncached limitation: 12800
VM total IO throughput limitation: 192 MBps
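To see how the two VM caps interact with the 6-disk setup, here is a rough sketch. The per-disk numbers are an assumption (roughly an Azure P30-class premium SSD at ~5000 IOPS / ~200 MBps each); substitute your actual disk tier. The point it illustrates: with six premium SSDs, the VM-level cap, not the disks, is the binding limit on both options.

```python
# Compare raw aggregate disk capability against the VM-level caps.
# Per-disk figures are ASSUMED (P30-class: ~5000 IOPS, ~200 MBps each);
# replace them with the actual tier of your 1 TB disks.
DISKS = 6
DISK_IOPS = 5000
DISK_MBPS = 200

vm_limits = {
    "Ds13_v2": {"iops": 25600, "mbps": 384},  # Option 1 caps from above
    "E8s_v3":  {"iops": 12800, "mbps": 192},  # Option 2 caps from above
}

raw_iops = DISKS * DISK_IOPS   # 30000 -- exceeds both VM IOPS caps
raw_mbps = DISKS * DISK_MBPS   # 1200  -- exceeds both VM throughput caps

# Effective limit per VM is the smaller of aggregate disk capability
# and the VM cap; here the VM cap wins in every case.
effective = {
    vm: (min(raw_iops, lim["iops"]), min(raw_mbps, lim["mbps"]))
    for vm, lim in vm_limits.items()
}

for vm, (iops, mbps) in effective.items():
    print(f"{vm}: effective {iops} IOPS, {mbps} MBps")
```

Under these assumptions the Ds13v2 gives twice the usable IO of the E8s_v3 (25600 vs 12800 IOPS, 384 vs 192 MBps), since both VMs cap below the disks' aggregate capability.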
Q1: Which option seems to be the better solution for hosting the ES data nodes?
Q2: Generally speaking, which of the following factors is more important when sizing data nodes?
- CPU
- Disk IOPS or sequential throughput
Based on our application benchmarks with Xeon E5 CPUs, there is a huge performance gain in very CPU-intensive JVM-based applications when comparing the 4th generation against the 3rd (roughly 50-60%). However, I am not sure how this plays out with Hyper-Threading enabled, or with the lower IOPS limitation. Which is more likely to act as the bottleneck: IO or CPU?
Q3: Is there any way to estimate the required IOPS and sequential throughput from the predicted indexing rate?
Q4: Is there any value in using software RAID (striping the 6 SSDs into a single volume via Azure/Storage Spaces) or would it be better to keep them as separate data paths for the shards?
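For reference, the separate-data-paths alternative I have in mind looks like this in elasticsearch.yml (mount points are made-up examples); the trade-off as I understand it is that Elasticsearch places a whole shard on one path, so a hot shard is limited to a single disk's IOPS, whereas a striped volume spreads every shard across all six disks:

```yaml
# elasticsearch.yml -- one entry per physical disk; ES distributes
# shards across the listed paths. Mount points are ASSUMED examples.
path.data:
  - /mnt/disk1
  - /mnt/disk2
  - /mnt/disk3
  - /mnt/disk4
  - /mnt/disk5
  - /mnt/disk6
```

With striping, losing one disk takes out the whole volume (and thus the node), while with separate paths only the shards on the failed disk are lost; either way replicas on other nodes are what actually protect the data.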