Elasticsearch data node sizing on Azure infrastructure


To host Elasticsearch data nodes, we have come up with the following initial sizing; we will add more data nodes based on the actual capacity required.

Master nodes:
3x D12v2 (4 vCores, 28 GB RAM)

Coordinator + Kibana:
2x D13v2 (8 vCores, 56 GB RAM)

Data nodes:
10x D13v2 (8 vCores, 56 GB RAM) with 6x 1 TB SSD disks

We have the following options for hosting ES data nodes:
Option 1: Ds13v2
8 vCores (without Hyper-Threading), Xeon E5-2673 v3
VM total IOPS uncached limitation: 25600
VM total IO throughput limitation: 384 MBps

Option 2: E8s_v3
8 vCores (4 physical cores with Hyper-Threading enabled), Xeon E5-2673 v4
VM total IOPS uncached limitation: 12800
VM total IO throughput limitation: 192 MBps
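A quick back-of-envelope check can show whether the VM cap or the disks themselves limit IO first. The sketch below assumes the 1 TB premium SSDs are P30 tier (5,000 IOPS / 200 MBps each); that tier mapping is my assumption, so verify it against the current Azure managed disk documentation.

```python
# Effective IO = min(VM-level cap, sum of per-disk limits).
# Assumption: 1 TB premium SSD = P30 (5,000 IOPS, 200 MBps per disk).
DISK_IOPS, DISK_MBPS, NUM_DISKS = 5000, 200, 6

vms = {
    "Ds13v2": {"iops_cap": 25600, "mbps_cap": 384},
    "E8s_v3": {"iops_cap": 12800, "mbps_cap": 192},
}

for name, caps in vms.items():
    agg_iops = DISK_IOPS * NUM_DISKS
    agg_mbps = DISK_MBPS * NUM_DISKS
    eff_iops = min(caps["iops_cap"], agg_iops)
    eff_mbps = min(caps["mbps_cap"], agg_mbps)
    print(f"{name}: effective {eff_iops} IOPS / {eff_mbps} MBps "
          f"(VM cap is the bottleneck: {caps['iops_cap'] < agg_iops})")
```

Under these assumptions, six disks can deliver more IOPS and throughput than either VM cap allows, so the VM-level limit, not the disk count, is what differs between the two options.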

Q1: I would like to understand which option is the better choice for hosting ES data nodes.

Q2: Generally speaking, which of the following factors matters more when sizing data nodes?

  • CPU
  • Disk IOPS or sequential throughput

Based on our application benchmarks with Xeon E5 CPUs, there is a large performance gain in very CPU-intensive JVM-based applications when comparing the 4th generation with the 3rd (roughly 50-60%). However, I am not sure how this plays out with Hyper-Threading, or with a lower IOPS limit. Which acts more as the bottleneck: IO or CPU?

Q3: Is there any way we can estimate IOPS and sequential throughput based on the predicted indexing rate?
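One rough way to approach this is a simple write-amplification model: sustained write throughput is roughly the indexing rate times on-disk document size, multiplied by the replica count and a merge write-amplification factor. Every number below is an illustrative assumption, not a measured value; the amplification factor in particular varies widely with merge policy and mapping.

```python
# Rough sizing model (assumptions, not an official Elasticsearch formula):
# write MB/s ≈ docs/sec * avg on-disk doc size * (1 + replicas) * merge amplification
docs_per_sec = 20000        # predicted indexing rate (example value)
avg_doc_bytes = 1024        # average indexed size per doc on disk (assumed)
replicas = 1                # each replica re-indexes the document
write_amplification = 3.0   # segment merges rewrite data; 2-5x is a common guess

mbps = docs_per_sec * avg_doc_bytes * (1 + replicas) * write_amplification / 1e6
print(f"Estimated sustained write throughput: {mbps:.1f} MB/s cluster-wide")
print(f"Per data node across 10 nodes: {mbps / 10:.1f} MB/s")
```

The only reliable numbers come from benchmarking your own documents and mappings (for example with Rally), but a model like this helps sanity-check whether a VM's throughput cap is even in the right ballpark.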

Q4: Is there any value in using software RAID (striping) across the 6 SSD disks, or would it be better to keep them as separate data paths for shards?
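For reference, the multiple-data-paths alternative is just a list under `path.data` in elasticsearch.yml, one entry per disk mount (the mount point names below are hypothetical):

```yaml
# elasticsearch.yml -- one data path per physical disk,
# as an alternative to a single mount over a striped (RAID-0) volume
path.data:
  - /datadisk1/elasticsearch
  - /datadisk2/elasticsearch
  - /datadisk3/elasticsearch
  - /datadisk4/elasticsearch
  - /datadisk5/elasticsearch
  - /datadisk6/elasticsearch
```

With multiple data paths, Elasticsearch places all files of a given shard on a single path, so one shard cannot exceed one disk's throughput but a single disk failure loses only the shards on that disk. Striping spreads every shard across all disks, but a single disk failure takes out all data on the node, just as with RAID-0.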
