Elasticsearch Topology Design

Hello Elasticsearch big-brain experts,

I have been testing out Elasticsearch as a log vaulting and auditing tool and as a SIEM for the last few months, and my company has authorized the purchase of hardware to build a cluster. I have gone through a lot of documentation and put together a proposal, but I would like to pass my design by the community to see if I have any glaring flaws or misunderstandings. I have spec'd out 3 types of servers below:

Hot Storage:

  • 2x 8-core Intel 3.1GHz procs
  • 128GB ECC Registered Memory
  • 1x 128GB m.2 (for OS)
  • 16x 2.8GB hot-swap NVMe SSDs
  • 2x 10Gbase-T LAN on Motherboard
  • 1x 2 port 1Gbase-T PCI card (for management)

Warm Storage:

  • 2x 8-core Intel 3.1GHz procs
  • 128GB ECC Registered Memory
  • 12x 4GB HDDs in RAID 5
  • 1x 128GB m.2 (for OS)
  • 2x 10Gbase-T LAN on Motherboard
  • 2x 1Gbase-T PCI card (for management)

Auxiliary Servers:

  • 2x 8-core Intel 3.1GHz procs
  • 128GB ECC Registered Memory
  • 8x 250GB SSDs
  • 1x 128GB m.2 (for OS)
  • 2x 10Gbase-T LAN on Motherboard
  • 2x 1Gbase-T PCI card (for management)

My intention is to purchase 4 hot storage servers, 4 warm storage servers, and 3 auxiliary servers. The hot and warm servers will make up the data nodes as well as the masters. I plan to have indices on the hot storage use 2 primary shards with 1 replica each, allowing for two servers' worth of fault tolerance. After 60 or 90 days I plan to move indices to the warm hosts, where they will go down to 1 shard with a replica.
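For the hot-to-warm movement I was planning on plain shard allocation filtering, roughly like the sketch below. This is only a rough sketch: the `box_type` attribute, index names, and the shrink step are placeholders and the exact syntax varies a bit by version.

```
# elasticsearch.yml on the hot data nodes (attribute name is just a placeholder)
node.attr.box_type: hot

# elasticsearch.yml on the warm data nodes
node.attr.box_type: warm
```

```
# New indices get 2 primaries + 1 replica and are pinned to the hot tier
PUT logs-2017.10.01
{
  "settings": {
    "index.number_of_shards": 2,
    "index.number_of_replicas": 1,
    "index.routing.allocation.require.box_type": "hot"
  }
}

# After 60-90 days, re-pin the index so its shards relocate to the warm tier
PUT logs-2017.10.01/_settings
{
  "index.routing.allocation.require.box_type": "warm"
}

# Going from 2 primaries down to 1 would be a _shrink (the index has to be
# read-only and have a copy of every shard on one node before this works)
POST logs-2017.10.01/_shrink/logs-2017.10.01-shrunk
```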

The auxiliary servers I plan to use as a client node (data=false, master=false) to speed up searches, a dedicated parsing host for Logstash, Filebeat, etc., and a Kibana front-end.
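For reference, the client node config I have in mind is just the standard coordinating-only setup (a minimal sketch; the ingest line only applies on 5.x+, and the Kibana hostname is a placeholder):

```
# elasticsearch.yml on the auxiliary/client node
node.master: false
node.data: false
node.ingest: false
```

```
# kibana.yml pointed at the client node (hostname is a placeholder)
elasticsearch.url: "http://es-client-1:9200"
```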

All servers will be on the same 1Gbase-T switch, with 2 lines each configured as an 802.3ad LACP bond for data and cluster communication, and a separate 1G connection to another switch for management (ssh, scp, configuration management, etc.). I have forgone dedicated master nodes because I read that they aren't really required until you get over a 10-node cluster, but I'm not married to this idea if it is wrong.
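For the bond itself I was thinking of something along these lines (a rough sketch assuming Ubuntu with netplan; interface names and the address are placeholders):

```
# /etc/netplan/01-bond0.yaml
network:
  version: 2
  ethernets:
    eno1: {}
    eno2: {}
  bonds:
    bond0:
      interfaces: [eno1, eno2]
      parameters:
        mode: 802.3ad
        mii-monitor-interval: 100
        transmit-hash-policy: layer3+4
      addresses: [10.10.10.11/24]
```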

Please let me know if there is anything that I have missed or if this configuration is substandard or ill-advised. I do want to know where my baby is ugly on this one because I don't want to waste the money or cycles on fixing my configuration.

Or just tell me that I did a great job and should move forward. My ego can always use a boost. :)

A few things:

Yeah, I mistyped. The storage should be 16x 1.8TB SSDs for the hot nodes and 12x 4TB HDDs in RAID 5 for the warm nodes. This should give me around 73 TB for hot storage and 132 TB (I think) for warm storage ((4×12−4)×4 − (4×12−4) = 44×4 − 44 = 132). Sorry about that.

I would like to know what the benefit would be to running multiple Docker containers per host instead of just assigning half the host's memory (64GB) to one instance. That is what I have seen in the documentation so far. Just explaining my thought process, not trying to argue anything.

As far as the auxiliary nodes go, I don't really have a reference point for how many resources a client node is going to consume. Per my understanding, I will be pointing Kibana and any log sources at the client node to either start indexing or perform searches, so if anyone has guidelines on how big that needs to be relative to the data/master nodes, that would be great. I have some idea of what the dedicated parser needs: I threw some NetFlow from a couple of core routers at a really beefy VM and it beat the heck out of the CPU and memory, so I would like that to stay on some pretty beefy hardware, because I'm going to be throwing a lot more at it than what I gave the test/dev/QA instance.
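For context, the NetFlow test was basically this kind of pipeline (a minimal sketch assuming the Logstash netflow codec plugin is installed; the port, hosts, and index pattern are placeholders):

```
input {
  udp {
    port  => 2055
    codec => netflow
  }
}
output {
  elasticsearch {
    hosts => ["http://es-client-1:9200"]
    index => "netflow-%{+YYYY.MM.dd}"
  }
}
```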

I am pretty sure that the Kibana hardware is way more than is needed, but limiting the hardware configurations made it easier to build a menu, so to speak, of hardware needed for expansion if ops or dev or infrastructure wanted to get in on the benefits that ES can give them. I probably won't need to upgrade that for a long time, but if someone wants to start onboarding non-security data into the cluster, I have a price I can give them for upgrading the storage nodes (depending on their log retention period), creating dedicated master nodes, or adding client nodes and putting them behind a load balancer. It just gives me some standards that I can hand out easily with quotes.

We recommend <32GB per heap, so multiple nodes per host means better resource usage.

I see. So breaking the 8 hosts' single 64GB instances into 16 instances with sub-32GB heaps will provide better resource utilization. I'm going to look into that then.
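Something like two containers per data host is what I'll prototype (a rough docker-compose sketch assuming the official image; the version tag, service names, and data paths are placeholders, with heaps kept under ~31GB):

```
version: '2.2'
services:
  es-hot-1a:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.0.0
    environment:
      - node.name=es-hot-1a
      - ES_JAVA_OPTS=-Xms30g -Xmx30g
    volumes:
      - /data/es-hot-1a:/usr/share/elasticsearch/data
  es-hot-1b:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.0.0
    environment:
      - node.name=es-hot-1b
      - ES_JAVA_OPTS=-Xms30g -Xmx30g
    volumes:
      - /data/es-hot-1b:/usr/share/elasticsearch/data
```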
