Elasticsearch Topology Design

Hello Elasticsearch big-brain experts,

I have been testing out Elasticsearch as a log vaulting and auditing tool and SIEM for the last few months, and my company has authorized the purchase of hardware to build a cluster. I have gone through a lot of documentation and put together a proposal, but would like to pass my design to the community to see if I have any glaring flaws or misunderstandings. I have spec'd out 3 types of servers below:

Hot Storage:

  • 2x 8-core Intel 3.1GHz procs
  • 128GB ECC Registered Memory
  • 1x 128GB M.2 (for OS)
  • 16x 2.8GB hot-swap NVMe SSDs
  • 2x 10GBASE-T LAN on motherboard
  • 1x 2-port 1GBASE-T PCI card (for management)

Warm Storage:

  • 2x 8-core Intel 3.1GHz procs
  • 128GB ECC Registered Memory
  • 12x 4GB HDDs in RAID 5
  • 1x 128GB M.2 (for OS)
  • 2x 10GBASE-T LAN on motherboard
  • 2x 1GBASE-T PCI card (for management)

Auxiliary Servers:

  • 2x 8-core Intel 3.1GHz procs
  • 128GB ECC Registered Memory
  • 8x 250GB SSDs
  • 1x 128GB M.2 (for OS)
  • 2x 10GBASE-T LAN on motherboard
  • 2x 1GBASE-T PCI card (for management)

My intention is to purchase 4 hot storage servers, 4 warm storage servers, and 3 auxiliary servers. The hot and warm servers will make up the data nodes as well as the masters. I plan to give indices on hot storage 2 primary shards with 1 replica, allowing for two servers' worth of fault tolerance. After 60 or 90 days I plan to have indices move to the warm hosts, where they will go down to 1 shard with a replica.
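
For what it's worth, the way I understand the hot-to-warm move is usually done is to tag nodes with a custom attribute (commonly box_type: hot / box_type: warm in elasticsearch.yml) and flip an index-level allocation filter as the index ages; Curator can automate it, but the raw calls look roughly like the sketch below. The cluster URL and index name are placeholders, the box_type attribute is just a convention, and the shrink step assumes a 5.x+ cluster where the _shrink API exists.

```python
import requests

ES = "http://localhost:9200"     # placeholder; in practice the client node's address
INDEX = "logs-2017.01.15"        # hypothetical daily index

# 1. Relocate the index onto nodes started with node.attr.box_type: warm.
requests.put(f"{ES}/{INDEX}/_settings",
             json={"index.routing.allocation.require.box_type": "warm"})

# 2. To drop from 2 primary shards to 1, mark the index read-only and shrink it
#    (shrink also requires a copy of every shard on a single node first).
requests.put(f"{ES}/{INDEX}/_settings",
             json={"index.blocks.write": True})
requests.post(f"{ES}/{INDEX}/_shrink/{INDEX}-shrunk",
              json={"settings": {"index.number_of_shards": 1,
                                 "index.number_of_replicas": 1}})
```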

The auxiliary servers I plan to use as a client node (data=false, master=false) to speed up searches, a dedicated parser box for Logstash, Filebeat, etc., and a Kibana front end.
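
For reference, my understanding is that a client (coordinating-only) node is just an Elasticsearch node started with node.master: false and node.data: false, and everything else (Kibana, Logstash outputs, Filebeat if it ships direct) then points at that address. A quick sanity check from Python might look like the following; the host name is hypothetical and the node.role column is what 5.x-era releases print.

```python
import requests

CLIENT = "http://es-client-01:9200"   # hypothetical coordinating-only node

# A coordinating-only node shows "-" in the node.role column
# (m = master-eligible, d = data on 5.x-era clusters).
print(requests.get(f"{CLIENT}/_cat/nodes?v&h=name,node.role,heap.max").text)

# Searches and bulk indexing sent here get fanned out to the data nodes.
resp = requests.get(f"{CLIENT}/logs-*/_search", json={"size": 0})
print(resp.json()["hits"]["total"])
```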

All servers will be on the same 1GBASE-T switch, with 2 lines each configured as an 802.3ad LACP bond for data and cluster communication, and a separate 1G connection to another switch for management (ssh, scp, configuration management, etc.). I have forgone dedicated master nodes because I read that they aren't really required until you get over about a 10-node cluster, but I'm not married to this idea if it is wrong.

Please let me know if there is anything I have missed or if this configuration is substandard or ill-advised. I do want to know where my baby is ugly on this one, because I don't want to waste money or cycles fixing my configuration later.

Or just tell me that I did a great job and should move forward. My ego can always use a boost. :)

A few things:

Yeah, I mistyped. The storage is 16x 1.8TB for the hot nodes and 12x 4TB HDDs in RAID 5 for the warm nodes. This should give me around 73 TB for hot storage and 132 TB (I think) for warm storage ((4*12-4)*4 - (4*12-4)). Sorry about that.
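
Spelled out, the warm-tier arithmetic I'm using is just RAID 5 losing one drive's worth of capacity per node, with one node's worth held back on top of that:

```python
# Warm tier estimate from above: 12 x 4 TB drives per node in RAID 5,
# so one drive's worth is lost to parity on each node.
per_node_tb = 4 * 12 - 4                   # 44 TB usable per warm node
total_tb = per_node_tb * 4                 # 176 TB across the 4 warm nodes
with_headroom_tb = total_tb - per_node_tb  # 132 TB, keeping one node's worth in reserve
print(with_headroom_tb)
```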

I would like to know what the benefit would be of running multiple Docker containers per host instead of just assigning half the host's memory (64GB) to run one instance. That is what I have seen in the documentation so far. Just explaining my thought process, not trying to argue anything.

As far as the auxiliary nodes go, I don't really have a reference point for how many resources a client node is going to consume. Per my understanding, I will be pointing Kibana and any log sources at the client node to either start indexing or perform searches, so if anyone has guidelines on how big that needs to be relative to the data/master nodes, that would be great. I have some idea about the dedicated parser: I threw some NetFlow from a couple of core routers at a really beefy VM and it beat the heck out of the CPU and memory, so I would like that to stay on some pretty beefy hardware, because I'm going to be throwing a lot more at it than what I gave the test/dev/QA instance.

I am pretty sure the Kibana hardware is way more than is needed, but limiting the number of hardware configurations made it easier to build a menu, so to speak, of hardware needed for expansion if ops, dev, or infrastructure wanted to get in on the benefits ES can give them. I probably won't need to upgrade that for a long time, but if someone wants to start onboarding non-security data into the cluster, I have a price I can give them for upgrading the storage nodes (depending on their log retention period), creating dedicated master nodes, or adding client nodes behind a load balancer. It just gives me some standards I can hand out easily with quotes.

We recommend <32GB per heap, so multiple nodes per host makes for better resource usage.

I see. So breaking the 8 hosts' single 64GB-heap instances into 16 instances with <32GB heaps will provide better resource utilization. I'm going to look into that then.
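
For my own notes, the <32GB figure is about compressed object pointers; once nodes are up, something like this should confirm each heap is still under the threshold (a sketch against the nodes info API; the URL is a placeholder, and the compressed-oops field appears on 2.2+ releases):

```python
import requests

ES = "http://localhost:9200"   # placeholder cluster address

# The nodes info API reports each node's configured max heap and whether the
# JVM kept compressed ordinary object pointers -- the reason for the <32GB rule.
for node in requests.get(f"{ES}/_nodes/jvm").json()["nodes"].values():
    jvm = node["jvm"]
    heap_gb = jvm["mem"]["heap_max_in_bytes"] / 1024 ** 3
    oops = jvm.get("using_compressed_ordinary_object_pointers", "unknown")
    print(f"{node['name']}: heap {heap_gb:.1f} GB, compressed oops: {oops}")
```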