Elastic Stack Architecture and Requirements

Analyst · June 23, 2022, 1:55am

Hi Guys,

I need to create an ELK architecture, but I don't know how many servers and what resources (CPU, RAM, disk space) I will need for a large organization deployment.
I will need to send syslog and log files from 7000 servers plus network devices, with the firewalls being the largest on disk (around 2.5 TB/day in total), to this ELK and keep almost 30 days of retention for live searching and up to a year for offline search.
I will also need to separate it into 3 tiers (web - Kibana, app - Logstash, db - Elasticsearch), and I also want high availability.
So if the operating system is Red Hat Enterprise Linux 7, what should the correct architecture be?

Thanks,
John

grfneto · June 23, 2022, 3:04pm

Hi @Analyst

Basically, for hot data, Elasticsearch is capable of processing 2.8 TB on a node with 64 GB of RAM and 10 vCPUs.

Making basic calculations, for the Elasticsearch nodes alone you will need 10 with the configuration mentioned above, since you need to retain the data for 30 days.

But it all depends on the usage scenario. You can get an idea of the architecture on the Elastic website, which shows the number of nodes for Elasticsearch, Kibana, ML, APM Server, etc., even with high availability.

Elasticsearch Service pricing calculator — Elastic Cloud

Hope it helped, greetings

leandrojmp · June 23, 2022, 3:34pm

It is not an easy task to size an Elasticsearch cluster; there are many variables to it.

But to start, and to have high availability, you will need at least 3 nodes that can act as both master nodes and data nodes. Since you have a lot of servers and devices, it would be best to split those roles across different nodes, so you could use 3 master-only nodes plus some data nodes. The number and specs of the data nodes will depend on the data.

It is pretty common to have a hot-warm architecture, where your hot nodes hold the more recent data on faster hardware, especially fast disks, and your warm nodes hold older data and can use slower disks.
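As a rough illustration of the hot-warm idea, here is a minimal sketch in Python that builds an ILM policy body as a plain dict. The function name, the 1-day rollover age and the 7-day move to warm are made-up values for illustration, not recommendations from this thread; only the 30-day retention comes from the question. It also assumes a version of Elasticsearch with data tiers, where nodes carry hot and warm roles and ILM moves indices between them. Nothing is sent to a cluster; the dict is just what you would put in the body of a `PUT _ilm/policy/<name>` request.

```python
import json


def build_hot_warm_policy(rollover_size="50gb", rollover_age="1d",
                          warm_after="7d", delete_after="30d"):
    """Return an ILM policy body as a dict (hypothetical helper, illustrative values)."""
    return {
        "policy": {
            "phases": {
                # Hot phase: roll over to a new index when a primary shard
                # reaches the size limit or the index reaches the age limit.
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_primary_shard_size": rollover_size,
                            "max_age": rollover_age,
                        }
                    }
                },
                # Warm phase: older, read-mostly data; merge segments and
                # lower the recovery priority.
                "warm": {
                    "min_age": warm_after,
                    "actions": {
                        "forcemerge": {"max_num_segments": 1},
                        "set_priority": {"priority": 50},
                    },
                },
                # Delete phase: drop the index once the live retention is over.
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_hot_warm_policy(), indent=2))
```

The phase ages would need to be tuned to how much data the hot nodes can actually hold.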

To have high availability in your data nodes you will need at least 2 data nodes; this enables you to have replicas.

The hard part is to define how many nodes you will need and what the specs of each node should be.

There are some recommendations from Elastic that can help, for example:

Keep the number of shards lower than 20 per GB of heap, so for a node with 30 GB of heap you should have a maximum of 600 shards.
Keep the size of shards in the tens of GB; normally you could aim for a shard size around 50 GB, which is the value Elastic uses by default in the ILM policies.
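To make those two rules of thumb concrete, here is a small Python sketch. The function names are hypothetical, and the 150 TB figure (taken here as 150,000 GB) is just an example input. The result is only a lower bound from the shard-count rule; disk, CPU and indexing load will usually push the real node count higher.

```python
import math

SHARDS_PER_GB_HEAP = 20      # rule of thumb quoted above
TARGET_SHARD_SIZE_GB = 50    # shard size aimed for above


def max_shards_per_node(heap_gb):
    """Upper bound on shards for one node, given its heap size in GB."""
    return heap_gb * SHARDS_PER_GB_HEAP


def shards_needed(total_data_gb, shard_size_gb=TARGET_SHARD_SIZE_GB):
    """Number of shards (primaries plus replicas) needed to hold the data."""
    return math.ceil(total_data_gb / shard_size_gb)


def min_nodes_by_shard_count(total_data_gb, heap_gb):
    """Lower bound on data nodes from the shard-count rule alone."""
    return math.ceil(shards_needed(total_data_gb) / max_shards_per_node(heap_gb))


if __name__ == "__main__":
    print(max_shards_per_node(30))                # 600
    print(shards_needed(150_000))                 # 3000
    print(min_nodes_by_shard_count(150_000, 30))  # 5
```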

If you estimated 2.5 TB/day, to have high availability you need replicas. If you have just 1 replica for each of your indices, you will have 5 TB of data per day, so for a retention of 30 days you will need more than 150 TB of storage.

I'm not sure what you mean by offline search, but for 1 year of retention you are talking about almost 2 PB of data.
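The arithmetic behind those numbers can be written as a short Python sketch. The function name is hypothetical, and it assumes, as the estimate above does, that the indexed size equals the raw daily volume (no compression gain, no mapping overhead), which a proof-of-concept would need to confirm.

```python
def retained_tb(daily_tb, days, replicas=1):
    """Total storage in TB: every day is stored once per copy (primary + replicas)."""
    return daily_tb * (1 + replicas) * days


if __name__ == "__main__":
    live = retained_tb(2.5, 30)    # 30 days of live search
    year = retained_tb(2.5, 365)   # 1 year of retention
    print(f"30 days: {live} TB")                            # 30 days: 150.0 TB
    print(f"1 year:  {year} TB (~{year / 1000:.2f} PB)")
```

With 1 replica this gives 150 TB for 30 days and 1825 TB for a year, which is where the "almost 2 PB" comes from.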

One good approach is to do a proof-of-concept with your data. See how big your indices will be and what the indexing event rate will be, so you know how many Logstash instances you will need. Check whether you need to index every field, and find the correct mapping for each one. This will give you some idea of what you need to start with, and then you can scale as needed. It will also depend on whether you run it on-premises or in the cloud, managed or self-managed.

system · July 21, 2022, 3:34pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
