Designing an Elasticsearch cluster for a SOC

Hi everyone,
I'm Tom. I have been a SOC analyst for 4 years, and I'm currently working for a company that wants to deploy a SIEM. I've read many blog posts on designing and sizing an Elasticsearch cluster, but I would like your feedback.
We want to ingest 1.5 TB of logs per day and keep them for at least 3 months, searchable in Kibana.

Based on your experience and knowledge, how many nodes do I need to support this amount of data and get good performance? From previous jobs at companies running ELK, I think it would be good to have 3 master nodes and at least 10 data nodes (RAM: 8 or 16 GB).

Thanks for your help.

With 1.5 TB of logs per day, after 30 days you will have close to 45 TB without replicas. With one replica, you will need 90 TB per month.

To store this data for 3 months with one replica, you will need close to 280 TB of disk on your data nodes.
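
The arithmetic above can be written as a small Python sketch. The function name and the 90-day reading of "3 months" are assumptions for illustration. The estimate also ignores index overhead and compression, which can move the real on-disk size in either direction.

```python
def total_storage_tb(daily_ingest_tb: float, retention_days: int, replicas: int = 1) -> float:
    """Raw disk needed: one primary copy plus `replicas` extra copies of every day kept."""
    return daily_ingest_tb * retention_days * (1 + replicas)


# 30 days of 1.5 TB/day, without and with one replica
print(total_storage_tb(1.5, 30, replicas=0))  # 45.0
print(total_storage_tb(1.5, 30, replicas=1))  # 90.0
# Assumption: "3 months" taken as 90 days, with one replica
print(total_storage_tb(1.5, 90, replicas=1))  # 270.0
```

With 92 days instead of 90, the same call gives 276 TB, which is in line with the "close to 280 TB" figure.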

You would then need 3 dedicated master nodes and enough data nodes to store approximately 300 TB of data. The number of data nodes will depend on how you run your cluster: on premises, self-managed in a cloud, or on Elastic Cloud.

You should also try to have a hot-warm deployment, where your hot nodes have smaller but faster disks to hold the most recent data and your warm nodes have larger but slower disks. You will also need more powerful nodes, something with 32 GB or 64 GB of RAM.

As an example, I manage an Elasticsearch cluster that is used as a SIEM. At the moment I have 3 master nodes, 4 hot data nodes and 9 warm data nodes.

My hot data nodes have 64 GB of RAM and 4 TB of Premium SSD disk; the warm nodes have 32 GB of RAM and 4 TB of Standard SSD disk. I use ILM policies to move the indices from hot to warm. Some indices stay one month on hot, others just a couple of days, depending on the data and its size. I also only use replicas while the data is on the hot nodes.

Currently I have close to 52 TB of disk, of which about 40 TB is in use.
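
A minimal sketch of an ILM policy along these lines, written as a Python dictionary and printed as the JSON body accepted by the `_ilm/policy` API. The phase timings, the rollover thresholds and the 90-day delete phase are assumptions for illustration, not values taken from the cluster described above.

```python
import json

# Hypothetical policy body; all ages and sizes are placeholder assumptions.
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    # Roll over to a new index daily or when a primary shard reaches 50 GB
                    "rollover": {"max_age": "1d", "max_primary_shard_size": "50gb"}
                }
            },
            "warm": {
                # Age is counted from rollover; with data tiers, indices then
                # migrate to nodes that have the data_warm role
                "min_age": "20d",
                "actions": {
                    # Drop replicas once the index leaves the hot tier
                    "allocate": {"number_of_replicas": 0}
                },
            },
            "delete": {
                "min_age": "90d",
                "actions": {"delete": {}},
            },
        }
    }
}

print(json.dumps(ilm_policy, indent=2))
```

The printed JSON could be sent with `PUT _ilm/policy/<policy-name>`; indices with different retention needs would each get their own policy with different `min_age` values.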

Hello Leandro,
Thanks for your answer.
The cluster will be on premises.
I will indeed use a hot/warm architecture.
I think we can go with 64 GB of RAM on the hot nodes and 32 GB on the warm nodes.

Let's say my hot nodes keep logs for 20 days and my warm nodes for 70 days (90 days in total for 3 months of retention).
For storage, I will need:

  • Hot nodes: 1.5 TB x 20 = 30 TB of disk
  • Warm nodes: 1.5 TB x 70 = 105 TB of disk

So:

  • 6 hot nodes with 5 TB of disk each (64 GB of RAM)
  • 12 warm nodes with 10 TB of disk each (32 GB of RAM); I am taking a few extra TB to be safe
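
These node counts can be checked with a short Python sketch. The helper name and the `usable_fraction` parameter are assumptions for illustration; 0.85 mirrors Elasticsearch's default low disk watermark of 85%, above which no new shards are allocated to a node.

```python
import math


def nodes_needed(data_tb: float, disk_per_node_tb: float, usable_fraction: float = 1.0) -> int:
    """Smallest number of nodes whose usable disk covers data_tb."""
    return math.ceil(data_tb / (disk_per_node_tb * usable_fraction))


hot_tb = 1.5 * 20    # 30 TB on hot nodes
warm_tb = 1.5 * 70   # 105 TB on warm nodes

# Filling every disk to 100%
print(nodes_needed(hot_tb, 5), nodes_needed(warm_tb, 10))              # 6 11
# Keeping disks below 85% full
print(nodes_needed(hot_tb, 5, 0.85), nodes_needed(warm_tb, 10, 0.85))  # 8 13
```

Under the 85% assumption, 6 hot nodes and 12 warm nodes would both run fuller than the watermark, so some extra disk or an extra node per tier may be worth budgeting.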

All of this is without replicas. How do replicas work with a hot/warm architecture? Do I have to double all my nodes?

Thanks again for your answer and explanation.

The reference manual offers guidance on designing a highly available cluster here:

Amongst other concerns, for resilience you should have (at least) one replica for everything. If you don't want to do that (e.g. because it doubles your costs) you should use searchable snapshots instead:

While your data is in the hot phase, the primary and replica shards of your indices are allocated on hot nodes, that is, the nodes that have the data_hot role. When those indices move to the warm phase, the shards, both primary and replica, move to nodes with the data_warm role.

If you want replicas while your indices are in the hot phase and also in the warm phase, then yes, you will need to double your space. That is with just one replica; with two replicas you will need even more space.

Whether you need replicas is entirely up to you. In my case we decided to have replicas only for the hot phase: when an index enters the warm phase it no longer has any replicas. We are aware of the consequences, though. For example, if one of the warm nodes goes offline, the indices allocated to that node will not be available and the cluster status will be red until the node comes back online.
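
To put numbers on these options, here is a rough Python sketch using the figures from the question (1.5 TB per day, 20 days on hot, 70 days on warm). The function name is hypothetical, and the result is raw shard storage only, with no headroom.

```python
def tiered_storage_tb(daily_tb: float, hot_days: int, warm_days: int,
                      hot_replicas: int, warm_replicas: int) -> dict:
    """Raw disk per tier, counting one primary plus the given number of replicas."""
    hot = daily_tb * hot_days * (1 + hot_replicas)
    warm = daily_tb * warm_days * (1 + warm_replicas)
    return {"hot": hot, "warm": warm, "total": hot + warm}


# No replicas anywhere
print(tiered_storage_tb(1.5, 20, 70, hot_replicas=0, warm_replicas=0))
# One replica on hot only
print(tiered_storage_tb(1.5, 20, 70, hot_replicas=1, warm_replicas=0))
# One replica on both tiers: every figure doubles
print(tiered_storage_tb(1.5, 20, 70, hot_replicas=1, warm_replicas=1))
```

The totals come out at 135 TB, 165 TB and 270 TB respectively, so replicas on the hot tier alone add 30 TB, while replicas everywhere add 135 TB.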

As mentioned, you also have the option of using searchable snapshots, but this feature is only available with the Enterprise license. If you were planning to use the free Basic license, adding more nodes may cost less than an Enterprise license, but you will need to check this.
