Hi people, we have an ELK stack for a SIEM solution in our company.
We need to ingest data from approximately 10 devices, and we need to retain the data for 3 months. One of them is a firewall that sends a heavy volume of logs.
Can you tell me how much storage we need?
Or maybe you can tell me how much storage you use in your own implementations, so we can decide whether to use our corporate storage or buy a dedicated one.
Thank you very much.
How much raw data will your devices generate on a daily basis? 2 TB/day, for example?
Hi, approximately 100 GB per day, so in this situation I will need:
30 x 100 GB = 3 TB per month
and 9 TB for a 3-month-period
So we need to buy new storage, I think.
What is your situation? Do you work with raw data, or do you try to keep only the relevant fields using a Logstash pipeline or something similar, in order to reduce the data stored?
You can opt for a Hot/Warm/Cold architecture.
Let's say the following config (this includes 1 replica):
- 7 days hot data: 2 nodes with 2 TB SSD storage
- 30 days warm data: 2 nodes with 8 TB SAS storage
- 60 days cold data: 2 nodes with 16 TB HDD storage (you can have 0 replicas here)
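To sanity-check a tier layout like the one above, here is a rough Python sketch that compares the disk each tier would need against what it provides. It assumes the listed disk size is per node, that about 100 GB/day ends up indexed (the raw figure quoted earlier in the thread), and that the hot and warm tiers keep 1 replica while cold keeps 0. The `Tier` class and its field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    days: int                 # retention window served by this tier
    nodes: int
    disk_tb_per_node: float   # assumption: the quoted size is per node
    replicas: int

    def capacity_tb(self) -> float:
        """Total raw disk available across the tier's nodes."""
        return self.nodes * self.disk_tb_per_node

    def required_tb(self, indexed_gb_per_day: float) -> float:
        """Disk needed for primaries plus replicas over the retention window."""
        return indexed_gb_per_day * self.days * (1 + self.replicas) / 1000


TIERS = [
    Tier("hot", days=7, nodes=2, disk_tb_per_node=2.0, replicas=1),
    Tier("warm", days=30, nodes=2, disk_tb_per_node=8.0, replicas=1),
    Tier("cold", days=60, nodes=2, disk_tb_per_node=16.0, replicas=0),
]

INDEXED_GB_PER_DAY = 100.0  # assumption: same as the raw daily volume

for tier in TIERS:
    need = tier.required_tb(INDEXED_GB_PER_DAY)
    have = tier.capacity_tb()
    print(f"{tier.name}: needs about {need:.1f} TB of {have:.1f} TB available")
```

Under these assumptions every tier has headroom, which matters because a cluster should not run with its disks close to full. Plug in your own measured daily volume before buying anything.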
You can refer to this presentation on how to size your cluster
Raw data converted to JSON will roughly double the volume => 200 GB/day of JSON.
If you clean your data with Logstash and customize your mapping, the indexing process may roughly halve the JSON size. You can then say that you will index about 100 GB/day of data; keep in mind that replicas will double the disk usage.
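The rule of thumb above can be written as a small Python helper. This is only a sketch: the factor of 2 for JSON expansion and 0.5 for cleaned and mapped data are the rough estimates from this post, not measured values, and the function name and parameters are made up for illustration.

```python
def estimate_disk_gb(
    raw_gb_per_day: float,
    retention_days: int,
    json_expansion: float = 2.0,  # rough estimate: raw logs -> JSON
    index_ratio: float = 0.5,     # rough estimate: after Logstash cleanup and custom mapping
    replicas: int = 1,
) -> float:
    """Estimate total disk (GB) for primaries and replicas over the retention period."""
    indexed_gb_per_day = raw_gb_per_day * json_expansion * index_ratio
    return indexed_gb_per_day * (1 + replicas) * retention_days


# 100 GB/day of raw logs kept for about 3 months (90 days) with 1 replica
total_gb = estimate_disk_gb(raw_gb_per_day=100, retention_days=90)
print(f"Estimated disk: {total_gb / 1000:.1f} TB")
```

With these inputs the estimate comes to about 18 TB, twice the 9 TB raw figure, because of the replica. Measuring the real index size on a day or two of sample data would give much better ratios than these defaults.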
Your explanation is fantastic, I will look into this approach.
Thanks a lot again, and I may ask you again if I have any new questions.