ELK setup for 6 months of log storage

kriti_dabas · January 5, 2024, 5:27am

What should my ELK setup look like if I want to keep logs for 6 months?
My flow is syslog-ng -> Kafka -> Logstash -> Elasticsearch -> Kibana.
I ingest about 120 GB of data per day.
How many Elasticsearch, Kibana, and Logstash nodes should I use, and how much disk space, CPU, and RAM should those nodes have?

cperzrt10 · January 8, 2024, 9:53am

There is no default configuration for the number of nodes.
To estimate the space you need, you first have to decide how you want to store your data: with a Hot-Warm-Cold architecture or Hot only.

Elastic recommends 1 GB of RAM per 30 GB of data in the Hot-Warm state; in the Cold state it recommends 1 GB of RAM per 100 GB of data.
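
As a rough illustration of these ratios, the sketch below turns a retained data volume into a RAM estimate. The helper name `ram_needed_gb` is made up for this example, the 120 GB/day and roughly 180 days come from the question, and the ratios are the ones quoted in this post rather than an official sizing tool.

```python
# Rough RAM estimate from the data-to-RAM ratios quoted in this post.
# ram_needed_gb is a hypothetical helper, not part of any Elastic tooling.

def ram_needed_gb(data_gb: float, data_gb_per_ram_gb: float) -> float:
    """Return the total RAM (GB) implied by a given data:RAM ratio."""
    return data_gb / data_gb_per_ram_gb

total_data_gb = 120 * 180  # 120 GB/day kept for about 6 months = 21,600 GB

hot_ram = ram_needed_gb(total_data_gb, 30)    # 1 GB RAM per 30 GB (Hot-Warm)
cold_ram = ram_needed_gb(total_data_gb, 100)  # 1 GB RAM per 100 GB (Cold)

print(f"Everything in Hot-Warm: ~{hot_ram:.0f} GB RAM across the data nodes")
print(f"Everything in Cold:     ~{cold_ram:.0f} GB RAM across the data nodes")
```

These are cluster-wide totals, so they would still need to be divided across however many data nodes hold each tier.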

May I suggest 1 Logstash node, 1 master node, (Hot) 2 master-data nodes + 1 data node, (Warm) 3 data nodes, (Cold) 2 nodes. If you only want to use the Hot state, it could be 1 master node + 2 master-data nodes + 3 data nodes. That could be a good architecture.

The 3 data nodes in Hot and Warm are there for high availability of the data.

Here are some links of interest:

Hot warm cold configuration
Best practices

kriti_dabas · January 8, 2024, 11:30am

My current ELK setup, for 3 months of log retention using the hot phase of an ILM policy:
1 Kibana node
1 Logstash node
3 Kafka and syslog-ng nodes
3 Elasticsearch master nodes
2 Elasticsearch coordinating nodes
6 Elasticsearch data nodes
Now I want to keep logs for 6 months at 120 GB of data per day, with 6 or 7 data sources, matching the indices I have.

cperzrt10 · January 8, 2024, 2:48pm

You can also give 2 of the 3 master nodes the data role (master-data) to improve ingestion, retention, and access to all the data.

How many GB of RAM and disk do the data machines have? Have you configured jvm.options to change the amount of RAM given to the JVM?

stephenb · January 8, 2024, 3:58pm

Actually, this article from @Christian_Dahlqvist is still one of the best.

It is written for Elastic Cloud, but the concepts and calculations are still valid.

There are specific examples of calculations...

Also, if you are building an on-prem cluster with Basic, you can use the same Hot / Warm profiles.

The only difference is that we now recommend a 160:1 disk-to-RAM ratio on Warm nodes, so
64GB RAM = ~10TB Disk
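
The 160:1 figure can be checked with quick arithmetic. A minimal sketch, assuming 1 TB = 1024 GB; the helper name `disk_per_node_gb` is hypothetical:

```python
# Disk a Warm node can address for a given amount of RAM at a disk:RAM ratio.
# disk_per_node_gb is an illustrative helper, not an Elastic API.

def disk_per_node_gb(ram_gb: float, disk_to_ram_ratio: float = 160) -> float:
    return ram_gb * disk_to_ram_ratio

warm_disk_gb = disk_per_node_gb(64)  # 64 GB RAM at 160:1
print(f"{warm_disk_gb:.0f} GB, about {warm_disk_gb / 1024:.0f} TB per Warm node")
```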

kriti_dabas · January 9, 2024, 5:25am

Node	Disk (df -h)	Version	CPU cores	RAM
kibana/Nginx	100G	7.16.2	8	62Gi
syslog-ng/kafka	100G	2.11-2.2	4	15Gi
syslog-ng/kafka	100G	2.11-2.2	4	15Gi
syslog-ng/kafka	100G	2.11-2.2	4	15Gi
logstash	100G	7.16.2	8	15Gi
em1	50G	7.16.2	4	15Gi
em2	50G	7.16.2	4	15Gi
em3	50G	7.16.2	8	62Gi
ec1	50G	7.16.2	4	15Gi
ec2	50G	7.16.2	4	15Gi
ed4	1000G	7.16.2	8	62Gi
ed5	1000G	7.16.2	8	62Gi
ed6	1000G	7.16.2	8	62Gi
ed1	1000G	7.16.2	8	62Gi
ed2	1000G	7.16.2	8	62Gi
ed3	1000G	7.16.2	8	62Gi

kriti_dabas · January 9, 2024, 5:31am

Okay, I will look into it. Thank you.
How would this work for 6 months of retention at 120 GB of data per day, using only the hot phase?
total nodes = 16
data nodes = 7
coordinating nodes = 2
master nodes = 3
logstash = 2
kibana = 2 (one on standby)
6 months of log retention: 120 GB per day * 180 days = 21,600 GB
3 master nodes = 100 GB each
2 coordinating nodes = 100 GB each
7 data nodes = 3 TB each
3 kafka/syslog-ng = 3 TB each (1 week retention policy)
2 logstash = 1 TB each
2 kibana = 100 GB each
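
The storage arithmetic in this plan can be sketched as below. Two things here are assumptions rather than anything stated in the post: 1 TB is taken as 1000 GB, and the 85% figure is Elasticsearch's default low disk watermark, above which the cluster stops allocating new shards to a node.

```python
import math

DAILY_GB = 120            # daily index volume from the post
RETENTION_DAYS = 180      # about 6 months
DATA_NODES = 7
DISK_PER_NODE_GB = 3000   # "3TB each", assuming 1 TB = 1000 GB
LOW_WATERMARK = 0.85      # Elasticsearch default low disk watermark (assumed unchanged)

required_gb = DAILY_GB * RETENTION_DAYS            # data retained at steady state
raw_capacity_gb = DATA_NODES * DISK_PER_NODE_GB    # total disk on data nodes
usable_gb = raw_capacity_gb * LOW_WATERMARK        # capacity below the watermark

# Disk per node needed so the retained data stays under the low watermark.
disk_needed_per_node_gb = math.ceil(required_gb / LOW_WATERMARK / DATA_NODES)

print(f"Required: {required_gb} GB")
print(f"Raw capacity: {raw_capacity_gb} GB, usable below watermark: {usable_gb:.0f} GB")
print(f"Disk needed per data node: ~{disk_needed_per_node_gb} GB")
```

Under these assumptions, 7 nodes of 3 TB fall slightly short of 21,600 GB even before any headroom, and roughly 3.6 TB per data node would be needed to stay under the watermark.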

kriti_dabas · January 9, 2024, 5:56am

This is defined on all 6 data nodes in my current setup:
node.roles: ["data_hot","data_content"]

cperzrt10 · January 9, 2024, 7:29am

You could increase the size of the data nodes a little, maybe by 500 GB each, to have some room; the rest looks fine to me.

If you see that Kibana, Logstash, or any of the other services is slow, you can increase RAM or CPU, depending on which one is saturated.

Christian_Dahlqvist · January 9, 2024, 8:20am

Given that you are setting up 2 or 3 of everything it looks like you are looking for some level of redundancy and high availability. If this is the case you probably want to have a replica shard for each primary shard, which will double the size of indices on disk. I would therefore double the amount of disk on each data node.

kriti_dabas · January 9, 2024, 9:46am

This 120 GB is primary + replica; I took it from Index Management. I was planning this:
total nodes = 16
data nodes = 7
coordinating nodes = 2
master nodes = 3
logstash = 2
kibana = 2 (one on standby)
6 months of log retention: 120 GB per day * 180 days = 21,600 GB
3 master nodes = 100 GB each
2 coordinating nodes = 100 GB each
7 data nodes = 3 TB each
3 kafka/syslog-ng = 3 TB each (1 week retention policy)
2 logstash = 1 TB each
2 kibana = 100 GB each
I just wanted to know the RAM and CPU.

cperzrt10 · January 9, 2024, 9:55am

As @Christian_Dahlqvist said, if you want primary + replica you must double the space: the 120 GB per day plus the replica adds up to 240 GB per day.

To calculate the RAM, use this formula: 1 GB of RAM per 160 GB of data.

At first glance, the CPU depends on how many people connect simultaneously and on all the processes you use (transforms, ILM, machine learning, etc.). To assess this, check whether the current CPU configuration works well: run the "top" or "htop" command (CentOS/Linux), and if the load is very high, you should add CPU.
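
The same check that "top" or "htop" gives by eye can be scripted. This is only a sketch: the helper `load_per_core` and the 1.0-per-core threshold are assumptions (a common rule of thumb), not a figure from this thread.

```python
import os

def load_per_core(load_avg: float, cores: int) -> float:
    """Load average divided by core count; sustained values above ~1.0 hint at CPU saturation."""
    return load_avg / max(cores, 1)

try:
    one_min, five_min, fifteen_min = os.getloadavg()  # available on Unix-like systems
    cores = os.cpu_count() or 1
    ratio = load_per_core(five_min, cores)
    status = "consider adding CPU" if ratio > 1.0 else "CPU looks OK"
    print(f"5-min load {five_min:.2f} on {cores} cores -> {ratio:.2f} per core: {status}")
except (AttributeError, OSError):
    print("Load average is not available on this platform")
```

Run it on each node during peak ingest and query hours, since a quiet moment will understate the load.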

kriti_dabas · January 9, 2024, 10:00am

But the 120 GB is already 60 GB + 60 GB (primary + replica).

Christian_Dahlqvist · January 9, 2024, 10:04am

Good to hear you have already accounted for that as it can make a major difference if missed.

Dedicated master nodes should not serve requests, so they usually do not need a lot of CPU and RAM. 2 CPU cores and 4GB RAM (2GB heap) might be a good starting point.

For data nodes the size will depend on the load they will be under. I would recommend a ratio of 1CPU core per 8GB of RAM as a good starting point. 32GB to 48GB RAM per data node might be a good starting point based on the disk size you specified.
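
To make the data node numbers concrete, here is a small sketch of the 1 core per 8GB rule applied to the 32GB and 48GB suggestions. The helper names are hypothetical, the 7 data nodes come from the plan posted earlier in the thread, and setting heap to half of RAM simply mirrors the 4GB RAM / 2GB heap master example above; treating that as a rule for data nodes is an assumption.

```python
import math

def data_node_cores(ram_gb: float, ram_gb_per_core: float = 8) -> int:
    """CPU cores implied by the 1 core per 8 GB RAM starting point."""
    return math.ceil(ram_gb / ram_gb_per_core)

def heap_gb(ram_gb: float) -> float:
    """Heap at half of RAM, mirroring the master node example (assumption)."""
    return ram_gb / 2

DATA_NODES = 7  # from the plan posted earlier in this thread

for ram_gb in (32, 48):
    cores = data_node_cores(ram_gb)
    print(f"{ram_gb} GB RAM -> {cores} cores, {heap_gb(ram_gb):.0f} GB heap per node; "
          f"{DATA_NODES * ram_gb} GB RAM and {DATA_NODES * cores} cores across the cluster")
```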

Dedicated coordinating nodes are more difficult to size as it depends on how much query load they serve and whether they do handle ingest pipeline processing as well.

I do not have any recommendations around Logstash and Kibana nodes.

kriti_dabas · January 9, 2024, 10:19am

Okay, thank you.

kriti_dabas · January 9, 2024, 10:20am

Is the disk space also good?
3 master nodes = 100 GB each
2 coordinating nodes = 100 GB each
7 data nodes = 3 TB each
3 kafka/syslog-ng = 3 TB each (1 week retention policy)
2 logstash = 1 TB each
2 kibana = 100 GB each
I didn't understand this: "To calculate the RAM, use this formula: 1 GB of RAM per 120 GB of data"?

Christian_Dahlqvist · January 9, 2024, 11:10am

How much data a node can handle will depend on the data, how you optimise indices and the load on the cluster. 120GB of data on disk per 1GB of RAM sounds a bit aggressive for a node that holds a lot of data and also handles indexing, so I was a bit more conservative.
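
For comparison, the disk-to-RAM ratio implied by the more conservative suggestion can be worked out as follows. This sketch assumes 3 TB = 3000 GB per data node (from the plan in this thread) and the 32GB to 48GB RAM range suggested earlier.

```python
DISK_PER_NODE_GB = 3000  # "3TB each" from the plan, assuming 1 TB = 1000 GB

ratios = {}
for ram_gb in (32, 48):
    # GB of disk served by each GB of RAM on a single data node
    ratios[ram_gb] = DISK_PER_NODE_GB / ram_gb
    print(f"{ram_gb} GB RAM per node -> {ratios[ram_gb]:.1f} GB of disk per 1 GB of RAM")
```

Both results sit below the 120:1 figure, which fits the comment above about being more conservative for nodes that index and hold data at the same time.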

system · February 6, 2024, 11:11am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
