I am new to ELK Security and I am trying to deploy ELK with dedicated master nodes and data nodes to handle a large volume of data, but I cannot find good documentation for my use case covering how to deploy and configure it.
My daily log ingest is close to 100 GB per day. I want to keep the data for 30 days and then forward it to AWS S3 buckets for long-term storage. I would like to know how to size my cluster and how to configure the master and data nodes.
The best way to find the right size for a cluster is to run a proof of concept with real data to gather information about the event rate, daily volume, indexing speed, search speed, etc.
But 100 GB per day is pretty small; as a reference, one of Elastic's recommendations is to keep the shard size of your indices around 50 GB.
With 100 GB/day and 30 days of retention you will need around 3 TB of usable space for the primary data. To have some kind of resilience you need at least 3 master-eligible nodes, so the smallest resilient cluster you could run would be a 3-node cluster.
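A quick back-of-the-envelope calculation of those numbers, assuming one replica per shard (the Elasticsearch default) and the 50 GB shard-size guideline; this is illustrative only, not a substitute for a proof of concept:

```python
# Rough cluster sizing sketch (illustrative assumptions, not a formal recommendation).
daily_ingest_gb = 100   # stated daily log volume
retention_days = 30     # retention before forwarding to S3
replicas = 1            # assumed: the Elasticsearch default of one replica per primary
target_shard_gb = 50    # Elastic's rough shard-size guideline

primary_gb = daily_ingest_gb * retention_days   # primary data only: 3000 GB (~3 TB)
total_gb = primary_gb * (1 + replicas)          # on-disk total with replicas: 6000 GB
shard_count = primary_gb // target_shard_gb     # approximate primary shard count: 60

print(primary_gb, total_gb, shard_count)        # → 3000 6000 60
```

Note that the ~3 TB figure covers primaries only; with one replica, plan for roughly double the raw disk.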
It is better to have dedicated master nodes, so you could run a cluster with 3 dedicated master nodes and 2 data nodes, which lets you keep a replica for each of your shards.
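For the split between dedicated masters and data nodes, the roles are set per node in `elasticsearch.yml`. A minimal sketch (cluster name, node names, and hostnames are placeholders, not from this thread), valid for Elasticsearch 7.9+ where `node.roles` replaced the older boolean role flags:

```yaml
# elasticsearch.yml on a dedicated master node (one of three)
cluster.name: my-cluster          # hypothetical cluster name
node.name: master-1
node.roles: [ master ]            # master-eligible only, holds no data
discovery.seed_hosts: ["master-1", "master-2", "master-3"]
cluster.initial_master_nodes: ["master-1", "master-2", "master-3"]

# elasticsearch.yml on a data node would instead use:
# node.name: data-1
# node.roles: [ data ]            # holds shards, not master-eligible
# discovery.seed_hosts: ["master-1", "master-2", "master-3"]
```

`cluster.initial_master_nodes` is only needed when bootstrapping the cluster for the first time and should be removed afterwards.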
About the AWS S3 part: do you mean keeping a backup of your indices? You can do that by creating a snapshot repository, and if you later need some old data you can restore it from a snapshot.
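As a sketch of that setup (repository, bucket, and policy names are placeholders): with the `repository-s3` plugin available and your AWS credentials stored in the Elasticsearch keystore, you can register the S3 repository and schedule snapshots with SLM through the snapshot APIs, e.g. from Kibana Dev Tools:

```
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my-log-snapshots",
    "base_path": "elasticsearch/snapshots"
  }
}

PUT _slm/policy/nightly-snapshots
{
  "schedule": "0 30 1 * * ?",
  "name": "<nightly-snap-{now/d}>",
  "repository": "my_s3_repository",
  "retention": {
    "expire_after": "60d"
  }
}
```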
You could probably easily run this on 3 combined master/data nodes:
8 CPU / 64 GB RAM / 2.5 TB SSD or HDD per node (SSD preferred).
But I totally agree with @leandrojmp that a POC is really important, because the numbers above do not take into account the query side: dashboards, alerts, ML jobs, etc. I still think you will probably be fine if you have a normal use case, but test first; you only want to deploy once or twice.
I got your point and thank you for your suggestions.
I would like to ask one more question about managing this cluster (3 master nodes and data nodes): how can I configure it using only public IPs, and how do I receive the logs from Logstash?
In this cluster, which node is eligible to run Kibana, or do I need a dedicated instance for Kibana?