I am currently doing research and designing the architecture for an ELK stack to be used in production. My team has purchased 9 Elasticsearch nodes for this setup.
However, I have some questions about sizing the Elasticsearch nodes.
I expect 350 systems to send logs into Elasticsearch, and it is estimated that each system will send about 50 MB of logs per day.
We will also store the logs for only 180 days before deleting them.
Indices would roll over daily, or when an index exceeds 30 GB.
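For reference, here is roughly how I would express that rollover and retention scheme as an ILM policy, sketched in Python with the `requests` library. The policy name "logs-policy", the cluster URL, and the exact phase settings are my assumptions, not a finished design:

```python
import requests

# Sketch of an ILM policy matching the scheme above: roll over daily or
# at 30 GB, then delete 180 days after rollover. The policy name
# "logs-policy" and the cluster URL are placeholders, not decisions.
policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_age": "1d", "max_size": "30gb"}
                }
            },
            "delete": {
                "min_age": "180d",
                "actions": {"delete": {}},
            },
        }
    }
}

resp = requests.put("http://localhost:9200/_ilm/policy/logs-policy", json=policy)
resp.raise_for_status()
```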
Based on that, my sizing estimate (each line counts both the primary and its replica, hence 2 shards and double the raw data):
1 day --> 1 system --> 2 shards --> 100 MB
180 days --> 1 system --> 360 shards --> 18 GB
180 days --> 350 systems --> 126,000 shards --> 6.3 TB
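To sanity-check the arithmetic, here is the same estimate as a small Python sketch (the constants just restate the figures above):

```python
# Restates the estimate above: 350 systems, 50 MB/day each, 180-day
# retention, with every shard duplicated once by its replica.
SYSTEMS = 350
MB_PER_SYSTEM_PER_DAY = 50
RETENTION_DAYS = 180
COPIES = 2  # 1 primary + 1 replica

shards_per_day = SYSTEMS * COPIES                # 700
total_shards = shards_per_day * RETENTION_DAYS   # 126,000
total_tb = SYSTEMS * MB_PER_SYSTEM_PER_DAY * COPIES * RETENTION_DAYS / 1_000_000
print(f"{total_shards:,} shards, {total_tb} TB")  # 126,000 shards, 6.3 TB
```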
Furthermore, I plan to configure each index with:
Primary Shards: 1
Replica Shards: 1
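If it helps, this is how I would sketch those settings as an index template, again in Python with `requests`. The template name, the "logs-*" pattern, and the reference to the hypothetical "logs-policy" above are placeholders:

```python
import requests

# Sketch of an index template applying the 1-primary/1-replica layout.
# The template name, "logs-*" pattern, and the reference to the
# hypothetical "logs-policy" above are placeholders.
template = {
    "index_patterns": ["logs-*"],
    "template": {
        "settings": {
            "index.number_of_shards": 1,
            "index.number_of_replicas": 1,
            "index.lifecycle.name": "logs-policy",
        }
    },
}

resp = requests.put("http://localhost:9200/_index_template/logs-template", json=template)
resp.raise_for_status()
```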
and plan to give my VMs these specs:
3 x Master node:
CPU: 4
RAM: 16 GB
DISK: 50 GB
6 x Data node:
CPU: 8
RAM: 32 GB
DISK: 2 TB
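As a rough check of the data tier against the 6.3 TB projection (ignoring OS overhead and Elasticsearch's disk watermarks, which reduce usable space):

```python
# Rough capacity check for the proposed data tier.
# Assumes the 6.3 TB projection above and 2 TB of disk per data node;
# real usable space is lower once OS overhead and Elasticsearch's
# disk watermarks (85%/90% by default) are taken into account.
DATA_NODES = 6
DISK_TB_PER_NODE = 2.0
PROJECTED_TB = 6.3

raw_capacity_tb = DATA_NODES * DISK_TB_PER_NODE   # 12.0 TB across the tier
utilisation = PROJECTED_TB / raw_capacity_tb      # ~0.53
print(f"{raw_capacity_tb:.1f} TB raw, {utilisation:.0%} consumed by log data")
```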
I am unsure whether these settings are ideal for this environment, so I would appreciate some advice.
If you have low data volumes, try using monthly rather than daily indices. Shards should ideally be over 10 GB in size. Also try storing data from multiple systems in the same indices, to reduce the total number of shards to hundreds rather than thousands.
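To illustrate, sharing one daily index across all 350 systems (keeping your 180-day retention and 1 primary + 1 replica) changes the numbers roughly like this:

```python
# Effect of sharing one daily index across all 350 systems instead of
# one index per system, keeping 180-day retention and 1 primary + 1 replica.
SYSTEMS = 350
MB_PER_SYSTEM_PER_DAY = 50
RETENTION_DAYS = 180

daily_index_gb = SYSTEMS * MB_PER_SYSTEM_PER_DAY / 1000  # 17.5 GB of primary data per day
total_shards = RETENTION_DAYS * 2                        # 360 shards, down from 126,000
print(f"{daily_index_gb} GB per daily index, {total_shards} shards total")
```

A ~17.5 GB daily index clears the 10 GB guideline and stays under your 30 GB rollover cap, and the cluster carries 360 shards instead of 126,000. At this volume a shared daily index is already comfortably sized; monthly indices would mainly matter if the per-day volume were much lower.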