ELK Hardware Guidelines

We are building an ELK Stack for log analysis and report generation.
Roughly 20 GB (raw data) of logs per day from different sources needs to be stored in Elasticsearch. With a 3-year retention period, the total log size would be 20-25 TB (raw data).
Following are the details on which we need your team's input:

  1. What would be the ideal cluster configuration (number of nodes, CPU, RAM, disk size per node, etc.) for storing the above-mentioned volume of data in Elasticsearch?
  2. Is there any tuning required in Kibana to search and visualize the above-mentioned volume of data?
  3. What's the maximum Elasticsearch retention period used in the industry now?
  4. Is there any way to compress the data before storing it in Elasticsearch?
  5. Is there any migration tool available to load the data from the existing system into Elasticsearch?
  6. Does any backup system (Redis, HDFS, etc.) need to be set up before loading the data into Elasticsearch?

This depends a lot on your data and requirements, so there is no single ideal configuration; you'll want to benchmark with your own data to get a better understanding.

This also depends a lot on your requirements. The question is far too generic to be able to give a good answer.

There is no defined max retention period. I see users keeping data for a few weeks all the way to multiple years depending on how critical it is. Some users with long retention periods have a certain amount of data available online for immediate querying while storing snapshots of older data as backups that can be restored temporarily for analysis when needed.
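The snapshot-based approach described above can be sketched with the snapshot/restore API. A minimal example, assuming a shared-filesystem repository (the repository name, path, and index names below are placeholders):

```
# Register a snapshot repository (the location must be listed in path.repo)
PUT _snapshot/log_archive
{
  "type": "fs",
  "settings": { "location": "/mnt/backups/log_archive" }
}

# Snapshot an old index; it can then be deleted from the live cluster
PUT _snapshot/log_archive/logs-2015.01?wait_for_completion=true
{ "indices": "logs-2015.01" }

# Later, restore it temporarily for analysis
POST _snapshot/log_archive/logs-2015.01/_restore
```

With daily (or monthly) indices, this lets you keep only recent indices open for querying while older data sits in cheap snapshot storage.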

Elasticsearch automatically compresses the data it indexes, and there is now a more efficient `best_compression` codec that can help save disk space.
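For reference, the codec is set per index via the `index.codec` setting, typically at index creation time (the index name here is just an example). It trades slightly slower stored-field access for smaller indices on disk:

```
PUT logs-2015.06
{
  "settings": {
    "index.codec": "best_compression"
  }
}
```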

This depends entirely on what the source system is.


Only one minor addition:

This depends on your resiliency requirements. Check out Elasticsearch Resiliency Status | Elastic for more information.


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.