"Estimating Resources for an Elasticsearch Cluster with 10,000 EPS Log Ingestion"

I have a system that includes Elasticsearch, Logstash, and Kibana, with the following architecture:

Logs are collected using Beats (Filebeat, Packetbeat, etc.) and sent to Elasticsearch via Logstash. I want to run Elasticsearch in a cluster setup, but I am facing difficulties in calculating the required resources and determining the number of nodes needed.

Here are my log ingestion details:

  • Maximum log ingestion rate: 10,000 EPS
  • Retention period in hot/warm tiers: 1 month
  • Retention period in cold tier: 6 months

Could you please help me calculate the required resources for this log volume, especially the storage requirements?

The sizing will depend a lot on the type, size, and complexity of the logs being ingested, as well as on how you are querying the data and what latencies you expect.

I would recommend you have a look at this old blog post. It discusses Elastic Cloud, but the concepts also apply to on-premises clusters. There have been a lot of improvements since it was written, but I believe the concepts are still valid, so it is a good starting point for understanding how to size a cluster.

The easiest way to find this is by doing a proof of concept.

Start a test cluster, start ingesting data and after a couple of days you will have more information about the event rate, the event size, the storage needed etc.

For example, how did you arrive at the maximum log ingestion rate of 10k e/s? Where did this number come from?

Depending on what you are going to collect, a single machine could generate more than 10k e/s.

Also, a cold tier does not make much sense if you already have a warm tier. It is common to have a hot tier and a warm tier, where the hot tier has faster machines with faster disks (SSD or NVMe) and the warm tier has slower machines with slower disks (HDD).

But what would the difference be between the warm and the cold tier in terms of hardware? In this case, a hot-warm architecture would be enough.

If you have a paid Enterprise license you could use a hot-frozen architecture, where the frozen tier uses searchable snapshots with the data stored in object storage like S3 (commonly MinIO when running on premises).
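As a rough illustration, a hot-warm setup is usually driven by an ILM policy along these lines. This is only a sketch; the policy name, rollover thresholds, and retention values are placeholders to adapt, and it assumes a local, unauthenticated cluster.

```python
import json
import requests

ES_URL = "http://localhost:9200"  # assumption: local test cluster, no TLS/auth

# Hypothetical hot-warm ILM policy: roll over daily (or at 50 GB per primary
# shard), move indices to the warm tier after 30 days, delete them later.
policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_primary_shard_size": "50gb", "max_age": "1d"},
                    "set_priority": {"priority": 100},
                }
            },
            "warm": {
                "min_age": "30d",
                "actions": {
                    "set_priority": {"priority": 50},
                    "allocate": {"number_of_replicas": 0},  # drop replicas on warm to save disk
                },
            },
            "delete": {"min_age": "240d", "actions": {"delete": {}}},
        }
    }
}

resp = requests.put(
    f"{ES_URL}/_ilm/policy/logs-hot-warm",  # policy name is an example
    headers={"Content-Type": "application/json"},
    data=json.dumps(policy),
)
print(resp.status_code, resp.text)
```

With data tiers enabled, indices are moved from data_hot to data_warm nodes automatically when they enter the warm phase, so no explicit allocation filters are needed for the tier routing itself.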

Another important thing is, what is the average size of documents? Without this you cannot calculate how much storage you need in each tier.

So, as mentioned, the best way to find the correct number is by doing a proof of concept with a test cluster.

Hi Christian_Dahlqvist,

Thank you for your response! I really appreciate your insights.
I fully understand that a precise resource estimation requires considering multiple parameters. However, at this stage, my goal is to perform an approximate calculation and provide an initial design. Later, during implementation, I will refine the numbers based on real-world data.

Regarding the log sources, I receive logs from both network components (e.g., Zeek, firewalls) and Windows/Linux systems. Additionally, I expect queries to return with minimal latency.

I have thoroughly read the article you shared, and I found it very useful and practical. I also reviewed another related article:
Benchmarking and Sizing Your Elasticsearch Cluster for Logs and Metrics

Using the formulas from these references, I performed some initial estimations and arrived at the following numbers:

Estimated Storage Calculation:

If I assume an average log size of 500 bytes, then:

  • Daily log ingestion volume: 10,000 EPS × 500 bytes × 60 s × 60 min × 24 h = 432,000,000,000 B ≈ 403 GB

Hot Tier (30 days retention, 1 replica)

  • Total data: 403 GB × 30 days × 2 (primary + 1 replica) = 24,180 GB
  • Total storage (including 15% overhead for indexing + 10% for OS/filesystem): 24,180 GB × (1 + 0.15 + 0.1) = 30,225 GB

Warm Tier (180 days retention, no replica)

  • Total data: 403 GB × 180 days × 1 (no replica) = 72,540 GB
  • Total storage (including overhead): 72,540 GB × (1 + 0.15 + 0.1) = 90,675 GB (worked through in the sketch below)
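To make the assumptions easy to adjust later, here is a small Python sketch of the same arithmetic. The 500-byte average event size and the 25% overhead factor are assumptions taken from the sizing formulas, not measurements.

```python
# Storage estimate following the formulas from the sizing blog posts:
#   total data    = raw data per day x retention days x (replicas + 1)
#   total storage = total data x (1 + 0.15 watermark headroom + 0.10 margin of error)

EPS = 10_000          # peak events per second (buffered figure)
EVENT_BYTES = 500     # assumed average event size
GIB = 1024 ** 3

raw_per_day_gb = EPS * EVENT_BYTES * 86_400 / GIB  # ~402 GB of raw data per day


def tier_storage(retention_days: int, replicas: int) -> tuple[float, float]:
    """Return (total data, total storage including 25% overhead) in GB for one tier."""
    total_data = raw_per_day_gb * retention_days * (replicas + 1)
    total_storage = total_data * (1 + 0.15 + 0.10)
    return total_data, total_storage


hot_data, hot_storage = tier_storage(retention_days=30, replicas=1)
warm_data, warm_storage = tier_storage(retention_days=180, replicas=0)

print(f"raw per day : {raw_per_day_gb:>10,.0f} GB")
print(f"hot tier    : {hot_data:>10,.0f} GB data, {hot_storage:>10,.0f} GB storage")
print(f"warm tier   : {warm_data:>10,.0f} GB data, {warm_storage:>10,.0f} GB storage")
```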

Question:

Based on your experience, do these numbers seem reasonable? Would you suggest any adjustments?

Hi leandrojmp,

Thank you so much for your detailed response and insights! I really appreciate your time and expertise.

Actually, to estimate the required disk space, I set up a small-scale test environment with limited resources and started ingesting logs generated by flog at a rate of 5 KB per second. During the test, I observed that the Elasticsearch index size was approximately 1.1 times the size of the original log file; in other words, the indexed data was about 10% larger than the raw logs.
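For reference, the on-disk index size can be checked with the _cat/indices API and compared against the raw file size. A minimal sketch, assuming a local unauthenticated test cluster and an example index pattern (not the actual names used):

```python
import requests

ES_URL = "http://localhost:9200"  # assumption: local test cluster, no TLS/auth

# Report document count and store size (in bytes) for the test indices,
# so they can be compared against the size of the raw log file.
resp = requests.get(
    f"{ES_URL}/_cat/indices/logs-*",  # index pattern is an example
    params={"v": "true", "bytes": "b", "h": "index,docs.count,pri.store.size,store.size"},
)
print(resp.text)
```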

Regarding the 10K EPS estimate, this number was suggested by the host, although their actual log rate is around 6K EPS. The higher estimate was used as a buffer to avoid overloading the system.

I really appreciate your point about the architecture. The host requested that logs be searchable and analyzable for 2 months and stored for an additional 6 months. Initially, I considered a cold tier to leverage HDDs for cost efficiency. However, based on your feedback, I now realize it's better to stick with a hot-warm architecture, as a cold tier wouldn't provide significant benefits in this case.

I would love to hear your thoughts on the calculated numbers to ensure they are realistic and aligned with best practices.

Thanks again for your help!

What is flog? What did you use to send the logs to Elasticsearch? Did you create a template with a mapping before sending the data?

If you do not create a template with the mapping, Elasticsearch will create the mappings for the fields the first time it receives data, and this can lead to extra space usage, as string fields, for example, will be stored as both keyword and text data types.

Also, the compression of the index can be improved by changing the index codec to best_compression, which is the default only for Elastic Agent integrations.

So, with some mappings and settings adjustments you may be able to save some space.
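As an illustration of both points, a minimal index template could map dynamic string fields as keyword only and enable best_compression. This is only a sketch; the template name, index pattern, and ignore_above value are examples, and it assumes a local unauthenticated cluster.

```python
import json
import requests

ES_URL = "http://localhost:9200"  # assumption: local test cluster, no TLS/auth

# Hypothetical template: keyword-only dynamic strings plus best_compression
# to reduce on-disk size for the test indices.
template = {
    "index_patterns": ["flog-test-*"],  # example pattern
    "template": {
        "settings": {"index.codec": "best_compression"},
        "mappings": {
            "dynamic_templates": [
                {
                    "strings_as_keyword": {
                        "match_mapping_type": "string",
                        "mapping": {"type": "keyword", "ignore_above": 1024},
                    }
                }
            ]
        },
    },
}

resp = requests.put(
    f"{ES_URL}/_index_template/flog-test",  # template name is an example
    headers={"Content-Type": "application/json"},
    data=json.dumps(template),
)
print(resp.status_code, resp.text)
```

If the data is written through an Elastic Agent integration data stream, the equivalent settings and mappings would normally go into the integration's @custom component template rather than a standalone index template.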

During those 6 months, do you still need to search the data, or does it just need to be stored in case it is required?

If you do not need to search the data after 2 months, using snapshots could be cheaper, as you would pay only for the storage (considering snapshots on object storage in a cloud service).
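For example, registering an S3-compatible repository (such as MinIO) and taking a snapshot looks roughly like this. The repository, bucket, and snapshot names are placeholders, and it assumes the S3 repository type is available and that the client endpoint and credentials are already configured in elasticsearch.yml and the keystore.

```python
import json
import requests

ES_URL = "http://localhost:9200"  # assumption: local cluster, no TLS/auth

# Register an S3-compatible snapshot repository (e.g. MinIO on premises).
# Assumes the s3.client.default.* endpoint and credentials are already set up.
repo = {
    "type": "s3",
    "settings": {
        "bucket": "es-archive",  # example bucket name
        "base_path": "logs",     # optional prefix inside the bucket
    },
}
requests.put(
    f"{ES_URL}/_snapshot/archive-repo",
    headers={"Content-Type": "application/json"},
    data=json.dumps(repo),
)

# Snapshot the indices that have aged past the searchable window.
snapshot = {"indices": "logs-*", "include_global_state": False}
resp = requests.put(
    f"{ES_URL}/_snapshot/archive-repo/archive-2025.01",
    params={"wait_for_completion": "false"},
    headers={"Content-Type": "application/json"},
    data=json.dumps(snapshot),
)
print(resp.status_code, resp.text)
```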

Hi leandrojmp,

Thanks again for your response!

  1. flog is a log generator that I used to simulate real-time log ingestion.
  2. I did not use Logstash; instead, I used Elastic Agent with the Custom Logs integration. I simply pointed it at the log file path to collect and ingest the logs into Elasticsearch for testing purposes.
  3. Initially, I did not create a custom template with explicit mappings before sending the data. So, as you mentioned, Elasticsearch automatically generated mappings, which might have led to extra space usage.

Your point about mapping optimization makes a lot of sense. Since string fields are dynamically mapped as both keyword and text, this could be causing unnecessary storage overhead. I will definitely define a custom index template with optimized mappings to reduce storage consumption.

Also, I wasn’t aware that best_compression is not enabled by default outside of Elastic Agent integrations. I’ll update the settings to apply best_compression and check how much storage I can save.

Thanks for these great suggestions! Do you have any other best practices for optimizing index size?

I would recommend you have a look at the official documentation on the topic.
