Sizing Elastic

I am trying to define the right size for a project and need some advice.

  1. I checked the size of an event after processing and mapping, inside Elasticsearch, and the average was 25 KB.
  2. We have 35 different logs, and we are planning to index them per type and per day.
  3. On average, 250,000 events per log per day.
  4. Retention: 45 days.

Total events per day * average event size * retention days = (35 * 250,000) * 25 KB * 45 ≈ 9.17 TB.
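The arithmetic above can be sketched as follows (using binary units, i.e. 1 TB = 1024³ KB, which is how the 9.17 TB figure comes out; these are my numbers from the question, for primary data only):

```python
# Storage estimate for primary data (no replicas), binary units.
AVG_EVENT_KB = 25
LOGS = 35
EVENTS_PER_LOG_PER_DAY = 250_000
RETENTION_DAYS = 45

daily_kb = LOGS * EVENTS_PER_LOG_PER_DAY * AVG_EVENT_KB
total_tb = daily_kb * RETENTION_DAYS / 1024**3
print(f"{total_tb:.2f} TB")  # ≈ 9.17 TB of primary data
```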

My question is: how many data nodes should I use, and how much storage should each one have?
Do I need to multiply the storage for the replica shards?

Each VM has 64 GB of RAM, and I am aware of the 1:24 ratio between a VM's RAM and its storage.

How many master / coordinating nodes do we need for this number of data nodes?



That average event size is quite large. Did you base this figure on a good amount of data, so that you get the full benefit of compression etc.?

This could potentially lead to a lot of small shards. Make sure you reduce the number of primary shards per index if you do this. Given 45 days retention period you may also want to consider weekly indices as a lot of small shards can be inefficient.

Yes, replica shards need to be accounted for.

For logging use cases we see varying disk-to-RAM ratios, often a lot higher than 1:24. The ideal ratio for your use case will depend on data, work load and types of disks, so I would recommend running a benchmark.
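As a rough illustration only (the real ratio should come from benchmarking, as noted above), here is how the 1:24 RAM-to-disk ratio from the question translates into a node count. The replica count of 1 is an assumption for the sketch:

```python
import math

# Hypothetical node-count estimate from the 1:24 RAM-to-disk ratio;
# the actual ratio for a given workload should be benchmarked.
RAM_PER_NODE_GB = 64
DISK_TO_RAM = 24
TOTAL_DATA_TB = 9.17   # primary data from the earlier estimate
REPLICAS = 1           # assumed: one replica copy per primary shard

disk_per_node_tb = RAM_PER_NODE_GB * DISK_TO_RAM / 1024   # 1.5 TB per node
total_with_replicas_tb = TOTAL_DATA_TB * (1 + REPLICAS)
nodes = math.ceil(total_with_replicas_tb / disk_per_node_tb)
print(f"{total_with_replicas_tb:.2f} TB total -> {nodes} data nodes")
```

Note this leaves no headroom for disk watermarks, merges, or growth, so a real deployment would need more than the bare minimum.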

As clusters grow, having 3 dedicated master nodes is best practice. I see a lot of logging use cases do without dedicated coordinating nodes, but you may want to have one for each Kibana instance.


I will go through your answer now.

  1. I created a few input files with original events from production, enabled best compression, and created grok filters in Logstash that keep only the important fields we need and want in the output. In addition, I defined a mapping per log in Elasticsearch.

The results look like this:

| Log Name | Index Size (KB) | Orig Size (KB) | Number of events | Size per event in Elastic (KB) | Size of original event (KB) | Ratio |
|---|---|---|---|---|---|---|
| Uxfserver.log (modestouxfsimple) | 28.5 | 1.4 | 3 | 9.5 | 0.47 | 20.36 |
| Uxfserver.log (modestouxf) | 37 | 5.8 | 3 | 12.33 | 1.93 | 6.38 |
| BAPServer.log | 20.3 | 2.5 | 10 | 2.03 | 0.25 | 8.12 |
| CRM_IGW_Server.log | 22.8 | 2.6 | 5 | 4.56 | 0.52 | 8.77 |
| CRM_CSR_Server.log | 20.6 | 3.5 | 2 | 10.30 | 1.75 | 5.89 |
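As a sanity check on the figures above, the Ratio column is simply index size divided by original size, which can be recomputed from the first two columns:

```python
# Recompute the Ratio column: index size (KB) / original size (KB).
rows = [
    ("Uxfserver.log (modestouxfsimple)", 28.5, 1.4),
    ("Uxfserver.log (modestouxf)", 37.0, 5.8),
    ("BAPServer.log", 20.3, 2.5),
    ("CRM_IGW_Server.log", 22.8, 2.6),
    ("CRM_CSR_Server.log", 20.6, 3.5),
]
for name, index_kb, orig_kb in rows:
    # Matches the Ratio column to two decimal places.
    print(f"{name}: {index_kb / orig_kb:.2f}")
```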

  2. If I create weekly indices, won't they be too big? I currently set the number of shards to 1 in my mapping templates. Would 3 shards be good?

  3. If the data is 9.17 TB, do I need to get about 30 TB of storage when holding two replicas?


For weekly indices, would it be better to have 5 shards?

To accurately draw conclusions about how much space data will take up on disk, you should index a reasonably large amount of data. I think we recommend around 1 GB or so in this video about sizing, but a few hundred MB should be sufficient.

As per the link I provided earlier, aim for a shard size between 10GB and 50GB. If daily indices are too small and weekly indices are too large, increase the number of primary shards. You can also use the rollover index API to switch to a new index based on data volume and/or time (whichever is hit first).
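Applying the 10-50 GB guideline above to the numbers from the question, the per-log weekly volume can be sketched like this (these event rates and sizes are the questioner's averages, not measured values):

```python
import math

# Rough weekly-index size per log, using the ~25 KB average event
# size and 250,000 events/log/day from the question.
EVENTS_PER_DAY = 250_000
AVG_EVENT_KB = 25
TARGET_SHARD_GB = 50   # upper end of the 10-50 GB guideline

weekly_gb = EVENTS_PER_DAY * 7 * AVG_EVENT_KB / 1024**2
shards = max(1, math.ceil(weekly_gb / TARGET_SHARD_GB))
print(f"{weekly_gb:.1f} GB per weekly index -> {shards} primary shard(s)")
```

By this estimate, a weekly index per log lands near the upper end of the recommended shard-size range with a single primary shard, which is why revisiting the 25 KB average with a larger sample matters before settling on a shard count.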

As it looks like you have based the size estimate on a small data set, I would revisit this before determining how much disk space and node count you will need.

