I want to ship my application log files into Elasticsearch and I'm trying to come up with a capacity plan. Say I have a 1 GB log file: how much disk space does Elasticsearch need to store its contents?
Is it possible to compress data in Elasticsearch to reduce the size?
Here is a nice section on tuning for disk usage.
How much space your exact logs will take is something you will need to configure and test for yourself.
Thanks for the reply, but I need to understand the logic for calculating Elasticsearch capacity based on the size of the log file.
It depends on your log types... yes it does, and that is why you need to test.
But in the very simplest case... IF (and that is a big IF) your logs are normal text log lines of about 500 bytes of text, fields, etc...
With all the mapping, ingest, fields, conversion, keeping _source, compression, etc., it turns out a very simple rule of thumb, as a starting point, is about 1:1 raw logs to disk space. With a replica for HA, that becomes 1:2 raw logs to disk space.
Then you need to leave some disk overhead; we usually say use only about 85% of the disk.
So the very simple basic equation is as follows (keeping in mind there are many variables that can affect it):
Total Disk Space Needed = (GB/Day of Raw Logs × 2 (1 primary + 1 replica) × Number of Days) / (0.85 disk watermark)
Example: 10 GB/day with 1 replica for 30 days
Total Disk Space = (10 GB/day × 2 × 30 days) / 0.85 ≈ 706 GB
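The rule of thumb above can be sketched as a small helper. This is a rough estimator, not an official Elasticsearch tool; the function name, parameters, and the 1:1 raw-to-index ratio baked into it are assumptions you should validate against your own data.

```python
def es_disk_space_gb(gb_per_day, days, replicas=1, watermark=0.85):
    """Rough Elasticsearch disk estimate, assuming a ~1:1 ratio of
    raw logs to index size (a starting point only -- always test).
    """
    copies = 1 + replicas                 # 1 primary + N replica shards
    return (gb_per_day * copies * days) / watermark

# The example above: 10 GB/day, 1 replica, kept for 30 days
print(round(es_disk_space_gb(10, 30)))    # ~706 GB
```

Swap in your own measured raw-to-index ratio (from a test ingest of representative logs) by multiplying `gb_per_day` before calling it.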
And in case I did not say it... you need to test.
Edited a bit: changed the wording to "a starting point"... see David's reply below.
Here's a blog post with some more detail on how variable this ratio can be, as a function of the content of the data and how it's being indexed:
TL;DR: it could be ~50%, it could be ~130%; it depends on a lot of things.