Calculating Disk Space being used

Hello All

I am working on a potential upgrade for our ELK stack and need to find out a few things. One is how much space is taken up just by ELK indexing new logs as they come in. For example, if I am taking in ~10 GB of uncompressed logs each day, what would that size inflate to once everything is indexed on the ELK side?

The other is a method of seeing which logs are taking up the most space on the ELK side. Is there a way to look at the size of specific types of logs, either in a single index or across the ELK stack as a whole?

Thanks

That is very hard to say. How many replicas do you have? How much enrichment do you do? Changing the index_options can significantly change the amount of space used to index a document. I suggest you index a few GBs or tens of GBs of documents and see how much the index grows.
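To make that test concrete, here is a rough sketch of the arithmetic once you have indexed a sample. It assumes you know the raw size of the sample and the primary store size of the resulting index; the function name and the numbers are made up for illustration, and it ignores things like merges and translog overhead.

```python
def estimate_daily_disk_gb(raw_sample_gb, primary_store_gb, daily_raw_gb, replicas=1):
    """Extrapolate daily disk usage from a measured test index.

    raw_sample_gb    -- size of the raw logs that were indexed in the test
    primary_store_gb -- size of the primary shards after indexing the sample
    daily_raw_gb     -- raw log volume expected per day
    replicas         -- number of replica copies of each primary shard
    """
    if raw_sample_gb <= 0:
        raise ValueError("raw_sample_gb must be positive")
    # Expansion (or shrink) factor observed for this particular pipeline.
    expansion = primary_store_gb / raw_sample_gb
    # Every replica is a full extra copy of the primary data.
    return daily_raw_gb * expansion * (1 + replicas)


# Illustrative numbers only: a 5 GB sample that became 6 GB of primaries.
print(estimate_daily_disk_gb(5.0, 6.0, 10.0, replicas=1))
```

The ratio only holds for the pipeline and mappings you tested with, so rerun the test if you change filters or index settings.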

You might get a better answer in the elasticsearch forum rather than the logstash forum (you can move this thread, no need to start a new one).

I am currently using one replica. As for enrichment, where would I go to find out? I ran

/usr/share/elasticsearch/bin/elasticsearch-plugin list

and it gave no results, so I would have to assume no plugins means no enrichment, right?

Enrichment comes in many forms. For example, a geoip filter in logstash can convert a simple 10-15 byte string into dozens of fields. A useragent filter can also add a dozen or more fields. A jdbc or http filter could be adding arbitrarily large amounts of data to each log entry. Even a simple translate filter could add volume.

In filebeat there are half a dozen or so processors that can add metadata about the host, container, etc.

A 20 byte log file entry that goes through filebeat and logstash could add one field of 20 bytes to elasticsearch, or a hundred fields taking up a couple of kilobytes.
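As a rough illustration of that difference, the sketch below compares the serialized JSON size of a bare event with an enriched one. The field names and values are hypothetical stand-ins for what geoip, useragent, and host metadata might add, and the JSON size is only a proxy for the stored source, not the real on-disk footprint.

```python
import json


def serialized_size(event):
    """Return the size in bytes of the event as compact UTF-8 JSON."""
    return len(json.dumps(event, separators=(",", ":")).encode("utf-8"))


# A bare event: just the original log line.
bare = {"message": "GET /index.html 200"}

# The same event after hypothetical enrichment (all field names are made up).
enriched = dict(bare)
enriched.update({
    "geoip": {"country_name": "Example Land", "city_name": "Exampleville",
              "location": {"lat": 0.0, "lon": 0.0}},
    "user_agent": {"name": "ExampleBrowser", "os": "ExampleOS", "version": "1.0"},
    "host": {"name": "web-01", "architecture": "x86_64"},
})

print(serialized_size(bare), serialized_size(enriched))
```

Running something like this against a few real events pulled from your own indices gives a quick feel for how much your pipeline adds per document.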

Similarly, in elasticsearch itself, leaving index_options set to 'positions' (the default for text fields) results in a significant amount of extra data being stored in the index for each document. Now 'positions' may be absolutely essential to your use case, but I have in the past had use cases where it could be disabled.
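For reference, a minimal sketch of what a mapping with a reduced index_options value could look like, built as a Python dict so it can be printed as JSON. The helper name and the "message" field are assumptions for illustration. Note that without positions, phrase queries on that field will not work, so only do this if you are sure you do not need them.

```python
import json

# Values accepted by the index_options mapping parameter.
ALLOWED_INDEX_OPTIONS = ("docs", "freqs", "positions", "offsets")


def build_text_mapping(field, index_options="docs"):
    """Build an index mapping body for one text field (hypothetical helper)."""
    if index_options not in ALLOWED_INDEX_OPTIONS:
        raise ValueError("unknown index_options value: %r" % index_options)
    return {
        "mappings": {
            "properties": {
                field: {"type": "text", "index_options": index_options}
            }
        }
    }


# "message" is an assumed field name; use whatever your documents contain.
print(json.dumps(build_text_mapping("message"), indent=2))
```

The resulting body is what you would send when creating a test index, which lets you compare its size against an index created with the defaults.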

As I said, I would suggest you run a test to get a feel for how much space in elasticsearch a given volume of logs uses.
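To see which indices take the most space, both for the test and on the existing cluster, one option is the _cat/indices API. The sketch below assumes you have already fetched the output of GET _cat/indices?format=json&bytes=b and parsed it into a list of dicts; the index names and sizes in the sample are invented.

```python
def rank_indices_by_size(rows, top=10):
    """Rank _cat/indices rows by total store size, largest first.

    Each row is expected to carry "index", "store.size" and "docs.count",
    with sizes in bytes as strings (as returned when bytes=b is used).
    Missing or null values (for example on closed indices) count as 0.
    """
    def as_int(value):
        return int(value) if value not in (None, "") else 0

    ranked = sorted(rows, key=lambda r: as_int(r.get("store.size")), reverse=True)
    return [
        (r["index"], as_int(r.get("store.size")), as_int(r.get("docs.count")))
        for r in ranked[:top]
    ]


# Invented sample rows in the shape of the _cat/indices JSON output.
sample = [
    {"index": "syslog-2024.01.01", "store.size": "2147483648", "docs.count": "900000"},
    {"index": "nginx-2024.01.01", "store.size": "5368709120", "docs.count": "4000000"},
    {"index": "closed-index", "store.size": None, "docs.count": None},
]

for name, size_bytes, docs in rank_indices_by_size(sample):
    print(name, size_bytes, docs)
```

This works best when each log type goes to its own index. If several log types share an index, the per-index size will not tell you which type is responsible.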
