By-field breakdown of an Elasticsearch index disk usage

aalesh · February 3, 2017, 5:01pm

Hi,

We're using Metricbeat to stream various system usage metrics to an Elasticsearch server. The problem is that the index gets quite large (7+ GB per week), so we are considering removing some of the Metricbeat event fields. However, it's hard to predict the exact impact of those changes.
Can anyone please advise on how to estimate how much disk space specific fields use in a particular index?

Thanks,
-Andrey

polyfractal · February 3, 2017, 6:34pm

It's really, really hard to estimate, unfortunately.

Lucene uses a number of tricks to compress fields, and these compression tricks depend in large part on what kind of data is being indexed. E.g. high cardinality fields take up more space than low cardinality fields, because low-cardinality fields compress better. Numerics are smaller than strings, scaled-floats are smaller than floats which are smaller than doubles, etc.

And then it gets more complicated because different compression strategies are used depending on the data in each segment, which can change as segments are merged. For example, two medium-ish cardinality segments may merge into one segment that forms a high-cardinality set, changing the compression scheme. Or two segments may merge and vastly reduce their combined on-disk footprint because their data compresses well together.
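To make the cardinality point a bit more concrete, here is a rough illustration. It uses Python's `zlib` as a stand-in compressor, which is not what Lucene does internally, so treat it only as an intuition aid. The host names and the value count are made up for the example.

```python
import random
import zlib

# NOTE: zlib is only a stand-in here; Lucene uses its own per-field encodings.
rng = random.Random(0)   # fixed seed so the comparison is repeatable
N = 10_000               # hypothetical number of documents

# Low cardinality: every document carries one of only 4 distinct values.
low_values = [f"host-{i:06d}" for i in range(4)]
low_card = [rng.choice(low_values) for _ in range(N)]

# High cardinality: values drawn from up to a million distinct strings.
high_card = [f"host-{rng.randrange(10**6):06d}" for _ in range(N)]


def compressed_size(values):
    """Return the zlib-compressed size, in bytes, of the joined values."""
    raw = "\n".join(values).encode("utf-8")
    return len(zlib.compress(raw))


# Both columns have the same raw size; only the number of distinct values differs.
print("low cardinality :", compressed_size(low_card), "bytes")
print("high cardinality:", compressed_size(high_card), "bytes")
```

Both columns hold the same number of equally long strings, so any difference in the printed sizes comes from how repetitive the values are.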

If you wanted to experiment, you could use the Reindex API to index a single field from your existing data over to an isolated, test index. Because that index only holds a single field, you'll have a very good estimate of the field size. Rinse, repeat for various fields. We have an internal tool to estimate field sizes... and it basically does exactly that.
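A minimal sketch of that experiment, assuming an unsecured cluster at `http://localhost:9200` and using only the Python standard library. The helper names (`reindex_body`, `store_size_bytes`, `measure_field`) are made up for this example; this is not the internal tool mentioned above. It reindexes one field via `_source` filtering, force-merges the test index to one segment so the number is more stable, and reads the primary store size from the index stats.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster, no auth/TLS


def reindex_body(source_index, field, dest_index):
    """Build a _reindex body that copies only `field` into `dest_index`."""
    return {
        "source": {"index": source_index, "_source": [field]},
        "dest": {"index": dest_index},
    }


def store_size_bytes(stats, index):
    """Extract the primary store size from a GET /<index>/_stats/store response."""
    return stats["indices"][index]["primaries"]["store"]["size_in_bytes"]


def call(method, path, body=None):
    """Send a JSON request to Elasticsearch and return the decoded response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def measure_field(source_index, field, dest_index):
    """Reindex one field into a test index and return that index's size in bytes."""
    call("POST", "/_reindex?wait_for_completion=true&refresh=true",
         reindex_body(source_index, field, dest_index))
    # Merge down to one segment so merges don't change the number later.
    call("POST", f"/{dest_index}/_forcemerge?max_num_segments=1")
    return store_size_bytes(call("GET", f"/{dest_index}/_stats/store"), dest_index)
```

Something like `measure_field("metricbeat-2017.02.03", "system.cpu.user.pct", "fieldsize-test")` would then return the size of the test index (the index and field names here are placeholders). Two caveats: the test index also stores `_source` and per-document metadata, so the number is an upper bound for the field itself, and the destination index should use the same mapping for that field as the original, otherwise dynamic mapping may index it differently.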

That was all pretty vague, unfortunately. Sorry

polyfractal · February 3, 2017, 6:39pm

These may be somewhat useful too:

aalesh · February 3, 2017, 8:06pm

Thanks for your prompt and comprehensive response!

Sounds like a plan

BTW is there any chance the tool can be shared?

No worries, you no doubt did your best.

polyfractal · February 3, 2017, 8:28pm

Lemme check and see the status of that tool. I'm not sure it's been updated in some time... it may not be working with newer versions of ES.

