Per-field breakdown of Elasticsearch index disk usage


#1

Hi,

We're using Metricbeat to stream various system usage metrics to our Elasticsearch server. The problem is that the index gets quite large (7+ GB per week), so we are considering removing some of the Metricbeat event fields. However, it's hard to predict the exact impact of those changes.
Can anyone advise on how to estimate how much disk space specific fields use in a particular index?

Thanks,
-Andrey


(Zachary Tong) #2

It's really, really hard to estimate, unfortunately. :disappointed:

Lucene uses a number of tricks to compress fields, and how well they work depends largely on what kind of data is being indexed. For example, high-cardinality fields take up more space than low-cardinality fields, because low-cardinality fields compress better. Numerics are smaller than strings; scaled floats are smaller than floats, which in turn are smaller than doubles; and so on.
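To get a feel for the cardinality effect, here's a rough sketch that uses Python's general-purpose `zlib` as a stand-in for Lucene's compression. That substitution is an assumption for illustration only: Lucene uses its own field-type-specific encodings, so the absolute byte counts mean nothing for Elasticsearch. The sketch only shows the general principle that a column with a few distinct values compresses far better than one where nearly every value is unique.

```python
import random
import string
import zlib


def compressed_size(values):
    """Bytes after zlib-compressing the newline-joined values (NOT Lucene's codec)."""
    return len(zlib.compress("\n".join(values).encode("utf-8")))


rng = random.Random(42)  # fixed seed so the comparison is repeatable
n = 10_000

# Both lists have the same raw size: 10,000 values of 5 characters each.
# Low cardinality: only four distinct hostnames, repeated over and over.
low = [rng.choice(["web-1", "web-2", "db-01", "db-02"]) for _ in range(n)]

# High cardinality: random 5-character strings, so almost every value is distinct.
alphabet = string.ascii_lowercase + string.digits
high = ["".join(rng.choices(alphabet, k=5)) for _ in range(n)]

low_bytes = compressed_size(low)
high_bytes = compressed_size(high)
print(f"low cardinality:  {low_bytes} bytes")
print(f"high cardinality: {high_bytes} bytes")
```

The exact numbers will vary with the data, but the low-cardinality list should come out several times smaller.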

And then it gets more complicated, because different compression strategies are used depending on the data in each segment, and that data changes as segments are merged. For example, two medium-cardinality segments may merge into one segment that forms a high-cardinality set, changing the compression scheme. Or two segments may merge and vastly reduce their combined on-disk footprint thanks to mutual compression.

If you want to experiment, you could use the Reindex API to copy a single field from your existing data into an isolated test index. Because that index holds only that one field, its size gives you a very good estimate of the field's footprint. Rinse and repeat for the other fields. We have an internal tool for estimating field sizes... and it basically does exactly that.
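A rough sketch of that workflow in Python, using only the standard library against the REST API. Assumptions: a cluster reachable at `http://localhost:9200` without authentication, and the `field-size-test-*` naming scheme is made up for this example. The reindex body uses `source._source` to copy only the chosen field, and the script force-merges the scratch index down to one segment before reading its store size, since merge state affects compression. One caveat: unless the destination index already exists with the same mapping as the source (or matches the same index template), the reindex creates it with dynamic mapping, which may map the field differently and skew the result. The scratch index also carries per-document overhead such as `_id`, so treat the number as an approximation.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster, no authentication


def scratch_index_name(field):
    """Hypothetical naming scheme for the per-field test index."""
    return "field-size-test-" + field.lower().replace(".", "-")


def reindex_body(source_index, dest_index, field):
    """Reindex API body that copies only `field` from each source document."""
    return {
        "source": {"index": source_index, "_source": [field]},
        "dest": {"index": dest_index},
    }


def store_size_bytes(stats, index):
    """Primary-shard store size from a GET /<index>/_stats/store response."""
    return stats["indices"][index]["primaries"]["store"]["size_in_bytes"]


def _request(method, path, body=None):
    """Send a JSON request to the cluster and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def measure_field(source_index, field):
    """Copy one field into a scratch index and return its on-disk size in bytes."""
    dest = scratch_index_name(field)
    # Blocks until the copy finishes; very large indices may hit HTTP timeouts.
    _request("POST", "/_reindex?refresh=true", reindex_body(source_index, dest, field))
    # Merge to a single segment so the result doesn't depend on merge timing.
    _request("POST", f"/{dest}/_forcemerge?max_num_segments=1")
    _request("POST", f"/{dest}/_refresh")
    stats = _request("GET", f"/{dest}/_stats/store")
    return store_size_bytes(stats, dest)
```

Calling `measure_field` once per candidate field (for example `measure_field("metricbeat-2018.05.01", "system.process.cmdline")`, where both names are hypothetical) and comparing each result with the store size of the original index gives a rough per-field share. Remember to delete the scratch indices afterwards.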

That was all pretty vague, unfortunately. Sorry :disappointed:


(Zachary Tong) #3

These may be somewhat useful too:


#4

Thanks for your prompt and comprehensive response!

Sounds like a plan :slight_smile:

BTW, is there any chance the tool could be shared?

No worries, you no doubt did your best.


(Zachary Tong) #5

Lemme check and see the status of that tool. I'm not sure it's been updated in some time... it may not be working with newer versions of ES.

