Elasticsearch disk occupancy

Hi,

I am kind of new to the ES world. I work with ES v5 and I have some questions...
Thanks in advance for enlightening me :slight_smile:

I need to know the actual total disk footprint of my data across my ES cluster.
To do that, I use these commands: _cat/shards and _cat/indices
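
For reference, here is roughly how I call them (a minimal sketch with Python's requests library, assuming a local node on localhost:9200; `bytes=b` asks the cat APIs for raw byte counts instead of human-readable units):

```python
import requests

ES = "http://localhost:9200"  # assumption: a local single-node cluster

# Index-level view: h= selects only the columns of interest.
print(requests.get(ES + "/_cat/indices",
                   params={"v": "true", "bytes": "b",
                           "h": "index,docs.count,store.size,pri.store.size"}).text)

# Per-shard view: replicas count toward total disk usage too.
print(requests.get(ES + "/_cat/shards",
                   params={"v": "true", "bytes": "b",
                           "h": "index,shard,prirep,docs,store"}).text)
```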

A very simple, short document (only one field, with just a letter as its value...) takes 7 kB on disk, while the document itself weighs barely 500 B...
I am only half surprised: I understand that the document is both stored and indexed, so it costs more than the original payload. But with compression, I was expecting something a little more efficient...

-So, first question: is it "normal" to observe this?

-Do _cat/shards and _cat/indices report the true (uncompressed) size of the data, or its compressed size on disk?

-If a "double" value costs 64 bits (8 bytes), how much does a "text" (string) value cost?

-Does a property name in a document ("property_name":"value") cost the same as a "text" value?

Again, thanks a lot :slight_smile:

To determine how much space your data will take up on disk, you need to index a realistic amount of it, ideally at least a few tens of GB. Compression is not very efficient on a very small number of documents, so indexing just a handful will not allow you to draw any conclusions. You can then reduce the amount of space your data takes up by optimising the mappings you use.
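
As a rough sketch of the kind of mapping optimisation I mean (the index and field names below are hypothetical; on 5.x, disabling the _all field and using keyword instead of analysed text for fields you only filter on are typical space savers, and the best_compression codec shrinks stored fields at some indexing cost):

```python
import requests

ES = "http://localhost:9200"  # assumption: a local cluster

# Hypothetical index showing typical 5.x space optimisations.
body = {
    "settings": {
        # Smaller stored fields (_source) in exchange for some indexing speed.
        "index.codec": "best_compression"
    },
    "mappings": {
        "doc": {  # a mapping type is still required on 5.x
            # _all duplicates every field into one big searchable text field;
            # it is enabled by default on 5.x and is often wasted space.
            "_all": {"enabled": False},
            "properties": {
                # keyword: exact values, no analysis chain, aggregatable.
                "status": {"type": "keyword"},
                "price": {"type": "double"},
            },
        }
    },
}
print(requests.put(ES + "/my_index", json=body).text)
```

After reindexing a representative sample with and without such settings, comparing store.size in _cat/indices shows the actual gain.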

Have a look at the following resources:

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

