Compression ratio

Is there any way of getting the current compression ratio of an index?

It's not clear which compression ratio you are interested in. Elasticsearch exposes various metrics, such as the size of an index on disk, in the indices stats API, so you can calculate a compression ratio by dividing this metric by the total size of the documents that have been indexed. Is that what you're looking for?
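For reference, the on-disk size is exposed by the index stats API; a minimal sketch, assuming an index named my_index:

```
GET my_index/_stats/store
```

The response reports the size under store.size_in_bytes, for both primaries and the total across replicas, which you can use as the numerator or denominator of your ratio.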

I agree. However, as discussed in another thread, Elasticsearch does not track incoming data volume. Could you suggest how to get the total size of the documents that have been indexed?

Right, yes, as far as I know Elasticsearch doesn't track the total size of documents indexed. It's a bit of a tricky thing to measure accurately: the distributed nature of the system means that in general different shards will have indexed different sets of documents at any given point in time, and the machinery to track this statistic correctly across shard failures and primary relocations would be quite complicated. Perhaps there's a way to get an approximate answer with an aggregation? Or you could track this statistic externally?

Wondering if that would help: https://www.elastic.co/guide/en/elasticsearch/plugins/current/mapper-size.html
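If that plugin fits, usage is roughly as follows, assuming mapper-size has been installed on every node in the cluster (depending on your Elasticsearch version you may need to nest this under a mapping type):

```
PUT my_index
{
  "mappings": {
    "_size": {
      "enabled": true
    }
  }
}
```

With this enabled, every indexed document gets a _size field holding the byte length of its original _source, which can then be aggregated.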

TIL :heart:

Thanks for pointing us in the right direction. This serves the purpose well.

For getting the total ingested volume, I am using the following query:

GET my_index-2019.01.20/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "sizes": {
      "sum": {
        "field": "_size"
      }
    }
  }
}

The only problem is that for large indices (around 65 GB of primary data), the request times out.
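One tweak that may reduce the per-request work is setting "size": 0 so no hits are fetched, and dropping the match_all query, which is the default anyway. A sketch of the same aggregation, not verified against an index of that size:

```
GET my_index-2019.01.20/_search
{
  "size": 0,
  "aggs": {
    "sizes": {
      "sum": {
        "field": "_size"
      }
    }
  }
}
```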

If your indexed data is reasonably uniform in size, maybe you could calculate the average document size from a sample set and use this to estimate the compression ratio?
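That sampling idea could be sketched with a sampler aggregation over the _size field from the earlier query (the shard_size value here is an arbitrary assumption):

```
GET my_index-2019.01.20/_search
{
  "size": 0,
  "aggs": {
    "sample": {
      "sampler": {
        "shard_size": 1000
      },
      "aggs": {
        "avg_size": {
          "avg": {
            "field": "_size"
          }
        }
      }
    }
  }
}
```

Multiplying the resulting avg_size by the index's document count (from the _count or _stats API) gives an estimate of the total ingested bytes without summing over every document.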

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.