Document size and compression algorithm for documents


(Cong Wang) #1

Hello,

I would like to find out the estimated size of each document in my index. I use "GET /_nodes/stats" and look at the "merges" section, which has the following two numbers I am interested in:

"total_docs": 3277526192,
"total_size_in_bytes": 12506132649263,

Then I calculate the size of each doc roughly using this formula: total_size_in_bytes / total_docs

But the number seems too small to me.
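
The calculation described above can be sketched in a few lines of Python. This is only an illustration using the two figures quoted from the "merges" section; the function name `average_doc_size` is hypothetical, and the thread does not confirm that these counters reflect the index as a whole.

```python
# Figures quoted above from the "merges" section of GET /_nodes/stats.
total_docs = 3_277_526_192
total_size_in_bytes = 12_506_132_649_263


def average_doc_size(size_in_bytes: int, doc_count: int) -> float:
    """Return the mean number of bytes per document (hypothetical helper)."""
    if doc_count <= 0:
        raise ValueError("doc_count must be positive")
    return size_in_bytes / doc_count


avg = average_doc_size(total_size_in_bytes, total_docs)
print(f"{avg:.1f} bytes per document (~{avg / 1024:.2f} KiB)")
```

The result is an average over compressed, on-disk data, so it says little about the size of any single document as it was sent to the cluster.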

I searched around and found this article about LZ4 compression used for documents.

My questions are:

  1. Is LZ4 the compression algorithm used by default for each document? I am using 5.2.1.
  2. If so, what exactly is the compression ratio? I googled around and found a rough figure of 64 at best.

(Mark Walkom) #2
  1. LZ4 is the default; we can also do DEFLATE.
  2. The ratio of what exactly?

(Cong Wang) #3

Thanks for your reply.

  1. So can I think of it this way? All documents are compressed to a smaller size using LZ4 by default, so the size in bytes of all documents that I cat from the index is smaller than the actual data transferred over the wire into ES.

  2. The compression ratio. For example, if the original document size is 100 bytes and it becomes 10 bytes after compression. Is that correct?


(Mark Walkom) #4
  1. Yes.
  2. There is no set ratio; it's dependent on the document in question, other documents in the segment, and even the sparsity of the document field values.
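
To illustrate why there is no fixed ratio, here is a minimal sketch that compresses two payloads of the same length and compares the results. It uses DEFLATE through Python's standard `zlib` module as a stand-in, because LZ4 is not in the standard library; it is not Elasticsearch's actual storage path, and the sample data and the helper `compression_ratio` are made up for the example.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size (higher means better compression)."""
    return len(data) / len(zlib.compress(data))


# Repetitive JSON-like documents: the same field names and values recur.
repetitive = b'{"status": "ok", "level": "info", "host": "web-01"}\n' * 1000
# Random bytes of the same length: essentially incompressible.
random_like = os.urandom(len(repetitive))

ratio_repetitive = compression_ratio(repetitive)
ratio_random = compression_ratio(random_like)

print(f"repetitive payload: {ratio_repetitive:.1f}x smaller")
print(f"random payload:     {ratio_random:.2f}x")
```

The same compressor gives very different ratios depending only on the content, which is why a single figure cannot be used to convert on-disk size back to original document size.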

(Cong Wang) #5
  1. So does this mean that I cannot know the actual document size based on the size number I get from the index? I thought the compression ratio of LZ4 was nearly 64%.

(Cong Wang) #6

Thanks.

