How do you calculate the average document size in streams?

Streams is a really great feature that you have added, and it was really needed. Now for the question: as the title suggests, how does the daily average get calculated?

I tried calculating the average document size by dividing the index size by the total document count, but it gave me a different number. On top of that, when I select a specific time range, the average changes based on that range. Looking forward to your reply. Thanks!

Hi!

The daily average is not simply index size divided by document count. It's a two-part calculation.
When you divided index size by document count, you got the average bytes per document, but that's only one part of the formula. The daily average also factors in how many documents were ingested per day in the selected time window, which is why your number didn't match.

The full calculation is: first, it computes average bytes per document (total stored size / total document count). Then it looks at how many documents were ingested within your selected time range and divides that by the number of days in that range to get documents per day. The daily ingestion average is those two values multiplied together. The monthly average is the daily value multiplied by 30.
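To make the arithmetic concrete, here is a rough Python sketch of the steps described above. It is not the actual Streams code: the function name, parameter names, and sample numbers are made up for illustration, the zero-guard is my own assumption, and it ignores sampling entirely.

```python
def estimate_ingestion(total_size_bytes, total_docs, docs_in_range, days_in_range):
    """Return (daily_bytes, monthly_bytes) for the selected time range.

    Hypothetical helper that mirrors the two-part calculation described
    in this reply; it is not taken from the product's source code.
    """
    # Assumption: treat an empty index or an empty range as zero ingestion.
    if total_docs <= 0 or days_in_range <= 0:
        return 0.0, 0.0

    # Part 1: average bytes per document across the whole stream.
    bytes_per_doc = total_size_bytes / total_docs

    # Part 2: documents per day within the selected time range.
    docs_per_day = docs_in_range / days_in_range

    daily_bytes = bytes_per_doc * docs_per_day
    monthly_bytes = daily_bytes * 30  # monthly average = daily value * 30
    return daily_bytes, monthly_bytes


# Same stream (10,000,000 bytes over 20,000 docs = 500 bytes per doc),
# viewed through two different time ranges.
last_7_days = estimate_ingestion(10_000_000, 20_000, docs_in_range=7_000, days_in_range=7)
last_1_day = estimate_ingestion(10_000_000, 20_000, docs_in_range=3_000, days_in_range=1)

print(last_7_days)  # (500000.0, 15000000.0)
print(last_1_day)   # (1500000.0, 45000000.0)
```

In this made-up example, the bytes per document stay at 500 in both cases, but the daily figure triples for the one-day range because more documents per day arrived in that window.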

This is why the average changes when you change the time range: it recalculates the documents-per-day rate for whatever window you've selected. Also, for performance, the query uses sampling, so the numbers are approximate (as described in the tooltip).

Here is the reference in code.
