Disk usage on benchmarks

Hi all,

I'm studying the Elasticsearch Adhoc Benchmarks and am curious about the disk usage metrics. Why is there such a large gap between the final index size (25GB) and the total bytes written (313GB)?

Lucene stores data in immutable segments. New segments start out reasonably small and are then merged into larger ones. Because a segment cannot be modified in place, every merge writes its contents out again as a new segment. This means that the same data is written to disk multiple times as more data is added to the index and merging creates larger and larger segments.
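
To make the effect concrete, here is a rough sketch of how repeated merging inflates bytes written. It is a toy model and not Lucene's actual merge policy: the function name `simulate_writes`, the fixed flush size, and the rule "merge whenever `merge_factor` segments sit on the same level" are all assumptions made for illustration. It also ignores every write other than segment flushes and merges.

```python
def simulate_writes(total_mb, flush_mb, merge_factor):
    """Toy model (not Lucene's real merge policy).

    Flushes fixed-size segments and merges them level by level.
    Returns (final_index_mb, total_written_mb).
    """
    levels = {}   # level -> sizes of the segments currently on that level
    written = 0   # every byte ever written: flushes plus merge rewrites
    flushed = 0

    while flushed < total_mb:
        size = min(flush_mb, total_mb - flushed)
        flushed += size
        written += size                       # initial flush of a small segment
        levels.setdefault(0, []).append(size)

        # Cascade: once a level holds merge_factor segments, rewrite them
        # as one larger segment on the next level up.
        level = 0
        while len(levels.get(level, [])) >= merge_factor:
            merged = sum(levels[level])
            levels[level] = []
            written += merged                 # the same data is written again
            levels.setdefault(level + 1, []).append(merged)
            level += 1

    final = sum(sum(sizes) for sizes in levels.values())
    return final, written


# Hypothetical inputs: 1000 MB of data, 10 MB flushes, merge 10 segments at a time.
final_mb, written_mb = simulate_writes(1000, 10, 10)
print(written_mb / final_mb)   # 3.0: each byte is flushed once and merged twice

# The ratio implied by the numbers in the question.
observed_amplification = 313 / 25
print(observed_amplification)  # 12.52
```

In this toy setup each extra merge level adds one more full rewrite of the data, so smaller initial segments or more merge levels push the ratio up. The roughly 12.5x seen in the benchmark is the same kind of ratio, just produced by the real merge behaviour rather than this simplified rule.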
