Elasticsearch 7.x consumes more space for indexes

Hello everyone!

I have a question about index size in version 7.5.2 compared to 6.8.6.
I have two identical sets of data indexed in ES6 and in ES7. Both indexes are force merged.

ES6 (index size - 91.2mb)
green open documents a1W_4e7kSa6TotAwLtoyQQ 1 0 1128 0 91.2mb 91.2mb

ES7 (index size - 209.9mb)
green open documents lafZmVFjShyvxCIIDcsrzg 1 0 1128 0 209.9mb 209.9mb

The ES7 index is more than twice as large.
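
To put a number on the difference, the two `_cat/indices` lines above can be parsed and compared. This is only a minimal sketch: it assumes the default `_cat/indices` column order (health, status, index, uuid, pri, rep, docs.count, docs.deleted, store.size, pri.store.size) and sizes written with the `kb`/`mb`/`gb` suffixes shown above. The helper names `parse_size` and `parse_cat_line` are made up for illustration.

```python
# Column names, assumed to match the default `_cat/indices` output order.
COLUMNS = ["health", "status", "index", "uuid", "pri", "rep",
           "docs.count", "docs.deleted", "store.size", "pri.store.size"]

# Byte multipliers for the size suffixes (1024-based).
UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3, "tb": 1024 ** 4}


def parse_size(text):
    """Convert a human-readable size such as '91.2mb' to a number of bytes."""
    text = text.strip().lower()
    # Check the two-letter suffixes first so 'mb' is not mistaken for 'b'.
    for suffix in sorted(UNITS, key=len, reverse=True):
        if text.endswith(suffix):
            return float(text[:-len(suffix)]) * UNITS[suffix]
    raise ValueError(f"unrecognized size: {text!r}")


def parse_cat_line(line):
    """Split one whitespace-separated `_cat/indices` row into a dict."""
    return dict(zip(COLUMNS, line.split()))


es6 = parse_cat_line("green open documents a1W_4e7kSa6TotAwLtoyQQ 1 0 1128 0 91.2mb 91.2mb")
es7 = parse_cat_line("green open documents lafZmVFjShyvxCIIDcsrzg 1 0 1128 0 209.9mb 209.9mb")

ratio = parse_size(es7["store.size"]) / parse_size(es6["store.size"])
print(f"ES7 / ES6 store size ratio: {ratio:.2f}")
```

With the figures above, the ratio comes out at roughly 2.3 for the same 1128 documents.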

A comparison of the two indexes in a readable format is available here: https://gist.github.com/mainameiz/a94013e7375696f9f2e4f1feb5ff1b81/revisions

Could somebody please explain these differences to me? Are they caused by the new algorithm (https://www.elastic.co/blog/faster-retrieval-of-top-hits-in-elasticsearch-with-block-max-wand)?

So, the mapping is the same (I have only briefly checked the comparison; I did not find it intuitive to compare)? Have you executed a flush after indexing? Are any of those indices force merged?

This size seems to stem from a different configuration. What makes you think the new block max wand feature is causing this?

Thanks for the reply, Alexander.

So, the mapping is the same

Yes, I think the mappings are the same (same custom analyzers, filters, number of shards, etc.). Neither index has an "_all" field (it was disabled for ES6 and is not supported in ES7 at all). Both indexes were created using the same JSON payload (provided in the gist).

I have only briefly checked the comparison; I did not find it intuitive to compare

The last (top) diff there shows the fields in which the two indexes differ. The left (red) column is ES6 and the right (green) column is ES7.

I have used the command POST /documents/_forcemerge?max_num_segments=1 to merge both indexes before making a comparison.

This size seems to stem from a different configuration.

I thought that only index settings could affect index size, but as I mentioned before, the index settings are the same. Do you know which other settings can impact index size?

While writing this, it occurred to me that some default settings may have changed between versions. I will investigate and post the results.
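
One way to check for changed defaults is to fetch both indexes' settings with `GET /documents/_settings?include_defaults=true&flat_settings=true` and diff the results. The sketch below assumes the two JSON responses have already been loaded into Python dicts. The helper names, the sample responses, and the key `index.some_new_default` are all hypothetical and are not findings from the real clusters.

```python
# Keys that always differ between two separately created indexes.
ALWAYS_DIFFERENT = {
    "index.uuid",
    "index.creation_date",
    "index.version.created",
    "index.provided_name",
}


def flatten_settings(response, index_name):
    """Merge one index's defaults and explicit settings into a single flat dict."""
    body = response[index_name]
    merged = dict(body.get("defaults", {}))
    merged.update(body.get("settings", {}))  # explicit values override defaults
    return merged


def diff_settings(left, right, ignore=ALWAYS_DIFFERENT):
    """Return {key: (left_value, right_value)} for keys whose values differ."""
    diff = {}
    for key in sorted(set(left) | set(right)):
        if key in ignore:
            continue
        if left.get(key) != right.get(key):
            diff[key] = (left.get(key), right.get(key))
    return diff


# Made-up sample responses, for illustration only.
es6_response = {"documents": {
    "settings": {"index.number_of_shards": "1", "index.uuid": "a1W_4e7kSa6TotAwLtoyQQ"},
    "defaults": {"index.codec": "default"},
}}
es7_response = {"documents": {
    "settings": {"index.number_of_shards": "1", "index.uuid": "lafZmVFjShyvxCIIDcsrzg"},
    "defaults": {"index.codec": "default", "index.some_new_default": "true"},
}}

diff = diff_settings(flatten_settings(es6_response, "documents"),
                     flatten_settings(es7_response, "documents"))
for key, (old, new) in diff.items():
    print(f"{key}: ES6={old!r} ES7={new!r}")
```

A key that exists in only one version shows up with `None` on the other side, which makes newly introduced defaults easy to spot.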

What makes you think the new block max wand feature is causing this?

As far as I understand the article, this new feature adds some extra data to the index files (which helps to speed up searching).
These lines from the article make me think so:

... introduces block-max indexes and block-max WAND. The underlying idea of this paper is to split postings into fixed-size blocks and to record the maximum impact score separately for each block.
... instead of recording impact scores in the index, we record pairs of term frequency and document length.
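
The block-max idea quoted above can be illustrated with a toy sketch. It is a deliberately simplified, single-term, in-memory illustration and not Lucene's on-disk format or its actual implementation: it stores a precomputed score per posting, whereas the article says pairs of term frequency and document length are recorded instead. The function names and the block size of 2 are arbitrary choices for the example.

```python
def build_block_max(postings, block_size):
    """Split a postings list into fixed-size blocks and record each block's max score.

    `postings` is a list of (doc_id, score) pairs sorted by doc_id.
    """
    blocks = []
    for start in range(0, len(postings), block_size):
        block = postings[start:start + block_size]
        blocks.append({
            "last_doc": block[-1][0],
            "max_score": max(score for _, score in block),  # the extra per-block data
            "postings": block,
        })
    return blocks


def competitive_hits(blocks, min_competitive_score):
    """Collect postings that beat the threshold, skipping blocks that cannot."""
    hits = []
    for block in blocks:
        if block["max_score"] <= min_competitive_score:
            continue  # no document in this block can be competitive
        hits.extend((doc, score) for doc, score in block["postings"]
                    if score > min_competitive_score)
    return hits


postings = [(1, 0.2), (4, 0.9), (7, 0.1), (9, 0.3), (12, 0.25), (15, 0.05)]
blocks = build_block_max(postings, block_size=2)
print([b["max_score"] for b in blocks])
print(competitive_hits(blocks, min_competitive_score=0.28))
```

In this sketch the only additional data per block is one maximum score and the last document ID, which is small relative to the postings themselves.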

I just repeated the experiment and everything is OK. Both indexes have a similar size :slightly_frowning_face:. I'm pretty sure that I had merged both indexes and made them similar the first time :frowning:

Sorry for wasting your time.


No worries, glad you got it working! If you ever figure out what the issue was, please append it here to share the knowledge.
