Index with additional field taking less space than the same index without it

Hi,

We have been using a field in much the same way as the old _all field: most of our other fields have a copy_to that points to it.
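For readers unfamiliar with the setup, here is a minimal sketch of what such a mapping might look like. The field names (title, body, author, catch_all) are hypothetical, and the typeless `mappings.properties` layout assumes Elasticsearch 7 or later; the actual mapping in question was not posted.

```python
import json

def build_mapping(source_fields, catch_all="catch_all"):
    """Build a mapping body in which each source field copies into one catch-all text field.

    All field names are placeholders for illustration only.
    """
    properties = {
        name: {"type": "text", "copy_to": catch_all} for name in source_fields
    }
    # The catch-all field itself is a plain text field with no copy_to.
    properties[catch_all] = {"type": "text"}
    return {"mappings": {"properties": properties}}

mapping = build_mapping(["title", "body", "author"])
print(json.dumps(mapping, indent=2))
```

The printed JSON could be used as the body of an index creation request.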

I have been trying to measure the impact on disk space of removing that field, the idea being that we would use simple_query_string on all the original fields instead of querying only the catch-all one. Our index contains 3 billion large documents and space is an issue.
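To make the two query styles concrete, here is a rough sketch of the request bodies, assuming the same hypothetical field names (catch_all, title, body, author). Only the `query` and `fields` parameters of simple_query_string are used.

```python
import json

def simple_query(text, fields):
    """Build a simple_query_string search body over the given fields."""
    return {
        "query": {
            "simple_query_string": {
                "query": text,
                "fields": list(fields),
            }
        }
    }

# Current approach: search the single catch-all field (hypothetical name).
catch_all_body = simple_query("disk space", ["catch_all"])

# Proposed approach: search the original fields directly (hypothetical names).
per_field_body = simple_query("disk space", ["title", "body", "author"])

print(json.dumps(per_field_body, indent=2))
```

Either body would go to the index's _search endpoint. The second one removes the need for the catch-all field, at the cost of querying more fields per search.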

I generated 2 indices containing identical data, one with the copy_to and one without it; the rest of the mapping and configuration is identical. I expected the index without the catch-all field to take less space on disk, but amazingly the store.size_in_bytes stats reported by Kibana indicate that the index with the catch-all field takes 33% less space.
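One way to double-check a comparison like this outside Kibana is to read store.size_in_bytes from the index stats API response (GET /index1,index2/_stats/store) and compare primaries only, so that replica counts do not skew the result. The sketch below assumes the response has already been fetched and parsed; the index names and byte counts are placeholders, not the real figures.

```python
def primary_store_bytes(stats, index):
    """Return the primary-shard store size in bytes for one index from a _stats response."""
    return stats["indices"][index]["primaries"]["store"]["size_in_bytes"]

def relative_difference(stats, candidate, baseline):
    """Size of `candidate` relative to `baseline`; negative means candidate is smaller."""
    a = primary_store_bytes(stats, candidate)
    b = primary_store_bytes(stats, baseline)
    return (a - b) / b

# Placeholder response fragment; real numbers would come from the stats API.
sample_stats = {
    "indices": {
        "with_catch_all": {"primaries": {"store": {"size_in_bytes": 67}}},
        "without_catch_all": {"primaries": {"store": {"size_in_bytes": 100}}},
    }
}

diff = relative_difference(sample_stats, "with_catch_all", "without_catch_all")
print(f"{diff:+.0%}")
```

Reading `primaries` rather than `total` matters if the two indices ended up with different replica settings.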

This is completely unexpected and I can't find an explanation for it. How could not having an additional large field result in a bigger index size?

I can't check the stats from Kibana against the actual size on disk, as the data is spread over a great many shards.

Any idea?

@jnioche
Have you force merged before comparing sizes?

I am assuming both indices are test indices. If you haven't, force merge them by running

POST /index1,index2/_forcemerge?max_num_segments=1

then compare sizes.

Yes, I did force merge to get a fair comparison.

If the mappings, apart from this field, are identical, have you also verified that the index settings are comparable? One thing that does not show up in the mappings but can have a significant impact on index size is the best_compression codec.
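A quick way to verify this is to diff the output of GET /index1,index2/_settings while ignoring keys that always differ between indices. The sketch below assumes the response has already been parsed into a dict; the index names and values are placeholders. Note that index.codec is typically absent from the response when the default codec is in use.

```python
# Keys that are expected to differ between any two indices.
IGNORED_KEYS = {"uuid", "creation_date", "provided_name", "version"}

def comparable_settings(response, index):
    """Return the index-level settings for one index, minus per-index identifiers."""
    settings = response[index]["settings"]["index"]
    return {k: v for k, v in settings.items() if k not in IGNORED_KEYS}

def settings_diff(response, first, second):
    """Map each differing setting to a (first_value, second_value) pair."""
    a = comparable_settings(response, first)
    b = comparable_settings(response, second)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }

# Placeholder response; real values would come from the _settings API.
sample_settings = {
    "index1": {"settings": {"index": {
        "uuid": "aaa", "number_of_shards": "5", "codec": "best_compression"}}},
    "index2": {"settings": {"index": {
        "uuid": "bbb", "number_of_shards": "5"}}},
}

print(settings_diff(sample_settings, "index1", "index2"))
```

An empty result means the two indices have the same comparable settings; here the placeholder data would flag the codec as the only difference.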

The index settings were identical: both used best_compression and had the same number of shards. I am really puzzled.

Not sure what is going on. Can you please provide the full index stats and mappings?

I reran the same experiment and arrived at the opposite conclusion. I am pretty sure I had read the stats correctly, though. I probably did something wrong somewhere. Sorry, and thanks for your answers.
