Removing _source increases size of index

I've been trying to find ways to optimize my index size in Elasticsearch. In order to understand the size impact of the _source field, I dumped an index and re-indexed into another index with _source {enabled:false}. The original index was 466.9mb. I was somewhat shocked when the index with _source disabled actually resulted in an index size of 467.1mb... a slight increase in size. How is this possible?

I verified that the index/_mappings show _source disabled, my search results don't include _source and when I try to ask for _source=somefield, I get an error message. So, I'm fairly certain that the _source is actually removed in my test index.
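For anyone reproducing this test, here is a minimal sketch of the index body and the mapping check described above. It assumes Elasticsearch 7.x typeless mappings; the helper names and the index name `test-no-source` are hypothetical and only build or inspect JSON, they do not talk to a cluster.

```python
import json


def no_source_index_body():
    """Body for PUT /<index> that disables the _source field."""
    return {"mappings": {"_source": {"enabled": False}}}


def source_disabled(mapping_response, index):
    """Inspect a parsed GET /<index>/_mapping response.

    Returns True only if _source is explicitly disabled for the index.
    When the _source entry is absent, Elasticsearch keeps it enabled.
    """
    source = mapping_response[index]["mappings"].get("_source", {})
    return source.get("enabled", True) is False


if __name__ == "__main__":
    # Print the body you would send when creating the test index.
    print(json.dumps(no_source_index_body(), indent=2))
```

You would send the body when creating the target index, before re-indexing into it, and pass the parsed mapping response to `source_disabled` afterwards.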

I'm not sure that I actually want to remove _source in my production environment, but I'd really like to understand the storage optimizations that make it possible to store _source without adding additional size to the index.

Welcome to our community! :smiley:

I wouldn't remove _source. I think disabling it is going to be deprecated in future, as it severely limits other functionality, so you're likely to be on the back foot if you do this. (See the related warning in the "_source field" page of the Elasticsearch Guide [7.14].)

It'd help if you shared the mappings, and sample docs, so we can take a closer look.

Unless you force-merged both indices down to a single segment, the two indices can be at different stages of merging, which can affect size significantly.
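A rough sketch of how that comparison could be set up, using only the Python standard library. It assumes a cluster at `http://localhost:9200` (hypothetical) and relies on the `_forcemerge` and `_stats/store` endpoints; the helper names are made up for illustration.

```python
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: local, unsecured cluster


def forcemerge_request(index, base_url=BASE_URL):
    """Build POST /<index>/_forcemerge?max_num_segments=1 (not sent here)."""
    url = f"{base_url}/{index}/_forcemerge?max_num_segments=1"
    return Request(url, method="POST")


def store_size_bytes(stats_response, index):
    """Read the primary store size from a parsed GET /<index>/_stats/store response."""
    return stats_response["indices"][index]["primaries"]["store"]["size_in_bytes"]


def size_difference(stats_a, index_a, stats_b, index_b):
    """Bytes by which index_b is larger than index_a (negative if smaller)."""
    return store_size_bytes(stats_b, index_b) - store_size_bytes(stats_a, index_a)
```

Send each request with `urllib.request.urlopen`, wait for the merges to finish, and only then compare the store sizes, so that both indices are measured in the same merge state.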

I would also recommend verifying that mappings have not changed.

Thanks, I wasn't planning on removing _source, but wanted to understand the impact that it had on index size. I found my answer in this issue: https://github.com/elastic/elasticsearch/issues/41628#issuecomment-488155381. If _source is disabled and soft deletes are enabled, then Elasticsearch will automatically stuff the source into a stored field called _recovery_source. Since soft deletes are enabled by default, this makes it appear that disabling _source does not reduce the size of the index.
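To measure the effect of _source in isolation, the test index would presumably also need soft deletes turned off. The sketch below builds such a create-index body. It assumes a 7.x cluster where the static setting `index.soft_deletes.enabled` is still accepted; as far as I know it has been deprecated since 7.6 and is not accepted for new indices in 8.x, so treat this as a test-only experiment. The function name is hypothetical.

```python
import json


def test_index_body(source_enabled=False, soft_deletes_enabled=False):
    """Create-index body controlling both _source and soft deletes.

    index.soft_deletes.enabled is a static setting, so it can only be
    set when the index is created, not changed afterwards.
    """
    return {
        "settings": {"index": {"soft_deletes": {"enabled": soft_deletes_enabled}}},
        "mappings": {"_source": {"enabled": source_enabled}},
    }


if __name__ == "__main__":
    # Body for a test index with neither _source nor soft deletes.
    print(json.dumps(test_index_body(), indent=2))
```

Re-indexing into an index created with this body, then comparing sizes after a force merge, should show how much space _source actually takes.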
