_source.excludes/includes makes query 2 times slower

backend · March 5, 2020, 8:15am

ES: 7.4.1
I have a small index (~100MB). The mapping is simple and flat, with no nested objects whatsoever. One field (an array of objects) is fairly large, and sometimes I'd like to exclude it from the response.
The problem is that if either "excludes" or "includes" is applied, the overall execution time (took) doubles.

Here's a simpler way to reproduce the issue:
'_source' => ['includes' => ['id']] => 30ms
Without any includes/excludes => 15ms

What's the reason for this? Note that the whole index is loaded into RAM ('index.store.preload' => ['nvd', 'dvd', 'tim', 'doc', 'dim']) for the sake of better performance.
What's the mechanism behind it? I understand that with no includes/excludes ES can just map the entire index memory to the response without any additional processing. But why does it take so long to include/exclude fields when all the data is already in RAM?
It seems like it should be a very quick O(n) operation. Thanks for your work!

DavidTurner · March 5, 2020, 11:35am

If you don't ask for any source manipulation then Elasticsearch treats the document as an opaque sequence of bytes, which it can handle very efficiently. If it has to manipulate the source at all then it must parse it, convert it into a tree of freshly-created objects, remove the parts you want excluded, and then convert it back into a sequence of bytes for further processing. This extra work can be quite significant.
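As a rough illustration of the two paths described above, here is a Python analogy. It is not Elasticsearch's actual (Java) implementation: it assumes a JSON `_source`, handles only top-level keys (real source filtering also supports dotted paths and wildcards), and the document and field names are hypothetical. It shows why even `includes: ['id']` costs more than returning the raw bytes: the whole document, including the large field, is parsed before anything is dropped.

```python
import json
from typing import Any, Dict, Sequence


def source_passthrough(source: bytes) -> bytes:
    """No source filtering requested: return the stored bytes untouched."""
    return source


def _reserialize(doc: Dict[str, Any]) -> bytes:
    # Convert the filtered object tree back into a byte sequence.
    return json.dumps(doc, separators=(",", ":")).encode("utf-8")


def source_filter_includes(source: bytes, includes: Sequence[str]) -> bytes:
    """Keep only the listed top-level keys (simplified analogy)."""
    # Parse the *whole* document into fresh Python objects, including
    # any large field that is about to be thrown away.
    doc = json.loads(source)
    return _reserialize({k: v for k, v in doc.items() if k in includes})


def source_filter_excludes(source: bytes, excludes: Sequence[str]) -> bytes:
    """Drop the listed top-level keys (simplified analogy)."""
    doc = json.loads(source)
    return _reserialize({k: v for k, v in doc.items() if k not in excludes})


if __name__ == "__main__":
    # Hypothetical document with one large array-of-objects field.
    raw = json.dumps(
        {"id": 1, "title": "t", "big_field": [{"n": i} for i in range(10_000)]}
    ).encode("utf-8")

    print(source_filter_includes(raw, ["id"]))  # b'{"id":1}'
```

The cost of the filtering paths grows with the size of the full source, not with the size of what ends up in the response, which is consistent with the slowdown reported above for a document carrying one big field.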

Note that the source is a stored field, but you are not preloading the stored fields file. Also note that preloading is only a best-effort process and does not guarantee that this data remains in RAM.

An alternative would be to store the field(s) that you do want for these queries rather than parsing them from the source each time.
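A sketch of what storing a field separately might look like, written as Python dicts for the request bodies. The index name `my-index`, the field name `id` and its `keyword` type are assumptions for illustration. The mapping marks the field with `"store": true`, and the search asks for it via `stored_fields`; as I understand the 7.x docs, specifying `stored_fields` means `_source` is not returned unless explicitly requested, so the hits carry a `fields` section and the source does not need to be parsed.

```python
import json

# Hypothetical mapping: store "id" separately from _source.
# Note: "store" is set on individual leaf fields, not on an object as a whole.
create_index_body = {
    "mappings": {
        "properties": {
            "id": {"type": "keyword", "store": True},
        }
    }
}

# Search that reads the stored field instead of filtering _source.
search_body = {
    "query": {"match_all": {}},
    "stored_fields": ["id"],
}

if __name__ == "__main__":
    # Send the first body with PUT /my-index and the second with
    # GET /my-index/_search, using whichever client you prefer.
    print(json.dumps(create_index_body, indent=2))
    print(json.dumps(search_body, indent=2))
```

Queries that need the full document can still omit `stored_fields` and read `_source` as usual, so this only changes the cost of the "small response" queries.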

Another alternative is to exclude this field at index time. This has downsides, of course (no reindexing, no updates, etc.) but maybe that's ok for you.
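For comparison, index-time exclusion is configured in the mapping through `_source.excludes`. A minimal sketch as a Python dict, where the field name `big_field` is hypothetical. The excluded field can still be indexed and searched, but it is removed from the stored `_source`, so it can never be returned there, and operations that rebuild documents from `_source` (reindex, update) will lose it.

```python
import json

# Hypothetical mapping: "big_field" is stripped from the stored _source
# at index time; it is no longer available in search responses.
create_index_body = {
    "mappings": {
        "_source": {"excludes": ["big_field"]},
    }
}

if __name__ == "__main__":
    # Body for PUT /<index-name> when creating the index.
    print(json.dumps(create_index_body, indent=2))
```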

backend · March 5, 2020, 11:46am

Thanks for the answer! I'll consider using 'index.store.preload' => ['*'] then.

> An alternative would be to store the field(s)
> Another alternative is to exclude this field at index time

I believe that neither of those solutions would let me include the field in the response when I need it (which is sometimes). I am not querying this field; I only want to see it in the response (sometimes).

It seems like the only way to achieve that is to have an extra index (with the fields I only need sometimes)?

system · April 2, 2020, 11:46am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
